TEI, the Text Encoding Initiative, maintains a set of guidelines that define an encoding scheme expressed in XML. The TEI language defines the tag set of XML elements used in text encoding, along with the attributes that modify those elements. The full TEI tag set is very rich, with more than 500 elements, but most TEI projects need only a small part of it to be useful. TEI is modular: it can be tailored to particular projects. TEI is extensible: it can be added to and amended. A software tool called Roma can help simplify the task of creating a TEI customization. TEI is designed to be flexible, allowing variety in local encoding practice. There is no one correct way to encode a text. The TEI guidelines do, however, offer best practices and recommended usage.
To understand TEI, one must take a step back and understand XML, the mark-up language in which TEI is expressed. Mark-up, or encoding, is any means of making explicit an interpretation of text for human readability, document structure, or computer interpretation. A set of conventions for marking up text must specify how mark-up is distinguished from text, and what mark-up is allowed and required. XML supplies the first, and a schema written for XML supplies the second. The TEI guidelines specify what that mark-up means.
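To make the distinction between mark-up and text concrete, here is a minimal Python sketch using only the standard library. The sample sentence, the names in it, and the `ref` value `#jdoe` are invented for illustration, and the fragment leaves out the header that a complete TEI document would carry. It shows an element (`persName`) with an attribute (`ref`) sitting inside ordinary text, and how a program can read either the mark-up or the plain text.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the "tei" prefix is just a local label for lookups.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample fragment (not a complete TEI document: no teiHeader).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>Letter from <persName ref="#jdoe">Jane Doe</persName>, written in <placeName>Boston</placeName>.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)
para = root.find(".//tei:p", TEI_NS)

# The text with all mark-up stripped away.
plain_text = "".join(para.itertext())

# The mark-up itself: an element and the attribute that modifies it.
person = para.find("tei:persName", TEI_NS)

print(plain_text)
print(person.text, person.get("ref"))
```

The angle brackets are how XML separates mark-up from text; the choice of `persName` and `placeName`, and what they mean, comes from the TEI guidelines.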
How is TEI Useful?
In the immediate sense, TEI encoding makes digitized texts discoverable, accessible, and sustainable, and therefore more useful. TEI mark-up functions as metadata, and as discussed in a previous blog post, metadata is useful in bridging the gap between information and its seeker in the digital environment.
In a larger sense, though, this tagging of text can facilitate scholarly study of works by means of distant reading: using computer programs to search vast amounts of text for patterns in language that can help answer questions of authorship or deepen our understanding of textual works.
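As a rough illustration of how tagged text supports this kind of analysis, the Python sketch below counts how often each tagged person is named across a tiny corpus. The two "letters", their file names, and the helper `count_tagged` are all hypothetical; a real project would read many TEI files from disk and would likely use a more capable XML library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical two-document corpus, kept inline so the sketch is self-contained.
CORPUS = {
    "letter1.xml": (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
        "<p><persName>Jane Doe</persName> wrote to "
        "<persName>John Roe</persName>.</p>"
        "</body></text></TEI>"
    ),
    "letter2.xml": (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
        "<p><persName>Jane Doe</persName> visited "
        "<placeName>Boston</placeName>.</p>"
        "</body></text></TEI>"
    ),
}


def count_tagged(corpus, tag):
    """Count the text content of every <tag> element across the corpus."""
    counts = Counter()
    for xml_text in corpus.values():
        root = ET.fromstring(xml_text)
        for element in root.iterfind(f".//tei:{tag}", TEI_NS):
            # Collapse internal whitespace so line breaks do not split names.
            name = " ".join("".join(element.itertext()).split())
            counts[name] += 1
    return counts


person_counts = count_tagged(CORPUS, "persName")
place_counts = count_tagged(CORPUS, "placeName")
print(person_counts.most_common())
```

Because the names are tagged, the program does not have to guess which words refer to people; the encoder's interpretation is already explicit in the mark-up.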