1 The TEI Infrastructure
This chapter describes the infrastructure for the encoding scheme defined by these Guidelines. It introduces the conceptual framework within which the following chapters are to be understood, and the means by which that conceptual framework is implemented. It assumes some familiarity with XML and XML schemas (see chapter v. A Gentle Introduction to XML) but is intended to be accessible to any user of these Guidelines. Other chapters supply further technical details, in particular chapter 22 Documentation Elements which describes the XML schema used to express the Guidelines themselves, and chapter 23 Using the TEI which combines a discussion of modification and conformance issues with a description of the intended behaviour of an ODD processor; these chapters should be read by anyone intending to implement a new TEI-based system.
The TEI encoding scheme consists of a number of modules, each of which declares particular XML elements and their attributes. Part of an element's declaration includes its assignment to one or more element classes. Another part defines its possible content and attributes with reference to these classes. This indirection gives the TEI system much of its strength and its flexibility. Elements may be combined more or less freely to form a schema appropriate to a particular set of requirements. It is as easy to add to a schema new elements which reference existing classes or elements as it is to exclude some of the elements provided by any module included in that schema.
In principle, a TEI schema may be constructed using any combination of modules. However, certain TEI modules are of particular importance, and should always be included in all but exceptional circumstances: the module tei described in the present chapter is of this kind because it defines classes, macros, and datatypes which are used by all other modules. The core module, defined in chapter 3 Elements Available in All TEI Documents, contains declarations for elements and attributes which are likely to be needed in almost any kind of document, and is therefore recommended for global use. The header module, defined in chapter 2 The TEI Header, provides declarations for the metadata elements and attributes constituting the TEI Header, a component which is required for TEI conformance, while the textstructure module, defined in chapter 4 Default Text Structure, declares basic structural elements needed for the encoding of most book-like objects. Most schemas will therefore need to include these four modules.
The specification for a TEI schema is itself a TEI document, using elements from the module described in chapter 22 Documentation Elements: we refer to such a document informally as an ODD document, from the design goal originally formulated for the system: ‘One Document Does it all’. Stylesheets for maintaining and processing ODD documents are maintained by the TEI, and these Guidelines are also maintained as such a document. As further discussed in 23.4 Implementation of an ODD System, an ODD document can be processed to generate a schema expressed using any of the three schema languages currently in wide use: the XML DTD language, the ISO RELAX NG language, or the W3C Schema language, as well as to generate documentation such as the Guidelines and their associated web site.
The bulk of this chapter describes the TEI infrastructure module itself. Although it may be skipped at a first reading, an understanding of the topics addressed here is essential for anyone planning to take full advantage of the TEI customization techniques described in chapter 23.2 Personalization and Customization.
The chapter begins by briefly characterizing each of the modules available in the TEI scheme. Section 1.2 Defining a TEI Schema describes in general terms the method of constructing a TEI schema in a specific schema language such as XML DTD language, RELAX NG, or W3C Schema.
The next and largest part of the chapter introduces the attribute and element classes used to define groups of elements and their characteristics (section 1.3 The TEI Class System).
Finally, section 1.4 Macros introduces the concept of macros, which are used to express some commonly used content models, and lists the datatypes used to constrain the range of legal values for TEI attributes (section 1.4.2 Datatype Macros).
1.1 TEI Modules
Each element defined in these Guidelines is documented by the following components:
- a prose description
- a formal declaration, expressed using a special-purpose XML vocabulary defined by these Guidelines in combination with elements taken from the ISO schema language RELAX NG
- usage examples
Each chapter of the Guidelines presents a group of related elements, and also defines a corresponding set of declarations, which we call a module. All the definitions are collected together in the reference sections provided as an appendix. Formal declarations for a given chapter are collected together within the corresponding module. For convenience, each element is assigned to a single module, typically for use in some specific application area, or to support a particular kind of usage. A module is thus simply a convenient way of grouping together a number of associated element declarations. In the simple case, a TEI schema is made by combining together a small number of modules, as further described in section 1.2 Defining a TEI Schema below.
Module name | Formal public identifier | Where defined |
analysis | Analysis and Interpretation | 17 Simple Analytic Mechanisms |
certainty | Certainty and Uncertainty | 21 Certainty, Precision, and Responsibility |
core | Common Core | 3 Elements Available in All TEI Documents |
corpus | Metadata for Language Corpora | 15 Language Corpora |
dictionaries | Print Dictionaries | 9 Dictionaries |
drama | Performance Texts | 7 Performance Texts |
figures | Tables, Formulae, Figures | 14 Tables, Formulæ, Graphics and Notated Music |
gaiji | Character and Glyph Documentation | 5 Representation of Non-standard Characters and Glyphs |
header | Common Metadata | 2 The TEI Header |
iso-fs | Feature Structures | 18 Feature Structures |
linking | Linking, Segmentation, and Alignment | 16 Linking, Segmentation, and Alignment |
msdescription | Manuscript Description | 10 Manuscript Description |
namesdates | Names, Dates, People, and Places | 13 Names, Dates, People, and Places |
nets | Graphs, Networks, and Trees | 19 Graphs, Networks, and Trees |
spoken | Transcribed Speech | 8 Transcriptions of Speech |
tagdocs | Documentation Elements | 22 Documentation Elements |
tei | TEI Infrastructure | 1 The TEI Infrastructure |
textcrit | Text Criticism | 12 Critical Apparatus |
textstructure | Default Text Structure | 4 Default Text Structure |
transcr | Transcription of Primary Sources | 11 Representation of Primary Sources |
verse | Verse | 6 Verse |
For each module listed above, the corresponding chapter gives a full description of the classes, elements, and macros which it makes available when it is included in a schema. Other chapters of these Guidelines explore other aspects of using the TEI scheme.
1.2 Defining a TEI Schema
To determine that an XML document is valid (as opposed to merely well-formed), its structure must be checked against a schema, as discussed in chapter v. A Gentle Introduction to XML. For a valid TEI document, this schema must be a conformant TEI schema, as further defined in chapter 23.3 Conformance. Local systems may allow their schema to be implicit, but for interchange purposes the schema associated with a document must be made explicit. The method of doing this recommended by these Guidelines is to provide explicitly or by reference a TEI schema specification against which the document may be validated.
A TEI-conformant schema is a specific combination of TEI modules, possibly also including additional declarations that modify the element and attribute declarations contained by each module, for example to suppress or rename some elements. The TEI provides an application-independent way of specifying a TEI schema by means of the schemaSpec element defined in chapter 22 Documentation Elements. The same system may also be used to specify a schema which extends the TEI by adding new elements explicitly, or by reference to other XML vocabularies. In either case, the specification may be processed to generate a formal schema, expressed in a variety of specific schema languages, such as XML DTD language, RELAX NG, or W3C Schema. These output schemas can then be used by an XML processor such as a validator or editor to validate or otherwise process documents. Further information about the processing of a TEI formal specification is given in chapter 23 Using the TEI.
1.2.1 A Simple Customization
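The simplest customization combines just the four recommended modules discussed above. Expressed in ODD, such a schema specification takes the following form:
<schemaSpec ident="TEI-minimal" start="TEI">
<moduleRef key="tei"/>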
<moduleRef key="header"/>
<moduleRef key="core"/>
<moduleRef key="textstructure"/>
</schemaSpec>
This schema specification contains references to each of four modules, identified by the key attribute on the moduleRef element. The schema specification itself is also given an identifier (TEI-minimal). An ODD processor will generate an appropriate schema from this set of declarations, expressed using the XML DTD language, the ISO RELAX NG language, the W3C Schema language, or in principle any other adequately powerful schema language. The resulting schema may then be associated with the document instance by one of a number of different mechanisms, as further described in chapter v. A Gentle Introduction to XML. The start point (or root element) of document instances to be validated against the schema is specified by means of the start attribute. Further information about the processing of an ODD specification is given in 23.4 Implementation of an ODD System.
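As a hedged illustration, a document instance that a schema generated from a four-module specification of this kind would be expected to accept might look like the following sketch. The title and all other text content are placeholders invented for this example; the root element is TEI, corresponding to the start point of the schema.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- metadata: elements from the header module -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A placeholder title</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished sketch.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- the text itself: elements from the textstructure and core modules -->
  <text>
    <body>
      <p>Some text.</p>
    </body>
  </text>
</TEI>
```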
1.2.2 A Larger Customization
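Texts, and the projects which encode them, vary considerably in their structure and requirements. For example:
- a text may be a collection of other texts of different types: for example, an anthology of prose, verse, and drama;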
- a text may contain other smaller, embedded texts: for example, a poem or song included in a prose narrative;
- some sections of a text may be written in one form, and others in a different form: for example, a novel where some chapters are in prose, others take the form of dictionary entries, and still others the form of scenes in a play;
- an encoded text may include detailed analytic annotation, for example of rhetorical or linguistic features;
- an encoded text may combine a literal transcription with a diplomatic edition of the same or different sources;
- the description of a text may require additional specialized metadata elements, for example when describing manuscript material in detail.
The TEI provides a number of mechanisms to support such requirements:
- a definition of a corpus or collection as a series of TEI documents, sharing a common TEI header (see chapter 15 Language Corpora)
- a definition of composite texts which combine optional front- and back-matter with a group of collected texts, themselves possibly composite (see section 4.3.1 Grouped Texts)
- an element for the representation of embedded texts, where one narrative appears to ‘float’ within another (see section 4.3.2 Floating Texts)
Subsequent chapters of these Guidelines describe in detail markup constructs appropriate for these and many other possible features of interest. The markup constructs can be combined as needed for any given set of applications or project.
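For example, a project which needs detailed manuscript description, transcription of primary sources, figures and tables, and information about names, dates, people, and places might combine the following modules:
<schemaSpec ident="TEI-PROJECT" start="TEI">
<moduleRef key="tei"/>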
<moduleRef key="header"/>
<moduleRef key="core"/>
<moduleRef key="textstructure"/>
<moduleRef key="msdescription"/>
<!-- manuscript description -->
<moduleRef key="transcr"/>
<!-- transcription of primary sources -->
<moduleRef key="figures"/>
<!-- figures and tables -->
<moduleRef key="namesdates"/>
<!-- names, dates, people, and places -->
</schemaSpec>
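A simpler schema, sufficient for transcription alone, might be generated from the following:
<schemaSpec ident="TEI-TRANSCR" start="TEI">
<moduleRef key="tei"/>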
<moduleRef key="core"/>
<moduleRef key="textstructure"/>
<moduleRef key="transcr"/>
</schemaSpec>
The TEI architecture also supports more detailed customization beyond the simple selection of modules. A schema may suppress elements from a module, suppress some of their attributes, change their names, or even add new elements and attributes. Detailed discussion of the kind of modification possible in this way is provided in 23.2 Personalization and Customization and conformance rules relating to their application are discussed in 23.3 Conformance. These facilities are available for any schema language (though some features may not be available in all languages). The ODD language also makes it possible to combine TEI and non-TEI modules into a single schema, provided that the non-TEI module is expressed using the RELAX NG schema language (see further 22.6 Combining TEI and Non-TEI Modules).
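A minimal sketch of such a customization follows, assuming the ODD mechanisms described in chapter 22 Documentation Elements. The identifier TEI-custom and the choice of which element to delete or rename are purely illustrative and are not recommendations of these Guidelines.

```xml
<schemaSpec ident="TEI-custom" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- suppress an element supplied by the core module -->
  <elementSpec ident="headItem" module="core" mode="delete"/>
  <!-- rename an element: p appears as para in the generated schema -->
  <elementSpec ident="p" module="core" mode="change">
    <altIdent>para</altIdent>
  </elementSpec>
</schemaSpec>
```

Because a renamed or deleted element changes what documents the schema accepts, the conformance rules in 23.3 Conformance should be consulted before adopting modifications of this kind.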
1.3 The TEI Class System
The TEI scheme distinguishes about five hundred different elements. To aid comprehension, modularity, and modification, the majority of these elements are formally classified in some way. Classes are used to express two distinct kinds of commonality among elements. The elements of a class may share some set of attributes, or they may appear in the same locations in a content model. A class is known as an attribute class if its members share attributes, and as a model class if its members appear in the same locations. In either case, an element is said to inherit properties from any classes of which it is a member.
Classes (and therefore elements which are members of those classes) may also inherit properties from other classes. For example, supposing that class A is a member (or a subclass) of class B, any element which is a member of class A will inherit not only the properties defined by class A, but also those defined by class B. In such a situation, we also say that class B is a superclass of class A. The properties of a superclass are inherited by all members of its subclasses.
A basic understanding of the classes into which the TEI scheme is organized is strongly recommended and is essential for any successful customization of the system.
1.3.1 Attribute Classes
An attribute class groups together elements which share some set of common attributes. Attribute classes are given names composed of the prefix att., often followed by an adjective. For example, the members of the class att.canonical have in common a key and a ref attribute, both of which are inherited from their membership in the class rather than individually defined for each element. These attributes are said to be defined by (or inherited from) the att.canonical class. If another element were to be added to the TEI scheme for which these attributes were considered useful, the simplest way to provide them would be to make the new element a member of the att.canonical class. Note also that this method ensures that the attributes in question are always defined in the same way, taking the same default values etc., no matter which element they are attached to.
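The following sketch suggests how a customization might give a new element the key and ref attributes simply by declaring class membership. The element name vessel and its namespace are hypothetical, and the content model is deliberately trivial; only the memberOf declaration matters here.

```xml
<!-- hypothetical element; name and namespace are illustrative only -->
<elementSpec ident="vessel" ns="http://www.example.org/ns/project" mode="add">
  <desc>contains the name of a ship or other vessel.</desc>
  <classes>
    <!-- membership supplies the key and ref attributes -->
    <memberOf key="att.canonical"/>
  </classes>
  <content>
    <rng:text xmlns:rng="http://relaxng.org/ns/structure/1.0"/>
  </content>
</elementSpec>
```

No attribute is declared on the element itself: both come from the class, so they are defined exactly as they are for every other member.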
Some attribute classes are defined within the tei infrastructural module and are thus globally available. Other attribute classes are specific to particular modules and thus defined in other chapters. Attributes defined by such classes will not be available unless the module concerned is included in a schema.
The attributes provided by an attribute class are those specified by the class itself, either directly, or by inheritance from another class. For example, the attribute class att.pointing.group provides attributes domains and targFunc to all of its members. This class is however a subclass of the att.pointing class, from which its members also inherit the attributes target, targetLang and evaluate. Members of the class att.pointing will thus have these three attributes, while members of the class att.pointing.group will have all five.
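The inheritance just described can be sketched as a pair of class declarations. This is a simplified illustration rather than the actual declarations: descriptions, datatypes, and any other class memberships are omitted.

```xml
<!-- simplified sketch: descriptions, datatypes, and other details omitted -->
<classSpec ident="att.pointing" type="atts">
  <attList>
    <attDef ident="target"/>
    <attDef ident="targetLang"/>
    <attDef ident="evaluate"/>
  </attList>
</classSpec>

<classSpec ident="att.pointing.group" type="atts">
  <classes>
    <!-- subclass: members also inherit target, targetLang, and evaluate -->
    <memberOf key="att.pointing"/>
  </classes>
  <attList>
    <attDef ident="domains"/>
    <attDef ident="targFunc"/>
  </attList>
</classSpec>
```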
Note that some modules define superclasses of an existing infrastructural class. For example, the global attribute class att.divLike makes attributes org, part, and sample available, while the att.metrical class, which is specific to the verse module, provides attributes met, real, and rhyme. Because att.metrical is defined as a superclass of att.divLike, all six of these attributes are available to elements that are members of att.divLike; the declaration for att.metrical adds its three attributes to the three already defined by att.divLike when the verse module is included in a schema. If, however, this module is not included in a schema, then the att.divLike class supplies only the three attributes first mentioned.
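As an illustration, and assuming that the lg element is a member of att.divLike, a line group in a schema which includes the verse module might carry attributes from both classes. The attribute values shown are illustrative only.

```xml
<!-- met and rhyme are available only when the verse module is included -->
<lg type="stanza" org="uniform" part="N" sample="complete"
    met="-+|-+|-+|-+" rhyme="abab">
  <l><!-- ... --></l>
  <l><!-- ... --></l>
  <l><!-- ... --></l>
  <l><!-- ... --></l>
</lg>
```

Without the verse module, the same element would accept only org, part, and sample from this pair of classes.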
Attributes specific to particular modules are documented along with the relevant module rather than in the present chapter. One particular attribute class, known as att.global, is common to all modules, and is therefore described in some detail in the next section. A full list of all attribute classes is given in Attribute Classes below.
1.3.1.1 Global Attributes
- att.global provides attributes common to all elements in the TEI encoding scheme.
  - xml:id (identifier) provides a unique identifier for the element bearing the attribute.
  - n (number) gives a number (or other label) for an element, which is not necessarily unique within the document.
  - xml:lang (language) indicates the language of the element content using a ‘tag’ generated according to BCP 47.
  - rend (rendition) indicates how the element in question was rendered or presented in the source text.
  - rendition points to a description of the rendering or presentation used for this element in the source text.
  - xml:base provides a base URI reference with which applications can resolve relative URI references into absolute URI references.
  - xml:space signals an intention about how white space should be managed by applications.
These attributes are optionally available for any TEI element; none of them is required. Their usage is discussed in the following subsections.
1.3.1.1.1 Element Identifiers and Labels
The value supplied for the xml:id attribute must be a legal name, as defined in the World Wide Web Consortium's XML Recommendation. This means that it must begin with a letter, or the underscore character (‘_’), and contain no characters other than letters, digits, hyphens, underscores, full stops, and certain combining and extension characters.1
In XML names (and thus the values of xml:id in an XML TEI document) uppercase and lowercase letters are distinguished, and thus partTime and parttime are two distinctly different names, and could (though perhaps unwisely) be used to denote two different element occurrences.
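The point about case can be shown with a short sketch; the paragraph content is a placeholder.

```xml
<!-- two distinct identifiers differing only in case: legal, but easy to confuse -->
<p xml:id="partTime">...</p>
<p xml:id="parttime">...</p>
<!-- by contrast, a value such as 1stPart is not a legal name, because it begins with a digit -->
```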
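Because an identifier must be unique within its document, the following example is not valid: two elements carry the same xml:id value.
<p xml:id="PAGE1"><q>What's it going to be then, eh?</q></p>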
<p xml:id="PAGE1">There was me, that is Alex, and my three droogs,
that is Pete, Georgie, and Dim, ... </p>
For a discussion of methods of providing unique identifiers for elements, see section 3.10.2 Creating New Reference Systems.
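The n attribute supplies a number or other label which need not be unique, as in the following list:
<list>
<item n="1">About These Guidelines</item>
<item n="2">A Gentle Introduction to SGML</item>
<item n="9">Verse</item>
<item n="10">Drama</item>
<item n="11">Spoken Materials</item>
<item n="12">Print Dictionaries</item>
</list>
It may similarly be used to label nested divisions of a text:
<div>
<!-- ... -->
<div type="stanza" n="xlii">
<!-- ... -->
</div>
</div>
or to number the lines of a poem:
<l n="1">
<!-- ... -->
</l>
<l n="2">
<!-- ... -->
</l>
<l n="3">
<!-- ... -->
</l>
<!-- ... -->
<l n="100">
<!-- ... -->
</l>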
1.3.1.1.2 Language Indicators
The xml:lang attribute indicates the natural language and writing system applicable to the content of a given element. If it is not specified, the value is inherited from that of the immediately enclosing element. As a rule, therefore, it is simplest to specify the base language of the text on the TEI element, and allow most elements to take the default value for xml:lang; the language of an element then need be explicitly specified only for elements in languages other than the base language. For this reason, it is recommended practice to supply a default value for the xml:lang attribute, either on the TEI root element, or on both the teiHeader and the text element. The latter is appropriate in the not uncommon case where the text element in a TEI document uses a different default language from that of the TEI Header attached to it. Other language shifts in the source should be explicitly identified by use of the xml:lang attribute on an element at an appropriate level wherever possible.
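In the following schematic example, the language is specified once, on the TEI root element, and is inherited by both the header and the text:
<TEI xml:lang="en">
<teiHeader>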
<!-- ... -->
</teiHeader>
<text>
<!-- ... -->
</text>
</TEI>
Alternatively, the language may be specified on both the teiHeader and the text element:
<TEI>
<teiHeader xml:lang="en">
<!-- ... -->
</teiHeader>
<text xml:lang="en">
<!-- ... -->
</text>
</TEI>
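In the following schematic example, an English-language TEI Header is attached to a French-language text:
<TEI>
<teiHeader xml:lang="en">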
<!-- ... -->
</teiHeader>
<text xml:lang="fr">
<!-- ... -->
</text>
</TEI>
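The same principle applies at any level of the hierarchy. In the following example the default language of the text is French, but one division of it is in German:
<TEI>
<teiHeader xml:lang="en">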
<!-- ... -->
</teiHeader>
<text xml:lang="fr">
<body>
<div>
<!-- chapter one is in French -->
</div>
<div xml:lang="de">
<!-- chapter two is in German -->
</div>
<div>
<!-- chapter three is French -->
</div>
<!-- ... -->
</body>
</text>
</TEI>
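Similarly, in the following example the xml:lang attribute on the term element records that the technical term is Latin rather than English:
<p xml:lang="en">The constitution declares <q>that no bill of attainder
 or <term xml:lang="la">ex post facto</term> law shall be passed.</q>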
...</p>
The values used for the xml:lang and targetLang attributes must be constructed in a particular way, using values from standard lists. See further vi.1. Language identification.
Additional information about a particular language may be supplied in the language element within the header (see section 2.4.2 Language Usage).
1.3.1.1.3 Rendition Indicators
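In the following example, the rend attribute records that a proper name is printed in italics in the source:
<p>... pure and pious; but he was equally alarmed by his knowledge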
of the ambitious <name rend="italics">Bohemond</name>, and
his ignorance of the Transalpine chiefs: ...</p>
Although the contents of the rend attribute are free text, in any given project, encoders are advised to adopt a standard vocabulary with which to describe typographic or manuscript rendition of the text.
<!-- define italic style using CSS -->
<rendition xml:id="IT" scheme="css">font-style: italic</rendition>
<!-- set italic style as default for the emph and hi elements -->
<tagUsage gi="emph" render="#IT"/>
<tagUsage gi="hi" render="#IT"/>
<!-- indicate that a specific p element is also in italic style -->
<p rendition="#IT"/>
The rendition attribute always points to one or more rendition elements, each of which defines some aspect of the rendering or appearance of the text in its original form. These details may be described using a formal language, such as CSS (Lie and Bos (eds.) (1999)) or XSL-FO (Berglund (ed.) (2006)); in some other formal language developed for a specific project; or informally in running prose. Although languages such as CSS and XSL-FO are generally used to describe document output to screen or print, they nonetheless provide formal and precise mechanisms for describing the appearance of many source documents, especially print documents, but also many aspects of manuscript documents. For example, both CSS and XSL-FO provide mechanisms for describing typefaces, weight, and styles; character and line spacing; and so on.
If both rendition and rend attributes are provided for a given element, the latter always takes precedence. The rendition attribute is analogous to the HTML or XHTML class attribute, which references style declarations in a Cascading Style Sheet. The rend attribute is analogous to the (X)HTML style attribute, which provides a mechanism for embedding inline rendition information at the point of use within a document. Note that, in either case, the TEI attributes describe the rendition or appearance of the source document, not intended output renditions, although often the two may be closely related.
1.3.1.1.4 Other global attributes
The global attributes xml:base and xml:space are also provided by default in any TEI schema. Like xml:id these attributes are defined as part of the XML specification and belong to the XML namespace rather than the TEI namespace. We do not describe them in detail here: reference information for xml:base is provided by Marsh (ed.) (2001); for xml:space by section 2.10 of the XML Specification.
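In the following example, the xml:base attribute changes the base URI for the first div only:
<body>
<div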
xml:base="http://www.example.org/somewhere.xml">
<p>
<!--... -->
<ptr target="#p1"/>
<!--... -->
</p>
</div>
<div>
<p>
<!--... -->
<ptr target="#p1"/>
<!--... -->
</p>
</div>
</body>
The first ptr is within the scope of a div whose xml:base attribute changes the default context; its target is therefore the element identified by http://www.example.org/somewhere.xml#p1. The second ptr, however, is within the scope of a div which does not change the default context, and its target is therefore some element within the current document with the value p1 for its xml:id attribute. Further discussion of this attribute and its effect on TEI linking methods is provided in chapter 16 Linking, Segmentation, and Alignment.
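Whitespace (spaces, tabs, and line breaks) may occur between elements, as between the item elements of the following list:
<list>
<item>apple pie</item>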
<item>banana custard</item>
<item>carrot cake</item>
</list>
Some XML processors, notably XML editors, may introduce whitespace in a document to enhance its readability when it is displayed. Such whitespace should normally be added only at locations where it is not significant, but not all processors can detect this reliably.
Most TEI elements permit mixed content, and consequently the presence or absence of whitespace is generally significant in a TEI document. There are many TEI structural elements (such as div or p) for which the availability of non-significant whitespace may also be convenient. Consequently it is rarely necessary to modify the default whitespace behaviour, which is the function of the xml:space attribute. There are however a few situations in which it may be essential, typically where complex markup is being worked on by a tool which introduces whitespace in order to enhance display of the text.
For example, when transcribing an inscription with the elements described in chapter 11 Representation of Primary Sources, a single word may well gain several additional tags to mark parts of the word which are supplied or conjectural. Such tags do not interrupt the word however, and hence introducing space where they occur would be misleading. The value of preserve for the xml:space attribute on the parent div element may be used to indicate that all and only the spaces actually present in the XML source should be regarded as significant; an XML editor or other processor is not then permitted to introduce additional spaces.
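A minimal sketch of this usage follows. The transcribed words and the values of the reason attribute are invented for illustration, and the content is kept on a single line so that the source contains no whitespace other than the one space that separates the two words.

```xml
<!-- only the single space between the two words is significant -->
<div xml:space="preserve"><p>imp<supplied reason="lost">era</supplied>tor Caes<supplied reason="lost">ar</supplied></p></div>
```

A processor which honours the attribute must not reindent this markup, since any line break inserted around a supplied element would appear to split a word.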
1.3.2 Model Classes
As noted above, the members of a given TEI model class share the property that they can all appear in the same location within a document. Wherever possible, the content model of a TEI element is expressed not directly in terms of specific elements, but indirectly in terms of particular model classes. This makes content models simpler and more consistent; it also makes them much easier to understand and to modify.
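The following sketch shows what a content model expressed in terms of model classes might look like, using RELAX NG elements within the declaration. The element remark and its namespace are hypothetical, and the particular combination of classes is an assumption chosen for illustration.

```xml
<!-- hypothetical element; not part of the TEI scheme -->
<elementSpec ident="remark" ns="http://www.example.org/ns/project" mode="add">
  <desc>contains a short editorial remark.</desc>
  <classes>
    <memberOf key="model.pLike"/>
  </classes>
  <content>
    <!-- text mixed with any member of the named classes, in any order -->
    <rng:zeroOrMore xmlns:rng="http://relaxng.org/ns/structure/1.0">
      <rng:choice>
        <rng:text/>
        <rng:ref name="model.gLike"/>
        <rng:ref name="model.phrase"/>
        <rng:ref name="model.inter"/>
      </rng:choice>
    </rng:zeroOrMore>
  </content>
</elementSpec>
```

No individual element is named in the content model, so it picks up whatever elements the modules in a given schema contribute to those classes.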
Like attribute classes, model classes may have subclasses or superclasses. Just as elements inherit from a class the ability to appear in certain locations of a document (wherever the class can appear), so all members of a subclass inherit the ability to appear wherever any superclass can appear. To some extent, the class system thus provides a way of reducing the whole TEI galaxy of elements into a tidy hierarchy. This is however not entirely the case.
In fact, the nature of a given class of elements can be considered along two dimensions: as noted, it defines a set of places where the class members are permitted within the document hierarchy; it also implies a semantic grouping of some kind. For example, the very large class of elements which can appear within a paragraph comprises a number of other classes, all of which have the same structural property, but which differ in their field of application. Some are related to highlighting, while others relate to names or places, and so on. In some cases, the ‘set of places where class members are permitted’ is very constrained: it may just be within one specific element, or one class of element, for example. In other cases, elements may be permitted to appear in very many places, or in more than one such set of places.
These factors are reflected in the way that model classes are named. If a model class has a name containing part, such as model.divPart or model.biblPart then it is primarily defined in terms of its structural location. For example, those elements (or classes of element) which appear as content of a div constitute the model.divPart class; those which appear as content of a bibl constitute the model.biblPart class. If, however, a model class has a name containing like, such as model.biblLike or model.nameLike, the implication is that its members all have some additional semantic property in common, for example containing a bibliographic description, or containing some form of name, respectively. These semantically-motivated classes often provide a useful way of dividing up large structurally-motivated classes: for example, the very general structural class model.pPart.data (‘data elements that form part of a paragraph’) has four semantically-motivated member classes (model.addressLike, model.dateLike, model.measureLike, and model.nameLike), the last of these being itself a superclass with several members.
Although most classes are defined by the tei infrastructure module, a class cannot be populated unless some other specific module is included in a schema, since element declarations are contained by modules. Classes are not declared ‘top down’, but instead gain their members as a consequence of individual elements' declaration of their membership. The same class may therefore contain different members, depending on which modules are active. Consequently, the content model of a given element (being expressed in terms of model classes) may differ depending on which modules are active.
Some classes contain only a single member, even when all modules are loaded. One reason for declaring such a class is to make it easier for a customization to add new member elements in a specific place, particularly in areas where the TEI does not make fully elaborated proposals. For example, the TEI class model.rdgLike, initially empty, is expanded by the textcrit module to include just the TEI rdg element. A project wishing to add an alternative way of structuring text-critical information could do so by defining its own elements and adding them to this class.
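Such an addition might be sketched as follows. The element name conjecture and its namespace are hypothetical, and the content model is reduced to plain text for brevity.

```xml
<!-- hypothetical project-specific element joining a single-member class -->
<elementSpec ident="conjecture" ns="http://www.example.org/ns/project" mode="add">
  <desc>contains a reading proposed by an editor without manuscript support.</desc>
  <classes>
    <!-- the new element may now appear wherever rdg may appear -->
    <memberOf key="model.rdgLike"/>
  </classes>
  <content>
    <rng:text xmlns:rng="http://relaxng.org/ns/structure/1.0"/>
  </content>
</elementSpec>
```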
Another reason for declaring single-member classes is where the class members are not needed in all documents, but appear in the same place as elements which are very frequently required. For example, the specialized element g used to represent a non-Unicode character or glyph is provided as the only member of the model.gLike class when the gaiji module is added to a schema. References to this class are included in almost every content model, since if it is used at all the g must be available wherever text is available; however these references have no effect unless the gaiji module is loaded.
At the other end of the scale, a few of the classes predefined by the tei module are subsequently populated with very many members. For example, the class model.pPart groups all the classes of element which can appear within a p or paragraph element. The core module alone adds more than fifty elements to this class; the namesdates module adds another twenty, as does the tagdocs module. Since the p element is one of the basic building blocks of a TEI document it is not surprising that each module will need to add elements to it. The class system here provides a very convenient way of controlling the resulting complexity. Typically, elements are not added directly to these very general classes, but via some intermediate semantically-motivated class.
Just as there are a few classes which have a single member, so there are some classes which are used only once in the TEI architecture. These classes, which have no superclass and therefore do not fit into the class hierarchy defined here, are a convenient way of maintaining elements which are highly structured internally, but which appear from the outside to be uniform objects like others at the same level.2 Members of such classes can only ever appear within one element, or one class of elements. For example, the class model.addrPart is used only to express the content model for the element address; it references some other classes of elements, which can appear elsewhere, and also some elements which can only appear inside an address.
1.3.2.1 Basic Model Classes
- divisions
- high level, possibly self-nesting, major divisions of texts. These elements populate the classes model.divLike, model.div1Like, etc.
- chunks
- elements such as paragraphs and other paragraph-level elements, which can appear directly within texts or within such divisions, but not within other chunks. These elements populate the class model.divPart, either directly or by means of other classes such as model.pLike (paragraph-like elements), model.entryLike, etc.
- phrase-level elements
- elements such as highlighted phrases, book titles, or editorial corrections which can occur only within chunks (paragraphs or paragraph-level elements), but not between them (and thus cannot appear directly within a division). These elements populate the class model.phrase.3
- inter-level elements
- elements such as lists, notes, quotations, etc. which can appear either between chunks (as children of a div) or within them; these elements populate the class model.inter. Note that this class is not a superset of the model.phrase and model.chunk classes but rather the group of elements which are both chunk-like and phrase-like; the classes model.phrase, model.pLike, and model.inter are all disjoint.
- components
- elements which can appear directly within texts or text divisions; this is a combination of the inter- and chunk- level elements defined above. These elements populate the class model.common, which is defined as a superset of the classes model.divPart, model.inter, and (when the dictionary module is included in a schema) model.entryLike.
As noted above, some elements and element classes belong to none of these groupings; however, over two-thirds of the 500+ elements defined in the present edition of these Guidelines are classified in this way. Future editions of these recommendations will extend and develop this classification scheme.
A complete alphabetical list of all model classes is provided in Model Classes.
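The relationships among the basic model classes described in this section can be sketched with ordinary sets. The element names below (p, list, quote, hi, title) are only an illustrative handful suggested by the examples in the text, and the variable names are hypothetical; the actual classes contain many more members.

```python
from itertools import combinations

# Illustrative members only; the real classes are far larger.
model_pLike = {"p"}                 # paragraph-like chunks
model_inter = {"list", "quote"}     # inter-level elements
model_phrase = {"hi", "title"}      # phrase-level elements
model_entryLike = set()             # empty unless the dictionary module is included

# model.divPart is populated here by way of model.pLike.
model_divPart = set(model_pLike)

# model.common is a superset of model.divPart, model.inter and model.entryLike.
model_common = model_divPart | model_inter | model_entryLike


def pairwise_disjoint(*groups):
    """True if no element belongs to more than one of the given groups."""
    return all(a.isdisjoint(b) for a, b in combinations(groups, 2))


classes_are_disjoint = pairwise_disjoint(model_phrase, model_pLike, model_inter)
phrases_excluded_from_common = model_phrase.isdisjoint(model_common)
```

Under these assumptions, phrase-level elements never appear in model.common, which reflects the rule that they cannot occur directly within a division.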
1.4 Macros
The infrastructure module defined by this chapter also declares a number of macros, or shortcut names for frequently occurring parts of other declarations. Macros are used in two ways in the TEI scheme: to stand for frequently-encountered content models, or parts of content models (1.4.1 Standard Content Models); and to stand for attribute datatypes (1.4.2 Datatype Macros).
1.4.1 Standard Content Models
- macro.paraContent (paragraph content) defines the content of paragraphs and similar elements.
- macro.limitedContent (paragraph content) defines the content of prose elements that are not used for transcription of extant materials.
- macro.phraseSeq (phrase sequence) defines a sequence of character data and phrase-level elements.
- macro.phraseSeq.limited (limited phrase sequence) defines a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents.
- macro.schemaPattern provides a pattern to match elements from the chosen schema language.
- macro.specialPara ('special' paragraph content) defines the content model of elements such as notes or list items, which either contain a series of component-level elements or else have the same structure as a paragraph, containing a series of phrase-level and inter-level elements.
- macro.xtext (extended text) defines a sequence of character data and gaiji elements.
Content model | Number of elements using this | Description |
macro.phraseSeq | 83 | any combination of text with elements from the model.gLike, model.global, or model.phrase classes |
macro.paraContent | 49 | macro.phraseSeq with the addition of model.inter |
empty | 39 | elements that have no content |
macro.specialPara | 24 | macro.paraContent with the addition of model.divPart |
macro.phraseSeq.limited | 24 | a subset of macro.phraseSeq appropriate for use in non-transcriptional contexts |
text | 21 | plain untagged text |
macro.xtext | 19 | any combination of text with elements from the model.gLike class |
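The way these macros build on one another can be sketched as a recursive expansion. The dictionary MACROS restates the descriptions in the table above; the function expand is a hypothetical helper, and the sketch records only which components are permitted, ignoring sequence and repetition, which a real content model would also specify.

```python
# Each macro is described as text plus classes, or as another macro plus a class.
MACROS = {
    "macro.xtext": ["text", "model.gLike"],
    "macro.phraseSeq": ["text", "model.gLike", "model.global", "model.phrase"],
    "macro.paraContent": ["macro.phraseSeq", "model.inter"],
    "macro.specialPara": ["macro.paraContent", "model.divPart"],
}


def expand(name):
    """Recursively replace macro references by the components they stand for."""
    if name not in MACROS:
        return [name]  # plain text or a class name: nothing further to expand
    components = []
    for part in MACROS[name]:
        for item in expand(part):
            if item not in components:
                components.append(item)
    return components


special_para_components = expand("macro.specialPara")
```

Expanding macro.specialPara in this way yields text together with the classes model.gLike, model.global, model.phrase, model.inter, and model.divPart.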
1.4.2 Datatype Macros
The values which attributes may take in a TEI schema are defined, for the most part, by reference to a TEI datatype. Each such datatype is defined in terms of other primitive datatypes, derived mostly from W3C Schema Datatypes, literal values, or other datatypes. This indirection makes it possible for a TEI application to set constraints either globally or in individual cases, by redefining the datatype definition or the reference to it respectively. In some cases, the TEI datatype includes additional usage constraints which cannot be enforced by existing schema languages, although a TEI-compliant processor should attempt to validate them (see further discussion in chapter 23.3 Conformance).
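The two levels of constraint made possible by this indirection can be sketched as follows. This is an illustration only: the checking functions, the tables DATATYPES and ATTRIBUTES, the limit of 12, and the datatype name data.smallCount are all assumptions introduced here, and a real TEI customization would express such changes in ODD rather than in code.

```python
def is_count(value: str) -> bool:
    """Simplified check for a non-negative integer written in ASCII digits."""
    return value.isascii() and value.isdigit()


def is_small_count(value: str) -> bool:
    """Hypothetical stricter datatype: a count no greater than 12."""
    return is_count(value) and int(value) <= 12


# Attributes refer to datatypes by name instead of embedding a definition.
DATATYPES = {"data.count": is_count}
ATTRIBUTES = {("cell", "cols"): "data.count", ("table", "cols"): "data.count"}


def validate(element, attribute, value, datatypes=DATATYPES, attributes=ATTRIBUTES):
    """Look up the datatype named for an attribute and apply it to a value."""
    return datatypes[attributes[(element, attribute)]](value)


# Global change: redefine the datatype itself; every reference is affected.
GLOBAL_DATATYPES = {**DATATYPES, "data.count": is_small_count}

# Individual change: repoint a single attribute at a different datatype.
LOCAL_DATATYPES = {**DATATYPES, "data.smallCount": is_small_count}
LOCAL_ATTRIBUTES = {**ATTRIBUTES, ("table", "cols"): "data.smallCount"}
```

With the global change, the value 40 is rejected for cols on both cell and table; with the individual change it is rejected on table only.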
Where literal values or name tokens are used in a datatype definition, an associated value list supplies definitions for the significance of suggested or (in the case of closed lists) all possible values.
TEI-defined datatypes may be grouped into those which define normalized values for numeric quantities, probabilities, or temporal expressions, those which define various kinds of shorthand codes or keys, and those which define pointers or links.
- data.certainty defines the range of attribute values expressing a degree of certainty.
- data.probability defines the range of attribute values expressing a probability.
- data.numeric defines the range of attribute values used for numeric values.
- data.count defines the range of attribute values used for a non-negative integer value used as a count.
Examples of attributes using the data.probability datatype include degree on damage or certainty; examples of data.numeric include quantity on members of the att.measurement class or value on numeric; examples of data.count include cols on cell and table.
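A minimal sketch of how values for two of these datatypes might be checked is given below. It assumes that a probability is a number between 0 and 1 inclusive and that a count is written using ASCII digits only; the function names are hypothetical, and the checks are deliberately simpler than the lexical rules of the underlying W3C datatypes (a leading plus sign, for example, is not handled).

```python
def is_probability(value: str) -> bool:
    """Assumed reading of data.probability: a number from 0 to 1 inclusive."""
    try:
        number = float(value)
    except ValueError:
        return False
    # NaN and infinities fail this comparison and are therefore rejected.
    return 0.0 <= number <= 1.0


def is_count(value: str) -> bool:
    """Assumed reading of data.count: a non-negative integer in ASCII digits."""
    return value.isascii() and value.isdigit()
```

For instance, a degree of 0.75 on damage would be accepted by the first function, while a cols value of -1 on table would be rejected by the second.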
- data.duration.w3c defines the range of attribute values available for representation of a duration in time using W3C datatypes.
- data.duration.iso defines the range of attribute values available for representation of a duration in time using ISO 8601 standard formats.
- data.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes specification.
- data.temporal.iso defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the international standard Data elements and interchange formats – Information interchange – Representation of dates and times.
- data.truthValue defines the range of attribute values used to express a truth value.
- data.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown.
- data.language defines the range of attribute values used to identify a particular combination of human language and writing system.
- data.sex defines the range of attribute values used to identify human or animal sex.
Note that in each of these cases the values used are those recommended by existing international standards: ISO 8601 as profiled by XML Schema Part 2: Datatypes Second Edition in the case of durations, times, and dates; W3C Schema datatypes in the case of truth values; BCP 47 in the case of language; and ISO 5218 in the case of sex.
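The reliance on external standards can be illustrated for two of the simpler cases. The sketch below uses the four lexical forms of the W3C boolean datatype and the four codes of ISO 5218. The two additional values shown for the extended truth value (unknown and inapplicable) are an assumption made for this illustration and should be checked against the specification of data.xTruthValue; all names in the code are hypothetical.

```python
# Lexical forms permitted by the W3C Schema boolean datatype.
W3C_BOOLEAN = {"true", "false", "1", "0"}

# Assumption: extra values allowed when the truth value may be unknown.
EXTENDED_EXTRAS = {"unknown", "inapplicable"}

# Codes defined by ISO 5218.
ISO_5218 = {
    "0": "not known",
    "1": "male",
    "2": "female",
    "9": "not applicable",
}


def is_truth_value(value: str) -> bool:
    """Check a value against the W3C boolean lexical forms."""
    return value in W3C_BOOLEAN


def is_extended_truth_value(value: str) -> bool:
    """Check a value against the boolean forms plus the assumed extras."""
    return value in W3C_BOOLEAN or value in EXTENDED_EXTRAS


def describe_sex_code(value: str):
    """Return the ISO 5218 meaning of a code, or None if it is not defined."""
    return ISO_5218.get(value)
```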
- data.outputMeasurement defines a range of values for use in specifying the size of an object that is intended for display on the web.
- data.namespace defines the range of attribute values used to indicate XML namespaces as defined by the W3C Namespaces in XML Technical Recommendation.
- data.pattern (regular expression pattern) defines attribute values which are expressed as a regular expression.
- data.point defines the data type used to express a point in cartesian space.
- data.pointer defines the range of attribute values used to provide a single URI pointer to any other resource, either within the current document or elsewhere.
- data.version defines the range of attribute values which may be used to specify a TEI version number.
- data.word defines the range of attribute values expressed as a single word or token.
- data.text defines the range of attribute values used to express some kind of identifying string as a single sequence of unicode characters possibly including whitespace.
- data.name defines the range of attribute values expressed as an XML Name.
- data.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities.
- data.code defines the range of attribute values expressing a coded value by means of a pointer to some other element which contains a definition for it.
Attributes of type data.word, such as age on person, are used to supply an identifier expressed as any kind of single token or word. The TEI places a few constraints on the characters which may be used for this purpose: only Unicode characters classified as letters, digits, punctuation characters, or symbols can appear in an attribute value of this kind. Note in particular that such values cannot include whitespace characters. Legal values include cholmondeley, été, 1234, e_content, or xml:id, but not grand wazoo. Attributes of this kind are sometimes used to associate (by co-reference) elements of different types.
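The constraint on data.word values can be sketched by inspecting the Unicode general category of each character. This is a minimal illustration under the assumption that "letters, digits, punctuation characters, or symbols" corresponds to the major categories L, N, P, and S; the function name is hypothetical.

```python
import unicodedata

# Assumed mapping of the prose constraint onto Unicode major categories:
# L = letters, N = numbers, P = punctuation, S = symbols.
ALLOWED_MAJOR_CATEGORIES = {"L", "N", "P", "S"}


def is_data_word(value: str) -> bool:
    """True if the value is non-empty and every character is in an allowed category."""
    return bool(value) and all(
        unicodedata.category(character)[0] in ALLOWED_MAJOR_CATEGORIES
        for character in value
    )
```

Whitespace characters belong to the separator categories and so cause a value such as grand wazoo to be rejected, while the underscore and colon in e_content and xml:id count as punctuation and are accepted.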
Where identifiers are defined externally, for example as part of a database or file system, the inability to include whitespace or other special characters in a value may be problematic. In other cases, it may also be simply more convenient to supply a short sequence of natural language words including spaces as a single value. For these reasons, we also provide a datatype data.text which does permit whitespace and indeed any other Unicode character. Legal values include cholmondeley, été, 1234, e-content, xml:id, and grand wazoo. This datatype should be used with care since XML will not normalise whitespace characters within it: for example the values n="a  b" (two spaces) and n="a   b" (three spaces) would be considered distinct. This case should be distinguished from that of an attribute permitting multiple values, each of which may be separated by whitespace which will be normalised (see further 22.4.5.1 Datatypes).
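The difference between a single data.text value and a whitespace-separated list of values can be sketched as follows. The helper split_values is a hypothetical name, and the splitting shown is only an approximation of how a list-valued attribute is tokenized.

```python
# Two data.text values differing only in the amount of internal whitespace.
two_spaces = "a  b"     # 'a', two spaces, 'b'
three_spaces = "a   b"  # 'a', three spaces, 'b'

# Compared as single values, they are distinct strings.
single_values_distinct = two_spaces != three_spaces


def split_values(value: str):
    """Treat the attribute as a list of values separated by any run of whitespace."""
    return value.split()


# Read as lists, both yield the same two values.
lists_equal = split_values(two_spaces) == split_values(three_spaces)
```

The same character sequence therefore compares differently depending on whether the attribute is declared to hold one value or several.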
Attributes of type data.name are similar to those of type data.word, but with the additional constraint that they must be legal XML identifiers, as defined by the XML 1.0 specification, or successors. Hence, they may not begin with digits or punctuation characters. Legal identifiers include cholmondeley, été, e_content, or xml:id, but not grand wazoo or 1234. Attributes of this kind are typically used to represent XML element or attribute names.
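A check for data.name values can be sketched with a regular expression built from the Name production of the XML 1.0 specification (Fifth Edition). The character ranges below are transcribed from that production; the function name is hypothetical, and the sketch should not be taken as a replacement for a conformant XML parser.

```python
import re

# Characters permitted at the start of an XML Name (NameStartChar).
_START = (
    "A-Z_a-z:"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# Additional characters permitted after the first one (NameChar).
_REST = _START + r"\-.0-9\u00B7\u0300-\u036F\u203F\u2040"

XML_NAME = re.compile(f"[{_START}][{_REST}]*")


def is_data_name(value: str) -> bool:
    """True if the whole value matches the XML Name production."""
    return XML_NAME.fullmatch(value) is not None
```

Digits are accepted only after the first character, which is why 1234 fails while e_content succeeds; the space in grand wazoo is not a name character at all.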
Attributes of type data.enumerated, such as new on shift or evidence supplied by att.editLike, have the same definition as data.word above, with the added constraint that the word supplied is taken from a specific list of possibilities. In each case, the element or class specification which includes the definition for the attribute will also contain a list of possible values, together with a prose description of their intended significance. This list may be open (in which case the list is advisory), or closed (in which case it determines the range of legal values). In this latter case, the datatype will not be data.enumerated, but an explicit list of the possible values.
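The distinction between open and closed value lists can be sketched as follows. The class ValueList, its check method, and the values alpha and beta are hypothetical and serve only to illustrate the behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueList:
    """A documented list of values, either advisory (open) or binding (closed)."""

    values: frozenset
    closed: bool

    def check(self, value: str) -> str:
        if value in self.values:
            return "documented"
        # An open list is advisory, so other values remain legal.
        return "invalid" if self.closed else "undocumented but permitted"


# Hypothetical value lists for illustration.
open_list = ValueList(frozenset({"alpha", "beta"}), closed=False)
closed_list = ValueList(frozenset({"alpha", "beta"}), closed=True)
```

A value outside the list is merely undocumented when the list is open, but is an error when the list is closed.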
Attributes of type data.code are similar in function, in that they also supply encoded names for values which are defined in more detail elsewhere. In this case, however, the full definition is supplied as content of another XML element, typically but not necessarily in the same document, and it is referenced by means of a pointer.
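Resolving such a pointer to the element that carries the definition might be sketched as below. The element names doc, definition, and item, the attribute name code, and the function resolve are all hypothetical; only the general pattern of a fragment pointer matching an xml:id within the same document is illustrated.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document: element and attribute names are illustrative only.
DOCUMENT = """<doc>
  <definition xml:id="c1">first category</definition>
  <item code="#c1">some content</item>
</doc>"""


def resolve(root, pointer):
    """Return the element in this document whose xml:id matches '#name', else None."""
    if not pointer.startswith("#"):
        return None  # pointers into other documents are not handled in this sketch
    target = pointer[1:]
    for element in root.iter():
        if element.get(XML_ID) == target:
            return element
    return None


root = ET.fromstring(DOCUMENT)
item = root.find("item")
definition = resolve(root, item.get("code"))
```

The coded value on item is meaningful only through the content of the element it points to, here the text of the definition element.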
An attribute may, of course, take more than one value of a given type, for example a list of pointer values, or a list of words. In the TEI scheme, this information is regarded as a property of the datatype element used to document the attribute in question rather than as a distinct ‘datatype’. See further 22.4.5.1 Datatypes.
1.5 The TEI Infrastructure Module
- Module tei: Declarations for classes, datatypes, and macros available to all TEI modules
- Classes defined: att.ascribed att.breaking att.canonical att.damaged att.datable att.datable.w3c att.datcat att.declarable att.declaring att.dimensions att.divLike att.docStatus att.duration.w3c att.editLike att.global att.handFeatures att.internetMedia att.interpLike att.measurement att.naming att.personal att.placement att.pointing att.pointing.group att.ranging att.readFrom att.responsibility att.scoping att.segLike att.sortable att.sourced att.spanning att.tableDecoration att.timed att.transcriptional att.translatable att.typed model.addrPart model.addressLike model.applicationLike model.availabilityPart model.biblLike model.biblPart model.castItemPart model.catDescPart model.certLike model.choicePart model.common model.dateLike model.dimLike model.div1Like model.div2Like model.div3Like model.div4Like model.div5Like model.div6Like model.div7Like model.divBottom model.divBottomPart model.divGenLike model.divLike model.divPart model.divTop model.divTopPart model.divWrapper model.editorialDeclPart model.egLike model.emphLike model.encodingDescPart model.entryPart model.entryPart.top model.featureVal model.featureVal.complex model.featureVal.single model.frontPart model.frontPart.drama model.gLike model.global model.global.edit model.global.meta model.glossLike model.graphicLike model.headLike model.hiLike model.highlighted model.imprintPart model.inter model.lLike model.lPart model.labelLike model.limitedPhrase model.linePart model.listLike model.measureLike model.milestoneLike model.msItemPart model.msQuoteLike model.nameLike model.nameLike.agent model.noteLike model.oddDecl model.oddRef model.offsetLike model.orgPart model.orgStateLike model.pLike model.pLike.front model.pPart.data model.pPart.edit model.pPart.editorial model.pPart.msdesc model.pPart.transcriptional model.persEventLike model.persStateLike model.personLike model.personPart model.phrase model.phrase.xml model.physDescPart model.placeEventLike model.placeLike model.placeNamePart model.placeStateLike model.profileDescPart model.ptrLike model.publicationStmtPart model.qLike model.quoteLike model.resourceLike model.respLike model.segLike model.settingPart model.sourceDescPart model.specDescLike model.stageLike model.teiHeaderPart model.textDescPart model.titlepagePart
- Macros defined: data.certainty data.code data.count data.duration.iso data.duration.w3c data.enumerated data.language data.name data.namespace data.numeric data.outputMeasurement data.pattern data.point data.pointer data.probability data.sex data.temporal.iso data.temporal.w3c data.text data.truthValue data.version data.word data.xTruthValue macro.limitedContent macro.paraContent macro.phraseSeq macro.phraseSeq.limited macro.specialPara macro.xtext
The order in which declarations are made within the infrastructure module is critical, since several class declarations refer to others, which must therefore precede them. Other constraints on the order of declarations derive from the way in which the modularity of the TEI scheme is implemented in different schema languages. The XML DTD fragment implementing this TEI module makes extensive use of parameter entities and marked sections to effect a kind of conditional construction; the RELAX NG schema fragment similarly predeclares a number of patterns with null (‘notAllowed’) values. These issues are further discussed in chapter 23.4 Implementation of an ODD System.
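The ordering constraint among class declarations amounts to a topological sort of the references between them. The sketch below uses the graphlib module of the Python standard library (available from Python 3.9). The table REFERS_TO is an illustrative assumption, in which each class is taken to refer to the more general class it joins; it does not reproduce the actual declarations of the tei module.

```python
from graphlib import TopologicalSorter

# Assumed references: each class names the classes its declaration refers to,
# which must therefore be declared first.
REFERS_TO = {
    "model.pLike": {"model.divPart"},
    "model.divPart": {"model.common"},
    "model.inter": {"model.common"},
    "model.common": set(),
}

# static_order() yields every class after all the classes it refers to.
declaration_order = list(TopologicalSorter(REFERS_TO).static_order())
```

Any order produced in this way places model.common before model.divPart and model.inter, and model.divPart before model.pLike; a circular set of references would raise graphlib.CycleError instead.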