1 The TEI Infrastructure
This chapter describes the infrastructure for the encoding scheme defined by these Guidelines. It introduces the conceptual framework within which the following chapters are to be understood, and the means by which that conceptual framework is implemented. It assumes some familiarity with XML and XML schemas (see chapter v. A Gentle Introduction to XML) but is intended to be accessible to any user of these Guidelines. Other chapters supply further technical details, in particular chapter 22 Documentation Elements which describes the XML schema used to express the Guidelines themselves, and chapter 23 Using the TEI which combines a discussion of modification and conformance issues with a description of the intended behaviour of an ODD processor; these chapters should be read by anyone intending to implement a new TEI-based system.
The TEI encoding scheme consists of a number of modules, each of which declares particular XML elements and their attributes. Part of an element's declaration includes its assignment to one or more element classes. Another part defines its possible content and attributes with reference to these classes. This indirection gives the TEI system much of its strength and its flexibility. Elements may be combined more or less freely to form a schema appropriate to a particular set of requirements. It is as easy to add to a schema new elements which reference existing classes or elements as it is to exclude some of the elements provided by any module included in it.
In principle, a TEI schema may be constructed using any combination of modules. However, certain TEI modules are of particular importance, and should always be included in all but exceptional circumstances: the module tei described in the present chapter is of this kind because it defines classes, macros, and datatypes which are used by all other modules. The core module, defined in chapter 3 Elements Available in All TEI Documents, contains declarations for elements and attributes which are likely to be needed in almost any kind of document, and is therefore recommended for global use. The header module defined in chapter 2 The TEI Header provides declarations for the metadata elements and attributes constituting the TEI Header, a component which is required for TEI conformance, while the textstructure module defined in chapter 4 Default Text Structure declares basic structural elements needed for the encoding of most book-like objects. Most schemas will therefore need to include these four modules.
The specification for a TEI schema is itself a TEI document, using elements from the module described in chapter 22 Documentation Elements: we refer to such a document informally as an ODD document, from the design goal originally formulated for the system: ‘One Document Does it all’. Stylesheets for maintaining and processing ODD documents are maintained by the TEI, and these Guidelines are also maintained as such a document. As further discussed in 23.5 Implementation of an ODD System, an ODD document can be processed to generate a schema expressed using any of the three schema languages currently in wide use: the XML DTD language, the ISO RELAX NG language, or the W3C Schema language, as well as to generate documentation such as the Guidelines and their associated web site.
The bulk of this chapter describes the TEI infrastructure module itself. Although it may be skipped at a first reading, an understanding of the topics addressed here is essential for anyone planning to take full advantage of the TEI customization techniques described in section 23.3 Personalization and Customization.
The chapter begins by briefly characterizing each of the modules available in the TEI scheme. Section 1.2 Defining a TEI Schema describes in general terms the method of constructing a TEI schema in a specific schema language such as XML DTD language, RELAX NG, or W3C Schema.
The next and largest part of the chapter introduces the attribute and element classes used to define groups of elements and their characteristics (section 1.3 The TEI Class System).
Finally, section 1.4 Macros introduces the concept of macros, which are used to express some commonly used content models, and lists the datatypes used to constrain the range of legal values for TEI attributes (section 1.4.2 Datatype Macros).
1.1 TEI Modules
The elements making up the TEI scheme are documented in these Guidelines in three ways:
- a prose description
- a formal declaration, expressed using a special-purpose XML vocabulary defined by these Guidelines in combination with elements taken from the ISO schema language RELAX NG
- usage examples
Each chapter of the Guidelines presents a group of related elements, and also defines a corresponding set of declarations, which we call a module. All the definitions are collected together in the reference sections provided as an appendix. Formal declarations for a given chapter are collected together within the corresponding module. For convenience, each element is assigned to a single module, typically for use in some specific application area, or to support a particular kind of usage. A module is thus simply a convenient way of grouping together a number of associated element declarations. In the simple case, a TEI schema is made by combining together a small number of modules, as further described in section 1.2 Defining a TEI Schema below. The modules currently available are listed in the following table, together with the chapter in which each is defined.
Module name | Formal public identifier | Where defined |
analysis | Analysis and Interpretation | 17 Simple Analytic Mechanisms |
certainty | Certainty and Uncertainty | 21 Certainty, Precision, and Responsibility |
core | Common Core | 3 Elements Available in All TEI Documents |
corpus | Metadata for Language Corpora | 15 Language Corpora |
dictionaries | Print Dictionaries | 9 Dictionaries |
drama | Performance Texts | 7 Performance Texts |
figures | Tables, Formulae, Figures | 14 Tables, Formulæ, Graphics and Notated Music |
gaiji | Character and Glyph Documentation | 5 Representation of Non-standard Characters and Glyphs |
header | Common Metadata | 2 The TEI Header |
iso-fs | Feature Structures | 18 Feature Structures |
linking | Linking, Segmentation, and Alignment | 16 Linking, Segmentation, and Alignment |
msdescription | Manuscript Description | 10 Manuscript Description |
namesdates | Names, Dates, People, and Places | 13 Names, Dates, People, and Places |
nets | Graphs, Networks, and Trees | 19 Graphs, Networks, and Trees |
spoken | Transcribed Speech | 8 Transcriptions of Speech |
tagdocs | Documentation Elements | 22 Documentation Elements |
tei | TEI Infrastructure | 1 The TEI Infrastructure |
textcrit | Text Criticism | 12 Critical Apparatus |
textstructure | Default Text Structure | 4 Default Text Structure |
transcr | Transcription of Primary Sources | 11 Representation of Primary Sources |
verse | Verse | 6 Verse |
For each module listed above, the corresponding chapter gives a full description of the classes, elements, and macros which it makes available when it is included in a schema. Other chapters of these Guidelines explore other aspects of using the TEI scheme.
1.2 Defining a TEI Schema
To determine that an XML document is valid (as opposed to merely well-formed), its structure must be checked against a schema, as discussed in chapter v. A Gentle Introduction to XML. For a valid TEI document, this schema must be a conformant TEI schema, as further defined in section 23.4 Conformance. Local systems may allow their schema to be implicit, but for interchange purposes the schema associated with a document must be made explicit. The method recommended by these Guidelines is to provide, explicitly or by reference, a TEI schema specification against which the document may be validated.
A TEI-conformant schema is a specific combination of TEI modules, possibly also including additional declarations that modify the element and attribute declarations contained by each module, for example to suppress or rename some elements. The TEI provides an application-independent way of specifying a TEI schema by means of the schemaSpec element defined in chapter 22 Documentation Elements. The same system may also be used to specify a schema which extends the TEI by adding new elements explicitly, or by reference to other XML vocabularies. In either case, the specification may be processed to generate a formal schema, expressed in a variety of specific schema languages, such as XML DTD language, RELAX NG, or W3C Schema. These output schemas can then be used by an XML processor such as a validator or editor to validate or otherwise process documents. Further information about the processing of a TEI formal specification is given in chapter 23 Using the TEI.
1.2.1 A Simple Customization
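The simplest useful TEI schema combines just the four modules recommended above. Its specification might take the following form:
<schemaSpec ident="TEI-minimal" start="TEI">
<moduleRef key="tei"/>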
<moduleRef key="header"/>
<moduleRef key="core"/>
<moduleRef key="textstructure"/>
</schemaSpec>
This schema specification contains references to each of the four modules, identified by the key attribute on the moduleRef element. The schema specification itself is also given an identifier (TEI-minimal). An ODD processor will generate an appropriate schema from this set of declarations, expressed using the XML DTD language, the ISO RELAX NG language, the W3C Schema language, or in principle any other adequately powerful schema language. The resulting schema may then be associated with the document instance by one of a number of different mechanisms, as further described in chapter v. A Gentle Introduction to XML. The start point (or root element) of document instances to be validated against the schema is specified by means of the start attribute. Further information about the processing of an ODD specification is given in 23.5 Implementation of an ODD System.
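The information an ODD processor starts from can be read from such a specification with any XML library. The following Python sketch is an illustration only, not part of any TEI tool: the function name `summarize_schema_spec` is hypothetical, and the sample assumes `start="TEI"`. It uses the standard `xml.etree.ElementTree` module to report the identifier, the start element, and the modules referenced by a schemaSpec, matching element names with or without the TEI namespace, and flags any of the four recommended modules that are missing.

```python
import xml.etree.ElementTree as ET

MINIMAL_SPEC = """
<schemaSpec ident="TEI-minimal" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
"""

# The four modules the text recommends for almost every TEI schema.
RECOMMENDED = {"tei", "header", "core", "textstructure"}


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_schema_spec(xml_text: str) -> dict:
    """Return the ident, start, and moduleRef keys of a schemaSpec (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    if local_name(root.tag) != "schemaSpec":
        raise ValueError("root element is not schemaSpec")
    modules = [el.get("key") for el in root.iter()
               if local_name(el.tag) == "moduleRef"]
    return {
        "ident": root.get("ident"),
        "start": root.get("start"),
        "modules": modules,
        # recommended modules which this specification leaves out
        "missing_recommended": sorted(RECOMMENDED - set(modules)),
    }


if __name__ == "__main__":
    print(summarize_schema_spec(MINIMAL_SPEC))
```

A real ODD processor does far more than this (expanding classes, applying modifications, and emitting a schema), but the moduleRef keys are its starting point.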
1.2.2 A Larger Customization
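Texts are, however, often more complex than the simple structure assumed by such a minimal schema. For example:
- a text may be a collection of other texts of different types: for example, an anthology of prose, verse, and drama;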
- a text may contain other smaller, embedded texts: for example, a poem or song included in a prose narrative;
- some sections of a text may be written in one form, and others in a different form: for example, a novel where some chapters are in prose, others take the form of dictionary entries, and still others the form of scenes in a play;
- an encoded text may include detailed analytic annotation, for example of rhetorical or linguistic features;
- an encoded text may combine a literal transcription with a diplomatic edition of the same or different sources;
- the description of a text may require additional specialized metadata elements, for example when describing manuscript material in detail.
To accommodate such variety, the TEI scheme provides, among other things:
- a definition of a corpus or collection as a series of TEI documents, sharing a common TEI header (see chapter 15 Language Corpora)
- a definition of composite texts which combine optional front- and back-matter with a group of collected texts, themselves possibly composite (see section 4.3.1 Grouped Texts)
- an element for the representation of embedded texts, where one narrative appears to ‘float’ within another (see section 4.3.2 Floating Texts)
Subsequent chapters of these Guidelines describe in detail markup constructs appropriate for these and many other possible features of interest. The markup constructs can be combined as needed for any given set of applications or project.
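A project transcribing manuscript material might, for example, add modules for manuscript description, transcription of primary sources, figures, and names and dates (the identifier myMS used here is purely illustrative):
<schemaSpec ident="myMS" start="TEI">
<moduleRef key="tei"/>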
<moduleRef key="header"/>
<moduleRef key="core"/>
<moduleRef key="textstructure"/>
<moduleRef key="msdescription"/>
<!-- manuscript description -->
<moduleRef key="transcr"/>
<!-- transcription of primary sources -->
<moduleRef key="figures"/>
<!-- figures and tables -->
<moduleRef key="namesdates"/>
<!-- names, dates, people, and places -->
</schemaSpec>
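Conversely, a specification may include only the modules a project needs; the following, for example, combines the transcr module with the tei, core, and textstructure modules (the identifier myTranscr is illustrative):
<schemaSpec ident="myTranscr" start="TEI">
<moduleRef key="tei"/>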
<moduleRef key="core"/>
<moduleRef key="textstructure"/>
<moduleRef key="transcr"/>
</schemaSpec>
The TEI architecture also supports more detailed customization beyond the simple selection of modules. A schema may suppress elements from a module, suppress some of their attributes, change their names, or even add new elements and attributes. Detailed discussion of the kinds of modification possible in this way is provided in 23.3 Personalization and Customization, and conformance rules relating to their application are discussed in 23.4 Conformance. These facilities can be used whichever schema language is generated, although some features may not be expressible in every language. The ODD language also makes it possible to combine TEI and non-TEI modules into a single schema, provided that the non-TEI module is expressed using the RELAX NG schema language (see further 22.6 Combining TEI and Non-TEI Modules).
1.3 The TEI Class System
The TEI scheme distinguishes about five hundred different elements. To aid comprehension, modularity, and modification, the majority of these elements are formally classified in some way. Classes are used to express two distinct kinds of commonality among elements. The elements of a class may share some set of attributes, or they may appear in the same locations in a content model. A class is known as an attribute class if its members share attributes, and as a model class if its members appear in the same locations. In either case, an element is said to inherit properties from any classes of which it is a member.
Classes (and therefore elements which are members of those classes) may also inherit properties from other classes. For example, suppose that class A is a member (or a subclass) of class B: any element which is a member of class A will then inherit not only the properties defined by class A, but also those defined by class B. In such a situation, we also say that class B is a superclass of class A. The properties of a superclass are inherited by all members of its subclasses.
A basic understanding of the classes into which the TEI scheme is organized is strongly recommended and is essential for any successful customization of the system.
1.3.1 Attribute Classes
An attribute class groups together elements which share some set of common attributes.
Attribute classes are given names composed of the prefix att., often followed by an adjective. For example, the members of the class att.canonical have in common a key and a ref attribute, both of which are inherited from their membership in the class rather than individually defined for each element. These attributes are said to be defined by (or inherited from) the att.canonical class. If another element were to be added to the TEI scheme for which these attributes were considered useful, the simplest way to provide them would be to make the new element a member of the att.canonical class. Note also that this method ensures that the attributes in question are always defined in the same way, taking the same default values and so on, no matter which element they are attached to.
Some attribute classes are defined within the tei infrastructural module and are thus globally available. Other attribute classes are specific to particular modules and thus defined in other chapters. Attributes defined by such classes will not be available unless the module concerned is included in a schema.
The attributes provided by an attribute class are those specified by the class itself, either directly, or by inheritance from another class. For example, the attribute class att.pointing.group provides attributes domains and targFunc to all of its members. This class is however a subclass of the att.pointing class, from which its members also inherit the attributes target, targetLang and evaluate. Members of the class att.pointing will thus have these three attributes, while members of the class att.pointing.group will have all five.
Note that some modules define superclasses of an existing infrastructural class. For example, the global attribute class att.divLike makes attributes org, part, and sample available, while the att.metrical class, which is specific to the verse module, provides attributes met, real, and rhyme. Because att.metrical is defined as a superclass of att.divLike, all six of these attributes are available to members of att.divLike when the verse module is included in a schema: the declaration for att.metrical adds its three attributes to the three already defined by att.divLike. If, however, this module is not included in a schema, then the att.divLike class supplies only the three attributes first mentioned.
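The way attributes accumulate through superclasses, and disappear when the defining module is absent, can be sketched in a few lines of Python. This is a deliberately simplified model, not how an ODD processor is implemented: the class data are copied from the two examples above, the dictionary layout and the function name `available_attributes` are hypothetical, and the module of the att.pointing classes is left unspecified because it does not matter here.

```python
# Illustrative data taken from the examples in the text; real TEI class
# definitions carry much more (datatypes, defaults, further superclasses).
ATTRIBUTE_CLASSES = {
    "att.pointing": {
        "attributes": ["target", "targetLang", "evaluate"],
        "superclasses": [],
        "module": None,  # defining module not relevant to this illustration
    },
    "att.pointing.group": {
        "attributes": ["domains", "targFunc"],
        "superclasses": ["att.pointing"],
        "module": None,
    },
    "att.divLike": {
        "attributes": ["org", "part", "sample"],
        "superclasses": ["att.metrical"],
        "module": "tei",
    },
    "att.metrical": {
        "attributes": ["met", "real", "rhyme"],
        "superclasses": [],
        "module": "verse",
    },
}


def available_attributes(class_name, modules):
    """Collect the attributes a class provides, directly or by inheritance.

    A class whose defining module is not in `modules` contributes nothing,
    mimicking the effect of leaving that module out of a schema.
    """
    result = []
    seen = set()
    stack = [class_name]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        spec = ATTRIBUTE_CLASSES[name]
        if spec["module"] is not None and spec["module"] not in modules:
            continue  # class not declared in this schema
        result.extend(spec["attributes"])
        stack.extend(spec["superclasses"])  # walk up to the superclasses
    return result


if __name__ == "__main__":
    print(available_attributes("att.pointing.group", {"tei"}))
    print(available_attributes("att.divLike", {"tei"}))
    print(available_attributes("att.divLike", {"tei", "verse"}))
```

The three calls correspond to the cases discussed above: five attributes for att.pointing.group, three for att.divLike alone, and six once the verse module is loaded.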
Attributes specific to particular modules are documented along with the relevant module rather than in the present chapter. One particular attribute class, known as att.global, is common to all modules, and is therefore described in some detail in the next section. A full list of all attribute classes is given in Attribute Classes below.
1.3.1.1 Global Attributes
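- att.global provides attributes common to all elements in the TEI encoding scheme.
  - xml:id (identifier) provides a unique identifier for the element bearing the attribute.
  - n (number) gives a number (or other label) for an element, which is not necessarily unique within the document.
  - xml:lang (language) indicates the language of the element content, using a ‘tag’ constructed in accordance with BCP 47.
  - rend (rendition) indicates how the element in question was rendered or presented in the source text.
  - style contains an expression in some formal style definition language which defines the rendering or presentation used for this element in the source text.
  - rendition points to a description of the rendering or presentation used for this element in the source text.
  - xml:base provides a base URI reference with which applications can resolve relative URI references into absolute URI references.
  - xml:space signals an intention about how whitespace should be managed by applications.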
These attributes are optionally available for any TEI element; none of them is required. Their usage is discussed in the following subsections.
1.3.1.1.1 Element Identifiers and Labels
The value supplied for the xml:id attribute must be a legal name, as defined in the World Wide Web Consortium's XML Recommendation. This means that it must begin with a letter, or the underscore character (‘_’), and contain no characters other than letters, digits, hyphens, underscores, full stops, and certain combining and extension characters.
In XML names (and thus the values of xml:id in an XML TEI document) uppercase and lowercase letters are distinguished, and thus partTime and parttime are two distinctly different names, and could (though perhaps unwisely) be used to denote two different element occurrences.
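For example, the following two paragraphs bear identifiers which differ only in case, and are therefore distinct:
<p xml:id="PAGE1"><q>What's it going to be then, eh?</q></p>
<p xml:id="page1">There was me, that is Alex, and my three droogs, that is Pete,
Georgie, and Dim, ... </p>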
For a discussion of methods of providing unique identifiers for elements, see section 3.10.2 Creating New Reference Systems.
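Both constraints described above, that an identifier be a legal name and that it be unique within the document (with case significant), can be checked mechanically. The Python sketch below is an approximation under stated assumptions: the hypothetical `looks_like_xml_name` uses `str.isalpha()` and `str.isalnum()` as a stand-in for the exact character classes of the XML Recommendation and ignores combining and extension characters, while `duplicate_ids` relies on the standard behaviour of `xml.etree.ElementTree`, which reports `xml:id` under the expanded XML-namespace name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree reports xml:id under the expanded name of the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def looks_like_xml_name(value: str) -> bool:
    """Rough check of the naming rule described above (hypothetical helper).

    Simplification: accepts letters and digits as reported by
    str.isalpha()/str.isalnum() and ignores the combining and extension
    characters the XML Recommendation also allows. Colons are rejected,
    as xml:id values must be non-colonized names.
    """
    if not value or not (value[0].isalpha() or value[0] == "_"):
        return False
    return all(ch.isalnum() or ch in "-_." for ch in value[1:])


def duplicate_ids(xml_text: str) -> list:
    """Return xml:id values used more than once; comparison is case-sensitive."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.get(XML_ID) for el in root.iter()
                     if el.get(XML_ID) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(looks_like_xml_name("PAGE1"), looks_like_xml_name("1page"))
    print(duplicate_ids('<body><p xml:id="PAGE1"/><p xml:id="page1"/></body>'))
```

Because ElementTree is a non-validating parser it accepts a document with repeated identifiers, which is why the uniqueness check has to be made separately here; a validator working from a TEI schema would report such duplicates itself.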
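The n attribute, by contrast, supplies a number or label which need not be unique within the document. In the following list, for example, two items share the value 10:
<list>
<item n="1">About These Guidelines</item>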
<item n="2">A Gentle Introduction to XML</item>
<item n="9">Verse</item>
<item n="10">Drama</item>
<item n="10">Spoken Materials </item>
<item n="12">Dictionaries</item>
</list>
The value of n may be any convenient number or label, such as a roman numeral:
<div>
<!-- ... -->
<div type="stanza" n="xlii">
<!-- ... -->
</div>
</div>
It may also be used, for example, to number the lines of a poem:
<l n="1">
<!-- ... -->
</l>
<l n="2">
<!-- ... -->
</l>
<l n="3">
<!-- ... -->
</l>
<!-- ... -->
<l n="100">
<!-- ... -->
</l>
1.3.1.1.2 Language Indicators
The xml:lang attribute indicates the natural language and writing system applicable to the content of a given element. If it is not specified, the value is inherited from that of the immediately enclosing element. As a rule, therefore, it is simplest to specify the base language of the text on the TEI element, and allow most elements to take the default value for xml:lang; the language of an element then need be explicitly specified only for elements in languages other than the base language. For this reason, it is recommended practice to supply a default value for the xml:lang attribute, either on the TEI root element, or on both the teiHeader and the text element. The latter is appropriate in the not uncommon case where the text element in a TEI document uses a different default language from that of the TEI Header attached to it. Other language shifts in the source should be explicitly identified by use of the xml:lang attribute on an element at an appropriate level wherever possible.
For example, the base language may be specified on the TEI root element:
<TEI xml:lang="en">
<teiHeader>
<!-- ... -->
</teiHeader>
<text>
<!-- ... -->
</text>
</TEI>
Alternatively, it may be specified on both the teiHeader and the text elements:
<TEI>
<teiHeader xml:lang="en">
<!-- ... -->
</teiHeader>
<text xml:lang="en">
<!-- ... -->
</text>
</TEI>
This is necessary where the text uses a different language from its header:
<TEI>
<teiHeader xml:lang="en">
<!-- ... -->
</teiHeader>
<text xml:lang="fr">
<!-- ... -->
</text>
</TEI>
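Within the text, a change of language may then be marked on an element at an appropriate level, such as a division:
<TEI>
<teiHeader xml:lang="en">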
<!-- ... -->
</teiHeader>
<text xml:lang="fr">
<body>
<div>
<!-- chapter one is in French -->
</div>
<div xml:lang="de">
<!-- chapter two is in German -->
</div>
<div>
<!-- chapter three is in French -->
</div>
<!-- ... -->
</body>
</text>
</TEI>
The attribute may equally be used on individual phrases within a paragraph:
<p>... constitution declares <q>that no bill of attainder or <term xml:lang="la">ex post
facto</term> law shall be passed.</q> ...</p>
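The inheritance rule for xml:lang (an element's own value wins, otherwise the nearest enclosing element's value applies) can be sketched as a simple recursive walk. The Python below is a minimal illustration using the standard `xml.etree.ElementTree` module on a condensed version of the French and German example above. The helper name `effective_languages` is hypothetical, and the sketch ignores refinements such as `xml:lang=""`, which in XML signals that no language information is available.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the expanded XML-namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

EXAMPLE = """<TEI>
  <teiHeader xml:lang="en"/>
  <text xml:lang="fr">
    <body>
      <div/>
      <div xml:lang="de"/>
      <div/>
    </body>
  </text>
</TEI>"""


def effective_languages(elem, inherited=None):
    """Map each element to its effective xml:lang value (hypothetical helper).

    An element's own xml:lang wins; otherwise the value of the nearest
    enclosing element that specifies one applies (None if none does).
    """
    lang = elem.get(XML_LANG, inherited)
    result = {elem: lang}
    for child in elem:
        result.update(effective_languages(child, lang))
    return result


if __name__ == "__main__":
    root = ET.fromstring(EXAMPLE)
    langs = effective_languages(root)
    for div in root.find("text/body").findall("div"):
        print(langs[div])  # French, German, French, as in the example
```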
The values used for the xml:lang and targetLang attributes must be constructed in a particular way, using values from standard lists. See further vi.1. Language identification.
Additional information about a particular language may be supplied in the language element within the header (see section 2.4.2 Language Usage).
1.3.1.1.3 Rendition Indicators
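The rend attribute may be used to record, in any convenient vocabulary, how an element was rendered in the source:
<p>... and pious; but he was equally alarmed by his knowledge of the ambitious <name rend="italics">Bohemond</name>, and his ignorance of the Transalpine chiefs: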
...</p>
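The same information might instead be expressed with the style attribute, using a formal language such as CSS:
<p>... the ambitious <name style="font-style: italic">Bohemond</name>, and his ignorance of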
the Transalpine chiefs: ...</p>
The main difference between the rend and style attributes is that the value of rend may contain one or more tokens from any vocabulary devised by the encoder, separated by whitespace, whereas the value of style must be an expression in a formally defined style definition language such as CSS. A rend value is thus a set of whitespace-separated tokens whose order is not significant, whereas in a style value whitespace and the order of declarations may be significant, as defined by the style definition language concerned.
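The contrast can be made concrete in Python. In this sketch (the function names and the sample tokens italic and bold are hypothetical), a rend value is treated as an unordered set of tokens, so reordering changes nothing, whereas a style value is read as an ordered list of CSS-like declarations in which a later declaration of the same property overrides an earlier one. The CSS handling is deliberately naive and is not a real CSS parser.

```python
def rend_tokens(value: str) -> frozenset:
    """A rend value: an unordered set of whitespace-separated tokens."""
    return frozenset(value.split())


def style_declarations(value: str) -> dict:
    """Very naive CSS declaration reader, for illustration only.

    Splits on ';' and ':' without handling quoting, comments, or
    !important. Declarations are applied in order, so a later
    declaration of the same property overrides an earlier one.
    """
    result = {}
    for decl in value.split(";"):
        if ":" in decl:
            prop, val = decl.split(":", 1)
            result[prop.strip()] = val.strip()
    return result


if __name__ == "__main__":
    # Token order is irrelevant for rend ...
    print(rend_tokens("italic bold") == rend_tokens("bold italic"))
    # ... but declaration order matters for style.
    print(style_declarations("font-style: italic; font-style: normal"))
```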
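Formal rendition descriptions are held in rendition elements within the header; the render attribute of tagUsage and the rendition attribute on individual elements then point to them, as in the following example:
<tagsDecl>
<!-- define italic style using CSS -->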
<rendition xml:id="IT" scheme="css">font-style: italic</rendition>
<!-- define a serif font family -->
<rendition xml:id="FontRoman" scheme="css">font-family: serif</rendition>
<!-- set italic style as default for the emph and hi elements -->
<namespace name="http://www.tei-c.org/ns/1.0">
<tagUsage gi="emph" render="#IT"/>
<tagUsage gi="hi" render="#IT"/>
<!-- set the default font-family for the text element -->
<tagUsage gi="text" render="#FontRoman"/>
</namespace>
</tagsDecl>
<!-- ... -->
<text>
<body>
<div>
<p rendition="#IT">
<!-- this paragraph uses the serif font and is in italic -->
</p>
<p>
<!-- this paragraph uses the seriffed font, but is not in italic -->
</p>
</div>
</body>
</text>
The rendition attribute always points to one or more rendition elements, each of which defines some aspect of the rendering or appearance of the text in its original form. These details may most conveniently be described using a formal style definition language, such as CSS (Lie and Bos (eds.) (1999)) or XSL-FO (Berglund (ed.) (2006)); in some other formal language developed for a specific project; or even informally in running prose. Although languages such as CSS and XSL-FO are generally used to describe document output to screen or print, they nonetheless provide formal and precise mechanisms for describing the appearance of source documents, especially print documents, but also many aspects of manuscript documents. For example, both CSS and XSL-FO provide mechanisms for describing typefaces, weight, and styles; character and line spacing; and so on.
As noted above, the style attribute is provided for encoders wishing to describe the appearance of individual source elements using a language such as CSS directly rather than by reference to a rendition element. Its value may be any expression in the chosen formal style definition language.
Formal definition languages such as CSS typically identify a series of properties (such as font-style or margin-left) for which values are specified. A sequence of such property-value pairs makes up a stylesheet. The TEI uses such languages simply to describe the appearance of a source document, rather than to control how it should be formatted.
Rendition properties may be associated with an element in the source in any of the following ways:
1. One or more properties may be specified as the default for all elements of a given type, using the render attribute to point to rendition elements;
2. One or more properties may be specified for individual element occurrences, using the rend attribute with any convenient set of one or more sequence-indeterminate tokens;
3. One or more properties may be specified for individual element occurrences, using the rendition attribute to point to rendition elements;
4. One or more properties may be supplied explicitly for individual element occurrences, using the style attribute.
If the same property is specified in more than one of the above ways, the one with the highest number in the list above is understood to be applicable. The resulting properties from each way are then combined to provide the full set of property-value pairs applicable to the given element, and (by default) to all of its children.
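The precedence rule amounts to merging the four sets of properties in list order, letting later sources overwrite earlier ones. The Python sketch below assumes each mechanism has already been resolved to property-value pairs; for rend this presupposes a project-specific mapping from tokens to properties, which the TEI does not define. The function name `combine_rendition` is hypothetical, and inheritance by child elements is not modelled.

```python
from typing import Dict

Properties = Dict[str, str]


def combine_rendition(render: Properties, rend: Properties,
                      rendition: Properties, style: Properties) -> Properties:
    """Combine properties from the four mechanisms (hypothetical helper).

    Sources are applied in list order 1 to 4, so where the same property
    appears more than once the highest-numbered source wins.
    """
    combined: Properties = {}
    for source in (render, rend, rendition, style):
        combined.update(source)  # later sources overwrite earlier ones
    return combined


if __name__ == "__main__":
    # An element whose type defaults to italic serif (via render), which
    # gains bold via rendition and is overridden to upright via style.
    print(combine_rendition(
        render={"font-style": "italic", "font-family": "serif"},
        rend={},
        rendition={"font-weight": "bold"},
        style={"font-style": "normal"},
    ))
```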
For simplicity of processing, the same formal style definition should be used throughout; however, the architecture does permit this to be varied, by using the scheme attribute to indicate a different language for one or more rendition elements. Care should be taken to ensure that such values can be meaningfully combined. Similar considerations apply to the use of the rend attribute, if this is used in combination with either rendition or style.
Note that these TEI attributes always describe the rendition or appearance of the source document, not intended output renditions, although often the two may be closely related.
1.3.1.1.4 Other global attributes
The global attributes xml:base and xml:space are also provided by default in any TEI schema. Like xml:id, these attributes are defined as part of the XML specification and belong to the XML namespace rather than the TEI namespace. We do not describe them in detail here: reference information for xml:base is provided by Marsh (ed.) (2001), and for xml:space by section 2.10 of the XML Specification.
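In practice, xml:base changes the base URI against which relative references, such as a pointer to #p1, are resolved. As a rough illustration, the Python sketch below uses the standard `urllib.parse.urljoin` function to resolve a reference under a chain of xml:base values. The helper name `resolve_reference` and the document URI `http://www.example.org/current.xml` are hypothetical, and a real processor would collect the xml:base values from the element's ancestors rather than receive them as a list.

```python
from urllib.parse import urljoin


def resolve_reference(reference, bases, document_uri):
    """Resolve a URI reference under the xml:base values in scope.

    `bases` lists those values outermost first; each is itself resolved
    against the one before it, starting from the document's own URI.
    """
    base = document_uri
    for b in bases:
        base = urljoin(base, b)
    return urljoin(base, reference)


if __name__ == "__main__":
    doc = "http://www.example.org/current.xml"  # hypothetical document URI
    # Inside a div carrying xml:base="http://www.example.org/somewhere.xml":
    print(resolve_reference("#p1", ["http://www.example.org/somewhere.xml"], doc))
    # Inside a div with no xml:base, the reference stays in the current document:
    print(resolve_reference("#p1", [], doc))
```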
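In the following example, the xml:base attribute on the first div supplies a new base URI for the references it contains:
<body>
<div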
xml:base="http://www.example.org/somewhere.xml">
<p>
<!--... -->
<ptr target="#p1"/>
<!--... -->
</p>
</div>
<div>
<p>
<!--... -->
<ptr target="#p1"/>
<!--... -->
</p>
</div>
</body>
The target of the first ptr is therefore resolved against the base URI supplied by its enclosing div, giving http://www.example.org/somewhere.xml#p1. The second ptr, however, is within the scope of a div which does not change the default context, and its target is therefore some element within the current document with the value p1 for its xml:id attribute. Further discussion of this attribute and its effect on TEI linking methods is provided in chapter 16 Linking, Segmentation, and Alignment.
The XML Recommendation defines whitespace as a single term for the space, tab, and linebreak characters which may appear in a document. By default, XML processors treat whitespace in predictable ways, depending on where it occurs:
- When whitespace characters occur as part of a text node, within the content of an element, XML generally considers them significant and requires that a processor preserve all of them.
- When whitespace characters occur within an element that contains mixed content, that is, an element that contains both element and text nodes, XML assumes that they are significant and requires that a processor preserve all of them.
- When whitespace characters occur between elements (not within the content of those elements or mixed with text), XML generally assumes that they are not significant and may be ignored by an XML processor. This kind of whitespace is most commonly introduced by an encoder or by XML editing software to enhance the readability of the displayed text. This should only happen at locations where the whitespace can be understood as insignificant (so there is no conflict with significant whitespace), but not all processors can detect this reliably.
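For example, the whitespace separating the item elements in the following list is of this kind:
<list>
<item>apple pie</item>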
<item>banana custard</item>
<item>carrot cake</item>
</list>
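Python's `xml.etree.ElementTree` makes the distinction visible: whitespace between elements is kept, as the list element's text and as each item's tail, and it is left to the application to discard it. The sketch below (the helper name `strip_whitespace_between_elements` is hypothetical) removes such whitespace-only text. As its final call shows, applying the same operation to mixed content would remove a space that is genuinely significant, which is why processors cannot always detect reliably where whitespace may be ignored.

```python
import xml.etree.ElementTree as ET

LIST = """<list>
<item>apple pie</item>
<item>banana custard</item>
<item>carrot cake</item>
</list>"""


def strip_whitespace_between_elements(elem):
    """Remove whitespace-only text lying between child elements.

    Targets an element's .text before its first child and each child's
    .tail. Safe for element-only content such as this list, but it would
    damage mixed content, where such whitespace may be significant.
    """
    if len(elem) and elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        if child.tail is not None and not child.tail.strip():
            child.tail = None
        strip_whitespace_between_elements(child)
    return elem


if __name__ == "__main__":
    root = strip_whitespace_between_elements(ET.fromstring(LIST))
    print(ET.tostring(root, encoding="unicode"))
    # In mixed content the same operation loses a meaningful space:
    mixed = ET.fromstring("<p><hi>a</hi> <hi>b</hi></p>")
    print(ET.tostring(strip_whitespace_between_elements(mixed), encoding="unicode"))
```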
The xml:space attribute is available if it is necessary to modify the default treatment of whitespace. This attribute is defined in section 2.10 of the XML Specification. However, it is rarely necessary to use it: most TEI elements permit mixed content, and consequently the presence or absence of whitespace is usually significant in a TEI document. In most cases where whitespace may be desired in the output, this should be indicated using native TEI elements (such as l) to convey the structure of the text, with whitespace for display introduced in processing, rather than by introducing whitespace into the text and setting xml:space to preserve. Note also that although the value preserve on xml:space indicates the encoder's intention that whitespace be preserved, not all applications will obey it.
There are a few situations in which it may be essential to set xml:space to preserve, typically where complex markup is being used within the context of a tool that by default introduces whitespace in order to enhance display of the text. For example, when transcribing an inscription with the elements described in chapter 11 Representation of Primary Sources, a single word may well gain several additional tags to mark parts of the word which are supplied or conjectural. Such tags do not interrupt the word, however, and hence introducing space where they occur would be misleading. Setting the xml:space attribute to preserve on the enclosing div element indicates that all and only the spaces actually present in the XML source should be regarded as significant; an XML editor or other processor is not then permitted to introduce additional spaces.
1.3.2 Model Classes
As noted above, the members of a given TEI model class share the property that they can all appear in the same location within a document. Wherever possible, the content model of a TEI element is expressed not directly in terms of specific elements, but indirectly in terms of particular model classes. This makes content models simpler and more consistent; it also makes them much easier to understand and to modify.
Like attribute classes, model classes may have subclasses or superclasses. Just as elements inherit from a class the ability to appear in certain locations of a document (wherever the class can appear), so all members of a subclass inherit the ability to appear wherever any superclass can appear. The class system thus goes some way towards organizing the whole TEI galaxy of elements into a tidy hierarchy, but it does not do so entirely.
In fact, the nature of a given class of elements can be considered along two dimensions: as noted, it defines a set of places where the class members are permitted within the document hierarchy; it also implies a semantic grouping of some kind. For example, the very large class of elements which can appear within a paragraph comprises a number of other classes, all of which have the same structural property, but which differ in their field of application. Some are related to highlighting, while others relate to names or places, and so on. In some cases, the ‘set of places where class members are permitted’ is very constrained: it may just be within one specific element, or one class of element, for example. In other cases, elements may be permitted to appear in very many places, or in more than one such set of places.
These factors are reflected in the way that model classes are named. If a model class has a name containing part, such as model.divPart or model.biblPart then it is primarily defined in terms of its structural location. For example, those elements (or classes of element) which appear as content of a div constitute the model.divPart class; those which appear as content of a bibl constitute the model.biblPart class. If, however, a model class has a name containing like, such as model.biblLike or model.nameLike, the implication is that its members all have some additional semantic property in common, for example containing a bibliographic description, or containing some form of name, respectively. These semantically-motivated classes often provide a useful way of dividing up large structurally-motivated classes: for example, the very general structural class model.pPart.data (‘data elements that form part of a paragraph’) has four semantically-motivated member classes (model.addressLike, model.dateLike, model.measureLike, and model.nameLike), the last of these being itself a superclass with several members.
Although most classes are defined by the tei infrastructure module, a class cannot be populated unless some other specific module is included in a schema, since element declarations are contained by modules. Classes are not declared ‘top down’, but instead gain their members as a consequence of individual elements' declaration of their membership. The same class may therefore contain different members, depending on which modules are active. Consequently, the content model of a given element (being expressed in terms of model classes) may differ depending on which modules are active.
Some classes contain only a single member, even when all modules are loaded. One reason for declaring such a class is to make it easier for a customization to add new member elements in a specific place, particularly in areas where the TEI does not make fully elaborated proposals. For example, the TEI class model.rdgLike, initially empty, is expanded by the textcrit module to include just the TEI rdg element. A project wishing to add an alternative way of structuring text-critical information could do so by defining its own elements and adding them to this class.
Another reason for declaring single-member classes is where the class members are not needed in all documents, but appear in the same place as elements which are very frequently required. For example, the specialized element g used to represent a non-Unicode character or glyph is provided as the only member of the model.gLike class when the gaiji module is added to a schema. References to this class are included in almost every content model since, if g is used at all, it must be available wherever text is available; however, these references have no effect unless the gaiji module is loaded.
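Because classes gain their members bottom-up from element declarations, the same class can be empty in one schema and populated in another. The following minimal Python sketch assumes a drastically reduced set of declarations per module; `MODULE_DECLS` and `class_members` are hypothetical names used only for illustration, and real modules declare far more elements.

```python
# Illustrative, heavily reduced element declarations per module: each
# element names the model classes it claims membership of.
MODULE_DECLS: dict[str, dict[str, list[str]]] = {
    "core": {"p": ["model.pLike"], "hi": ["model.hiLike"]},
    "gaiji": {"g": ["model.gLike"]},
    "textcrit": {"rdg": ["model.rdgLike"]},
}


def class_members(active_modules: list[str]) -> dict[str, set[str]]:
    """Populate classes bottom-up from the elements of the active modules."""
    members: dict[str, set[str]] = {}
    for module in active_modules:
        for element, classes in MODULE_DECLS[module].items():
            for cls in classes:
                members.setdefault(cls, set()).add(element)
    return members


core_only = class_members(["core"])
with_gaiji = class_members(["core", "gaiji"])
# A content model may reference model.gLike in both schemas,
# but the reference only matches something in the second one.
print(core_only.get("model.gLike", set()), with_gaiji["model.gLike"])  # set() {'g'}
```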
At the other end of the scale, a few of the classes predefined by the tei module are subsequently populated with very many members. For example, the class model.pPart groups all the classes of element which can appear within a p or paragraph element. The core module alone adds more than fifty elements to this class; the namesdates module adds another twenty, as does the tagdocs module. Since the p element is one of the basic building blocks of a TEI document it is not surprising that each module will need to add elements to it. The class system here provides a very convenient way of controlling the resulting complexity. Typically, elements are not added directly to these very general classes, but via some intermediate semantically-motivated class.
Just as there are a few classes which have a single member, so there are some classes which are used only once in the TEI architecture. These classes, which have no superclass and therefore do not fit into the class hierarchy defined here, are a convenient way of maintaining elements which are highly structured internally, but which appear from the outside to be uniform objects like others at the same level.[2] Members of such classes can only ever appear within one element, or one class of elements. For example, the class model.addrPart is used only to express the content model for the element address; it references some other classes of elements, which can appear elsewhere, and also some elements which can only appear inside an address.
1.3.2.1 Informal element classifications
- divisions
- high level, possibly self-nesting, major divisions of texts. These elements populate such classes as model.divLike or model.div1Like, and typically form the largest component units of a text.
- chunks
- elements such as paragraphs and other paragraph-level elements, which can appear directly within texts or within divisions of them, but not (usually) within other chunks. These elements populate the class model.divPart, either directly or by means of other classes such as model.pLike (paragraph-like elements), model.entryLike, etc.
- phrase-level elements
- elements such as highlighted phrases, book titles, or editorial corrections which can occur only within chunks, but not between them (and thus cannot appear directly within a division). These elements populate the class model.phrase.[3]
- inter-level elements
- elements such as lists, notes, quotations, etc. which can appear either between chunks (as children of a div) or within them; these elements populate the class model.inter. Note that this class is not a superset of the model.phrase and model.chunk classes but rather a distinct grouping of elements which are both chunk-like and phrase-like. Indeed, the classes model.phrase, model.pLike, and model.inter are mutually disjoint.
- components
- elements which can appear directly within texts or text divisions; this is a combination of the inter-level and chunk-level elements defined above. These elements populate the class model.common, which is defined as a superset of the classes model.divPart, model.inter, and (when the dictionary module is included in a schema) model.entryLike.
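The set relationships among these informal groupings can be checked mechanically. The following Python sketch uses a few real members of each class, hand-picked for illustration; the actual classes have many more members, and the variable and function names are hypothetical.

```python
# A few members of each class, hand-picked for illustration only;
# the actual classes have many more members.
PHRASE = {"hi", "title", "corr"}        # model.phrase (via subclasses)
P_LIKE = {"p", "ab"}                    # model.pLike
INTER = {"list", "label", "q"}          # model.inter (via subclasses)
DIV_PART = P_LIKE | {"lg"}              # model.divPart includes model.pLike
ENTRY_LIKE = {"entry", "superEntry"}    # model.entryLike (dictionary module)


def model_common(dictionaries_loaded: bool) -> set[str]:
    """model.common = model.divPart + model.inter (+ model.entryLike)."""
    common = DIV_PART | INTER
    if dictionaries_loaded:
        common = common | ENTRY_LIKE
    return common


# The phrase, paragraph-like, and inter levels never overlap.
assert PHRASE.isdisjoint(P_LIKE) and PHRASE.isdisjoint(INTER)
assert P_LIKE.isdisjoint(INTER)
```

Phrase-level elements such as `hi` never appear in `model_common(...)`, which corresponds to the rule that they cannot appear directly within a division.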
As noted above, some elements do not belong to any model class, and some model classes are not readily associated with any of the above informal groupings. However, over two-thirds of the 500+ elements defined in the present edition of these Guidelines are classified in this way, and future editions of these recommendations will extend and develop this classification scheme.
A complete alphabetical list of all model classes is provided in Model Classes.
1.4 Macros
The infrastructure module defined by this chapter also declares a number of macros, or shortcut names for frequently occurring parts of other declarations. Macros are used in two ways in the TEI scheme: to stand for frequently-encountered content models, or parts of content models (1.4.1 Standard Content Models); and to stand for attribute datatypes (1.4.2 Datatype Macros).
1.4.1 Standard Content Models
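- macro.paraContent (paragraph content) defines the content of paragraphs and similar elements.
- macro.limitedContent (paragraph content) defines the content of prose elements that are not used for transcription of extant materials.
- macro.phraseSeq (phrase sequence) defines a sequence of character data and phrase-level elements.
- macro.phraseSeq.limited (limited phrase sequence) defines a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents.
- macro.schemaPattern defines a pattern for elements matching the chosen schema language.
- macro.specialPara ('special' paragraph content) defines the content model of elements such as notes or list items, which either contain a series of component-level elements or else have the same structure as a paragraph, containing a series of phrase-level and inter-level elements.
- macro.xtext (extended text) defines a sequence of character data and gaiji elements.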
| Content model | Number of elements using this | Description |
| --- | --- | --- |
| macro.phraseSeq | 83 | any combination of text with elements from the model.gLike, model.global, or model.phrase classes |
| macro.paraContent | 49 | macro.phraseSeq with the addition of model.inter |
| empty | 39 | elements that have no content |
| macro.specialPara | 24 | macro.paraContent with the addition of model.divPart |
| macro.phraseSeq.limited | 24 | a subset of macro.phraseSeq appropriate for use in non-transcriptional contexts |
| text | 21 | plain untagged text |
| macro.xtext | 19 | any combination of text with elements from the model.gLike class |
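Following the descriptions in the table, each macro extends a smaller one by adding further classes. The following Python sketch reduces each content model to the set of things it permits. This is a deliberate simplification: real content models also constrain order and repetition, and `"#text"` is simply an assumed placeholder for character data.

```python
# Content models reduced to the set of things they permit; "#text" stands for
# character data. Order and repetition constraints are deliberately ignored.
XTEXT = frozenset({"#text", "model.gLike"})
PHRASE_SEQ = XTEXT | {"model.global", "model.phrase"}
PARA_CONTENT = PHRASE_SEQ | {"model.inter"}
SPECIAL_PARA = PARA_CONTENT | {"model.divPart"}

# Each macro in this chain strictly extends the previous one.
assert XTEXT < PHRASE_SEQ < PARA_CONTENT < SPECIAL_PARA
print(sorted(SPECIAL_PARA - PHRASE_SEQ))  # ['model.divPart', 'model.inter']
```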
1.4.2 Datatype Macros
The values which attributes may take in a TEI schema are defined, for the most part, by reference to a TEI datatype. Each such datatype is defined in terms of primitive datatypes (derived mostly from W3C Schema Datatypes), literal values, or other TEI datatypes. This indirection makes it possible for a TEI application to set constraints either globally or in individual cases, by redefining the datatype definition or the reference to it respectively. In some cases, the TEI datatype includes additional usage constraints which cannot be enforced by existing schema languages, although a TEI-compliant processor should attempt to validate them (see further discussion in chapter 23.4 Conformance).
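The two kinds of customization made possible by this indirection can be sketched as two lookup tables: attributes refer to datatype names, and datatype names are bound to definitions. The following Python sketch is hypothetical throughout. The registries, `is_valid`, and the datatype name `data.smallCount` are invented for illustration, and the integer check is a simplified stand-in for xsd:nonNegativeInteger.

```python
import re
from typing import Callable

Validator = Callable[[str], bool]


def non_negative_integer(value: str) -> bool:
    """Simplified lexical check in the spirit of xsd:nonNegativeInteger."""
    return re.fullmatch(r"\+?[0-9]+", value.strip()) is not None


# Hypothetical registries: attributes refer to datatype *names* only ...
ATTRIBUTE_TYPES: dict[tuple[str, str], str] = {
    ("table", "cols"): "data.count",
    ("cell", "cols"): "data.count",
}
# ... and each datatype name is bound to a definition in one place.
DEFAULT_DATATYPES: dict[str, Validator] = {"data.count": non_negative_integer}


def is_valid(element: str, attribute: str, value: str,
             datatypes: dict[str, Validator] = DEFAULT_DATATYPES,
             attribute_types: dict[tuple[str, str], str] = ATTRIBUTE_TYPES) -> bool:
    """Validate a value by following the attribute -> datatype indirection."""
    return datatypes[attribute_types[(element, attribute)]](value)


# Global change: redefine the datatype; every attribute using it is affected.
capped = {"data.count": lambda v: non_negative_integer(v) and int(v) <= 20}

# Local change: point just one attribute at a different (hypothetical) datatype.
local_types = {**ATTRIBUTE_TYPES, ("cell", "cols"): "data.smallCount"}
local_datatypes = {**DEFAULT_DATATYPES,
                   "data.smallCount": lambda v: non_negative_integer(v) and int(v) <= 3}
```

With `capped`, both `table/@cols` and `cell/@cols` reject `"30"`; with `local_types`, only `cell/@cols` is restricted, while `table/@cols` keeps the default definition.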
Where literal values or name tokens are used in a datatype definition, an associated value list supplies definitions for the significance of suggested or (in the case of closed lists) all possible values.
TEI-defined datatypes may be grouped into those which define normalized values for numeric quantities, probabilities, or temporal expressions, those which define various kinds of shorthand codes or keys, and those which define pointers or links.
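- data.certainty defines the range of attribute values expressing a degree of certainty.
- data.probability defines the range of attribute values expressing a probability.
- data.numeric defines the range of attribute values used for numeric values.
- data.count defines the range of attribute values used for a non-negative integer value used as a count.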
Examples of attributes using the data.probability datatype include degree on damage or certainty; examples of data.numeric include quantity on members of the att.measurement class or value on numeric; examples of data.count include cols on cell and table.
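The following Python sketch gives simplified lexical checks for three of these datatypes, under stated assumptions. It assumes that data.probability is a real number from 0 to 1 inclusive, that data.count is a non-negative integer, and that data.certainty is a closed list of qualitative degrees ("high", "medium", "low", "unknown"). The function names are hypothetical, and the regular expression only approximates the xsd:double lexical form.

```python
import re

# Simplified lexical form of xsd:double (no INF/NaN, which are out of range anyway).
_DOUBLE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")
CERTAINTY_VALUES = {"high", "medium", "low", "unknown"}  # assumed closed list


def is_probability(value: str) -> bool:
    """data.probability (assumed): a real number from 0 to 1 inclusive."""
    v = value.strip()
    return _DOUBLE.fullmatch(v) is not None and 0.0 <= float(v) <= 1.0


def is_count(value: str) -> bool:
    """data.count: a non-negative integer."""
    return re.fullmatch(r"\+?[0-9]+", value.strip()) is not None


def is_certainty(value: str) -> bool:
    """data.certainty (assumed): one of a small set of qualitative degrees."""
    return value.strip() in CERTAINTY_VALUES
```

Under these checks, a `degree` of `"0.75"` would be accepted and `"1.5"` rejected, while `cols` on `table` would accept `"3"` but not `"2.5"`.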
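- data.duration.w3c defines the range of attribute values available for representation of a duration in time using W3C datatypes.
- data.duration.iso defines the range of attribute values available for representation of a duration in time using ISO 8601 standard formats.
- data.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes specification.
- data.temporal.iso defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the international standard Data elements and interchange formats - Information interchange - Representation of dates and times.
- data.truthValue defines the range of attribute values used to express a truth value.
- data.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown.
- data.language defines the range of attribute values used to identify a human language.
- data.sex defines the range of attribute values used to identify human or animal sex.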
Note that in each of these cases the values used are those recommended by existing international standards: ISO 8601 as profiled by XML Schema Part 2: Datatypes Second Edition in the case of durations, times, and dates; W3C Schema datatypes in the case of truth values; BCP 47 in the case of language; and ISO 5218 in the case of sex.
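The following Python sketch shows how a processor might interpret truth values and sex codes drawn from these standards. It assumes that data.truthValue uses the xsd:boolean lexical space (`true`, `false`, `1`, `0`). It also assumes that data.xTruthValue additionally allows `unknown` and `inapplicable`, and that data.sex uses the four ISO 5218 codes. The function names are hypothetical.

```python
from typing import Optional

# Lexical space of xsd:boolean, used for data.truthValue.
XSD_BOOLEAN = {"true": True, "false": False, "1": True, "0": False}
# ISO 5218 codes for the representation of human sexes, as used by data.sex.
ISO_5218 = {"0": "not known", "1": "male", "2": "female", "9": "not applicable"}


def parse_truth_value(value: str) -> bool:
    """data.truthValue: map an xsd:boolean literal to a Python bool."""
    try:
        return XSD_BOOLEAN[value.strip()]
    except KeyError:
        raise ValueError(f"not a truth value: {value!r}") from None


def parse_x_truth_value(value: str) -> Optional[bool]:
    """data.xTruthValue (assumed form): xsd:boolean, or 'unknown'/'inapplicable'."""
    if value.strip() in ("unknown", "inapplicable"):
        return None  # no truth value is available
    return parse_truth_value(value)
```

Note that `"1"` and `"true"` are equivalent literals for xsd:boolean. A processor comparing such values should therefore compare parsed results rather than raw strings.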
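- data.outputMeasurement defines a range of values for use in specifying the size of an object that is intended for display on the web.
- data.namespace defines the range of attribute values used to indicate XML namespaces as defined by the W3C Namespaces in XML recommendation.
- data.pattern (regular expression pattern) defines attribute values which are expressed as a regular expression.
- data.point defines the data type used to express a point in Cartesian space.
- data.pointer defines the range of attribute values used to provide a pointer to some other resource.
- data.version defines the range of attribute values which may be used to specify a TEI version number.
- data.word defines the range of attribute values expressed as a single word or token.
- data.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters, possibly including whitespace.
- data.name defines the range of attribute values expressed as an XML Name.
- data.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities.
- data.code defines the range of attribute values expressing a coded value by means of a pointer.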
Attributes of type data.word, such as age on person, are used to supply an identifier expressed as any kind of single token or word. The TEI places a few constraints on the characters which may be used for this purpose: only Unicode characters classified as letters, digits, punctuation characters, or symbols can appear in an attribute value of this kind. Note in particular that such values cannot include whitespace characters. Legal values include cholmondeley, été, 1234, e_content, or xml:id, but not grand wazoo. Attributes of this kind are sometimes used to associate (by co-reference) elements of different types.
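The character rule for data.word can be expressed in terms of Unicode general categories. The following minimal Python sketch uses the standard `unicodedata` module. It assumes that "digits" corresponds to the Unicode N (number) category, and `is_data_word` is a hypothetical name.

```python
import unicodedata

# Major Unicode general categories permitted by the rule above:
# L (letters), N (numbers, assumed to cover "digits"), P (punctuation), S (symbols).
ALLOWED = {"L", "N", "P", "S"}


def is_data_word(value: str) -> bool:
    """True if value is a non-empty run of letters, digits, punctuation or symbols."""
    return bool(value) and all(unicodedata.category(ch)[0] in ALLOWED for ch in value)


for candidate in ["cholmondeley", "été", "1234", "e_content", "xml:id", "grand wazoo"]:
    print(f"{candidate!r}: {is_data_word(candidate)}")
```

The space in `grand wazoo` falls in category Zs, so it is rejected. Combining marks (category M) are also rejected by this sketch, so a decomposed `é` (an `e` followed by U+0301) fails unless the value is first normalized with `unicodedata.normalize("NFC", value)`.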
Where identifiers are defined externally, for example as part of a database or file system, the inability to include whitespace or other special characters in a value may be problematic. In other cases, it may also be simply more convenient to supply a short sequence of natural language words including spaces as a single value. For these reasons, we also provide a datatype data.text which does permit whitespace and indeed any other Unicode character. Legal values include cholmondeley, été, 1234, e-content, xml:id, and grand wazoo. This datatype should be used with care since XML will not collapse runs of whitespace characters within such values: for example the values n="a  b" (two spaces) and n="a   b" (three spaces) would be considered distinct. This case should be distinguished from that of an attribute permitting multiple values, each of which may be separated by whitespace which will be normalised (see further 22.4.5.1 Datatypes).
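The difference between a two-space and a three-space value can be observed with Python's standard `xml.etree.ElementTree` parser. No schema is involved here, so the parser treats `n` as plain character data and keeps runs of spaces intact. The `seg` markup is a minimal hypothetical fragment.

```python
import xml.etree.ElementTree as ET

# Two attribute values differing only in the number of internal spaces.
doc = ET.fromstring('<p><seg n="a  b"/><seg n="a   b"/></p>')
two_spaces, three_spaces = (seg.get("n") for seg in doc.iter("seg"))

# As single data.text values, the parser keeps them distinct ...
print(two_spaces == three_spaces)                   # False
# ... whereas reading them as whitespace-separated lists normalizes them.
print(two_spaces.split() == three_spaces.split())   # True
```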
Attributes of type data.name are similar to those of type data.word, but with the additional constraint that they must be legal XML identifiers, as defined by the XML 1.0 specification, or successors. Hence, they may not begin with digits or with punctuation characters other than the underscore and the colon. Legal identifiers include cholmondeley, été, e_content, or xml:id, but not grand wazoo or 1234. Attributes of this kind are typically used to represent XML element or attribute names.
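The extra constraint on data.name concerns mainly the first character. The XML 1.0 (Fifth Edition) specification defines the permitted start characters and subsequent name characters using explicit code-point ranges. The following Python sketch approximates those ranges with Unicode general categories, so its results may differ from a conforming parser for some characters. The function names are hypothetical.

```python
import unicodedata


def _is_name_start(ch: str) -> bool:
    # Approximation: letters of any script, plus the underscore and colon.
    return ch in "_:" or unicodedata.category(ch).startswith("L")


def _is_name_char(ch: str) -> bool:
    # Later characters may also be digits, hyphens, full stops, the middle dot,
    # or combining marks.
    return (_is_name_start(ch) or ch in "-.\u00b7"
            or unicodedata.category(ch) in {"Nd", "Mn", "Mc"})


def is_xml_name(value: str) -> bool:
    """Approximate test for the XML 1.0 Name production (data.name)."""
    return bool(value) and _is_name_start(value[0]) and all(
        _is_name_char(ch) for ch in value[1:])
```

With this check, `1234` is rejected because of its initial digit, and `grand wazoo` is rejected because of its space.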
Attributes of type data.enumerated, such as new on shift or evidence supplied by att.editLike, have the same definition as data.word above, with the added constraint that the word supplied is taken from a specific list of possibilities. In each case, the element or class specification which includes the definition for the attribute will also contain a list of possible values, together with a prose description of their intended significance. This list may be open (in which case the list is advisory), or closed (in which case it determines the range of legal values). In this latter case, the datatype will not be data.enumerated, but an explicit list of the possible values.
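The practical difference between an open (advisory) and a closed value list can be sketched as the difference between a warning and an error. The following Python sketch uses hypothetical value lists and names (`ValueList`, `check_enumerated`, `draft`, `final`). The single-token test is a rough stand-in for the data.word rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueList:
    """A documented list of values for an attribute."""
    values: frozenset
    closed: bool  # closed: the list is exhaustive; open: it is advisory


def check_enumerated(value: str, value_list: ValueList) -> str:
    """Return 'ok', 'warning' (unlisted value, open list) or 'error'."""
    if not value or any(ch.isspace() for ch in value):
        return "error"  # not a single word or token at all
    if value in value_list.values:
        return "ok"
    return "error" if value_list.closed else "warning"


# Hypothetical value lists, for illustration only.
STATUS_CLOSED = ValueList(frozenset({"draft", "final"}), closed=True)
STATUS_OPEN = ValueList(frozenset({"draft", "final"}), closed=False)
```

An unlisted value such as `revised` is an error against the closed list but only a warning against the open one.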
Attributes of type data.code are similar in function, in that they also supply encoded names for values which are defined in more detail elsewhere. In this case, however, the full definition is supplied as content of another XML element, typically but not necessarily in the same document, and it is referenced by means of a pointer.
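Resolving such a coded value amounts to following the pointer to the element that holds the full definition. The following Python sketch handles only same-document pointers of the form `#id`, using the standard `xml.etree.ElementTree` module. The sample markup is hypothetical: `category` and `catDesc` hold the definition, while the `code` attribute on `item` is an invented example of an attribute of this type.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical markup: <category> holds the definition; the code attribute
# on <item> is an invented example of an attribute of type data.code.
SAMPLE = """<doc>
  <category xml:id="prose"><catDesc>Prose texts</catDesc></category>
  <item code="#prose"/>
</doc>"""


def resolve_code(root: ET.Element, pointer: str) -> ET.Element:
    """Follow a same-document pointer ('#id') to the element that defines it."""
    if not pointer.startswith("#"):
        raise ValueError("this sketch only resolves same-document pointers")
    for element in root.iter():
        if element.get(XML_ID) == pointer[1:]:
            return element
    raise KeyError(pointer)


root = ET.fromstring(SAMPLE)
definition = resolve_code(root, root.find("item").get("code"))
print(definition.find("catDesc").text)  # Prose texts
```

Pointers into other documents would need a URI resolver as well, which this sketch deliberately omits.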
An attribute may, of course, take more than one value of a given type, for example a list of pointer values, or a list of words. In the TEI scheme, this information is regarded as a property of the datatype element used to document the attribute in question rather than as a distinct ‘datatype’. See further 22.4.5.1 Datatypes.
1.5 The TEI Infrastructure Module
- Module tei: datatypes, classes, and macros available to all TEI modules.
- Classes defined: att.ascribed att.breaking att.cReferencing att.canonical att.damaged att.datable att.datable.w3c att.datcat att.declarable att.declaring att.dimensions att.divLike att.docStatus att.duration.w3c att.editLike att.global att.handFeatures att.internetMedia att.interpLike att.measurement att.naming att.personal att.placement att.pointing att.pointing.group att.ranging att.readFrom att.responsibility att.scoping att.segLike att.sortable att.sourced att.spanning att.styleDef att.tableDecoration att.timed att.transcriptional att.translatable att.typed model.addrPart model.addressLike model.applicationLike model.availabilityPart model.biblLike model.biblPart model.castItemPart model.catDescPart model.certLike model.choicePart model.common model.dateLike model.descLike model.dimLike model.div1Like model.div2Like model.div3Like model.div4Like model.div5Like model.div6Like model.div7Like model.divBottom model.divBottomPart model.divGenLike model.divLike model.divPart model.divTop model.divTopPart model.divWrapper model.editorialDeclPart model.egLike model.emphLike model.encodingDescPart model.entryPart model.entryPart.top model.featureVal model.featureVal.complex model.featureVal.single model.frontPart model.frontPart.drama model.gLike model.global model.global.edit model.global.meta model.glossLike model.graphicLike model.headLike model.hiLike model.highlighted model.imprintPart model.inter model.lLike model.lPart model.labelLike model.limitedPhrase model.linePart model.listLike model.measureLike model.milestoneLike model.msItemPart model.msQuoteLike model.nameLike model.nameLike.agent model.noteLike model.oddDecl model.oddRef model.offsetLike model.orgPart model.orgStateLike model.pLike model.pLike.front model.pPart.data model.pPart.edit model.pPart.editorial model.pPart.msdesc model.pPart.transcriptional model.persEventLike model.persStateLike model.personLike model.personPart model.phrase model.phrase.xml model.physDescPart model.placeEventLike model.placeLike model.placeNamePart model.placeStateLike model.profileDescPart model.ptrLike model.publicationStmtPart model.qLike model.quoteLike model.resourceLike model.respLike model.segLike model.settingPart model.sourceDescPart model.specDescLike model.stageLike model.teiHeaderPart model.textDescPart model.titlepagePart
- Macros defined: data.certainty data.code data.count data.duration.iso data.duration.w3c data.enumerated data.language data.name data.namespace data.numeric data.outputMeasurement data.pattern data.point data.pointer data.probability data.sex data.temporal.iso data.temporal.w3c data.text data.truthValue data.version data.word data.xTruthValue macro.limitedContent macro.paraContent macro.phraseSeq macro.phraseSeq.limited macro.specialPara macro.xtext
The order in which declarations are made within the infrastructure module is critical, since several class declarations refer to others, which must therefore precede them. Other constraints on the order of declarations derive from the way in which the modularity of the TEI scheme is implemented in different schema languages. The XML DTD fragment implementing this TEI module makes extensive use of parameter entities and marked sections to effect a kind of conditional construction; the RELAX NG schema fragment similarly predeclares a number of patterns with null (‘notAllowed’) values. These issues are further discussed in chapter 23.5 Implementation of an ODD System.
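The requirement that referenced classes be declared before the classes that refer to them is a topological ordering problem. The following minimal Python sketch uses the standard `graphlib` module (Python 3.9 or later) on a hypothetical dependency fragment. It shows the general technique only and does not claim to reproduce how any ODD processor actually orders TEI declarations.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical fragment: each class lists the classes its declaration refers to.
REFERS_TO: dict[str, set[str]] = {
    "model.pPart.data": {"model.nameLike", "model.dateLike"},
    "model.nameLike": {"model.nameLike.agent"},
    "model.nameLike.agent": set(),
    "model.dateLike": set(),
}

# static_order() yields every class after all the classes it refers to,
# and raises graphlib.CycleError if the references are circular.
declaration_order = list(TopologicalSorter(REFERS_TO).static_order())
print(declaration_order)
```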