2 The TEI Header
This chapter addresses the problems of describing an encoded work so that the text itself, its source, its encoding, and its revisions are all thoroughly documented. Such documentation is equally necessary for scholars using the texts, for software processing them, and for cataloguers in libraries and archives. Together these descriptions and declarations provide an electronic analogue to the title page attached to a printed work. They also constitute an equivalent for the content of the code books or introductory manuals customarily accompanying electronic data sets.
The TEI header, tagged teiHeader, has four major parts:
- a file description, tagged fileDesc, containing a full bibliographical description of the computer file itself, from which a user of the text could derive a proper bibliographic citation, or which a librarian or archivist could use in creating a catalogue entry recording its presence within a library or archive. The term computer file here is to be understood as referring to the whole entity or document described by the header, even when this is stored in several distinct operating system files. The file description also includes information about the source or sources from which the electronic document was derived. The TEI elements used to encode the file description are described in section 2.2 The File Description below.
- an encoding description, tagged encodingDesc, which describes the relationship between an electronic text and its source or sources. It allows for detailed description of whether (or how) the text was normalized during transcription, how the encoder resolved ambiguities in the source, what levels of encoding or analysis were applied, and similar matters. The TEI elements used to encode the encoding description are described in section 2.3 The Encoding Description below.
- a text profile, tagged profileDesc, containing classificatory and contextual information about the text, such as its subject matter, the situation in which it was produced, the individuals described by or participating in producing it, and so forth. Such a text profile is of particular use in highly structured composite texts such as corpora or language collections, where it is often highly desirable to enforce a controlled descriptive vocabulary or to perform retrievals from a body of text in terms of text type or origin. The text profile may however be of use in any form of automatic text processing. The TEI elements used to encode the profile description are described in section 2.4 The Profile Description below.
- a revision history, tagged revisionDesc, which allows the encoder to provide a history of changes made during the development of the electronic text. The revision history is important for version control and for resolving questions about the history of a file. The TEI elements used to encode the revision description are described in section 2.5 The Revision Description below.
A TEI header can be a very large and complex object, or it may be a very simple one. Some application areas (for example, the construction of language corpora and the transcription of spoken texts) may require more specialized and detailed information than others. The present proposals therefore define both a core set of elements (all of which may be used without formality in any TEI header) and some additional elements which become available within the header as the result of including additional specialized modules within the schema. When the module for language corpora (described in chapter 15 Language Corpora) is in use, for example, several additional elements are available, as further detailed in that chapter.
The next section of the present chapter briefly introduces the overall structure of the header and the kinds of data it may contain. This is followed by a detailed description of all the constituent elements which may be used in the core header. Section 2.6 Minimal and Recommended Headers, at the end of the present chapter, discusses the recommended content of a minimal TEI header and its relation to standard library cataloguing practices.
2.1 Organization of the TEI Header
2.1.1 The TEI Header and its Components
The teiHeader element should be clearly distinguished from the front matter of the text itself (for which see section 4.5 Front Matter). A composite text, such as a corpus or collection, may contain several headers, as further discussed below. In the general case, however, a TEI-conformant text will contain a single teiHeader element, followed by a single text or facsimile element, or both.
- teiHeader (TEI header) supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.
  - type specifies the kind of document to which the header is attached.
- fileDesc (file description) contains a full bibliographic description of an electronic file.
- encodingDesc (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
- profileDesc (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, and the participants and their status.
- revisionDesc (revision description) summarizes the revision history for a file.
Of these, only fileDesc is required; a minimal header therefore has the following structure:
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>
        <!-- title of the resource ... -->
      </title>
    </titleStmt>
    <publicationStmt>
      <p>(Information about distribution of the resource)</p>
    </publicationStmt>
    <sourceDesc>
      <p>(Information about source from which the resource derives)</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
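The header and the text may be written in different languages, each indicated by its own xml:lang attribute; in the following schematic example the header is in French and the text in English:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="fr">
    <!-- ... -->
  </teiHeader>
  <text xml:lang="en">
    <!-- ... -->
  </text>
</TEI>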
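In a corpus or collection, header information may be recorded both for the corpus as a whole and for each text within it; the type attribute indicates which kind of header is meant:
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="corpus">
    <!-- corpus-level metadata here -->
  </teiHeader>
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader type="text">
      <!-- metadata specific to this text here -->
    </teiHeader>
    <text>
      <!-- ... -->
    </text>
  </TEI>
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader type="text">
      <!-- metadata specific to this text here -->
    </teiHeader>
    <text>
      <!-- ... -->
    </text>
  </TEI>
</teiCorpus>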
2.1.2 Types of Content in the TEI Header
- free prose: Most elements contain simple running prose at some level. Many elements may contain either prose (possibly organized into paragraphs) or more specific elements, which themselves contain prose. In this chapter's descriptions of element content, the phrase prose description should be understood to imply a series of paragraphs, each marked as a p element. The word phrase, by contrast, should be understood to imply character data, interspersed as need be with phrase-level elements, but not organized into paragraphs. For more information on paragraphs, highlighted phrases, lists, etc., see section 3.1 Paragraphs.
- grouping elements: Elements whose names end with the suffix Stmt (e.g. editionStmt, titleStmt) usually enclose a group of specialized elements recording some structured information. In the case of the bibliographic elements, the suffix Stmt is used in names of elements corresponding to the ‘areas’ of the International Standard Bibliographic Description. In most cases grouping elements may contain prose descriptions as an alternative to the set of specialized elements, thus allowing the encoder to choose whether the information concerned should be presented in a structured form or in prose.
- declarations: Elements whose names end with the suffix Decl (e.g. tagsDecl, refsDecl) enclose information about specific encoding practices applied in the electronic text; often these practices are described in coded form. Typically, such information takes the form of a series of declarations, identifying a code with some more complex structure or description. A declaration which applies to more than one text or division of a text need not be repeated in the header of each such text or subdivision. Instead, the decls attribute of each text (or subdivision of the text) to which the declaration applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
- descriptions: Elements whose names end with the suffix Desc (e.g. settingDesc, projectDesc) contain a prose description, possibly, but not necessarily, organized under some specific headings by suggested sub-elements.
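The cross-referencing of declarations described above can be sketched as follows. This is a minimal illustration, not a prescribed pattern: the identifiers ED1 and ED2, the titles, and the prose are invented for the example, and it assumes that the default attribute is used to mark the declaration that applies where no decls attribute selects another.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A sample text: an electronic transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>No source: created for illustration.</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <!-- ED1 (hypothetical identifier) applies wherever nothing else is selected -->
      <editorialDecl xml:id="ED1" default="true">
        <p>Spelling has been normalized throughout.</p>
      </editorialDecl>
      <!-- ED2 (hypothetical identifier) applies only where it is selected explicitly -->
      <editorialDecl xml:id="ED2">
        <p>Original spelling has been retained.</p>
      </editorialDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>This division falls under the default declaration, ED1.</p>
      </div>
      <!-- decls supplies the cross-reference to the declaration that applies here -->
      <div decls="#ED2">
        <p>This division falls under ED2.</p>
      </div>
    </body>
  </text>
</TEI>
```

Declaring each practice once in the header and pointing to it from the text avoids repeating the same declaration in every division to which it applies.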
2.1.3 Model Classes in the TEI Header
The TEI Header provides a very rich collection of metadata categories, but makes no claim to be exhaustive. It is certainly the case that individual projects may wish to record specialized metadata which either does not fit within one of the predefined categories identified by the TEI Header or requires a more specialized element structure than is proposed here. To overcome this problem, the encoder may elect to define additional elements using the customization methods discussed in 23.2 Personalization and Customization. The TEI class system makes such customizations simpler to effect and easier to use in interchange.
The following model classes are used in the header:
- model.applicationLike groups elements used to record application-specific information about a document in its header.
- model.availabilityPart groups elements such as licences and paragraphs of text which may appear as part of an availability statement.
- model.catDescPart groups component elements of the catDesc element in the TEI header.
- model.editorialDeclPart groups elements which may be used inside editorialDecl and may appear multiple times.
- model.encodingDescPart groups elements which may be used inside encodingDesc and may appear multiple times.
- model.profileDescPart groups elements which may be used inside profileDesc and may appear multiple times.
- model.teiHeaderPart groups high-level elements which may appear more than once in a TEI header.
- model.sourceDescPart groups elements which may be used inside sourceDesc and may appear multiple times.
- model.textDescPart groups elements used to categorize a text, for example in terms of its situational parameters.
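As a sketch of how class membership can simplify a customization, the following ODD fragment adds a project-specific element to model.encodingDescPart so that it becomes available inside encodingDesc. The element name scanSettings, its namespace, its description, and the schema identifier are all hypothetical, and the fragment assumes a schema specification written with the schemaSpec, moduleRef, and elementSpec elements; it illustrates the approach rather than giving a definitive customization.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myHeader" start="TEI">
  <!-- modules assumed to be needed for a basic document with a header -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- scanSettings and its namespace are hypothetical, invented for this example -->
  <elementSpec ident="scanSettings" ns="http://example.org/ns/project" mode="add">
    <desc>records the scanner settings used when digitizing the source.</desc>
    <classes>
      <!-- membership of this class makes the new element legal inside encodingDesc -->
      <memberOf key="model.encodingDescPart"/>
    </classes>
    <content>
      <textNode/>
    </content>
  </elementSpec>
</schemaSpec>
```

Because the new element joins an existing class, the content model of encodingDesc itself does not need to be redefined.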
2.2 The File Description
This section describes the fileDesc element, which is the first component of the teiHeader element.
The bibliographic description of a machine-readable or digital text resembles in structure that of a book, an article, or any other kind of textual object. The file description element of the TEI header has therefore been closely modelled on existing standards in library cataloguing; it should thus provide enough information to allow users to give standard bibliographic references to the electronic text, and to allow cataloguers to catalogue it. Bibliographic citations occurring elsewhere in the header, and also in the text itself, are derived from the same model (on bibliographic citations in general, see further section 3.11 Bibliographic Citations and References). See further section 2.7 Note for Library Cataloguers.
- fileDesc (file description) contains a full bibliographic description of an electronic file.
- titleStmt (title statement) groups information about the title of a work and the persons or institutions responsible for its intellectual content.
- editionStmt (edition statement) groups information relating to one edition of a text.
- extent describes the approximate size of a text stored on some carrier medium, whether digital or non-digital, expressed in any convenient units.
- publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text.
- seriesStmt (series statement) groups information about the series, if any, to which a publication belongs.
- notesStmt (notes statement) collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description.
- sourceDesc (source description) describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>
        <!-- title of the resource -->
      </title>
    </titleStmt>
    <editionStmt>
      <p>
        <!-- information about the edition of the resource -->
      </p>
    </editionStmt>
    <extent>
      <!-- description of the size of the resource -->
    </extent>
    <publicationStmt>
      <p>
        <!-- information about the distribution of the resource -->
      </p>
    </publicationStmt>
    <seriesStmt>
      <p>
        <!-- information about any series to which the resource belongs -->
      </p>
    </seriesStmt>
    <notesStmt>
      <note>
        <!-- notes on other aspects of the resource -->
      </note>
    </notesStmt>
    <sourceDesc>
      <p>
        <!-- information about the source from which the resource was derived -->
      </p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
2.2.1 The Title Statement
- titleStmt (title statement) groups information about the title of a work and the persons or institutions responsible for its intellectual content.
- title contains the full title of a work of any kind.
- author in a bibliographic reference, contains the name of the individual(s) or corporate body responsible for a work; the primary statement of responsibility for any bibliographic item.
- editor contains a secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution, or organization (or of several such) acting as editor, compiler, translator, etc.
- sponsor specifies the name of a sponsoring organization or institution.
- funder (funding body) specifies the name of an individual or organization responsible for the funding of a project or text.
- principal (principal researcher) supplies the name of the principal researcher responsible for the creation of an electronic text.
- respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
- resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.
- name (name, proper noun) contains a proper noun or noun phrase.
The title element contains the chief name of the electronic work, including any alternative title or subtitles it may have. It may be repeated, if the work has more than one title (perhaps in different languages) and takes whatever form is considered appropriate by its creator. Where the electronic work is derived from an existing source text, it is strongly recommended that the title for the former should be derived from the latter, but clearly distinguishable from it, for example by the addition of a phrase such as ‘: an electronic transcription’ or ‘a digital edition’. This will distinguish the electronic work from the source text in citations and in catalogues which contain descriptions of both types of material.
The electronic work will also have an external name (its ‘filename’ or ‘data set name’) or reference number on the computer system where it resides at any time. This name is likely to change frequently, as new copies of the file are made on the computer system. Its form is entirely dependent on the particular computer system in use and thus cannot always easily be transferred from one system to another. Moreover, a given work may be composed of many files. For these reasons, these Guidelines strongly recommend that such names should not be used as the title for any electronic work.
Helpful guidance on the formulation of useful descriptive titles in difficult cases may be found in the Anglo-American Cataloguing Rules (Gorman and Winkler, 1978, chapter 25) or in equivalent national-level bibliographical documentation.
The elements author, editor, sponsor, funder, and principal are specializations of the more general respStmt element. These elements are used to provide the statements of responsibility which identify the person(s) responsible for the intellectual or artistic content of an item and any corporate bodies from which it emanates.
Any number of such statements may occur within the title statement. At a minimum, identify the author of the text and (where appropriate) the creator of the file. If the bibliographic description is for a corpus, identify the creator of the corpus. Optionally include also names of others involved in the transcription or elaboration of the text, sponsors, and funding agencies. The name of the person responsible for physical data input need not normally be recorded, unless that person is also intellectually responsible for some aspect of the creation of the file.
Where the person whose responsibility is to be documented is not an author, sponsor, funding body, or principal researcher, the respStmt element should be used. This has two subcomponents: a name element identifying a responsible individual or organization, and a resp element indicating the nature of the responsibility. No specific recommendations are made at this time as to appropriate content for the resp: it should make clear the nature of the responsibility concerned, as in the examples below.
Names given may be personal names or corporate names. Give all names in the form in which the persons or bodies wish to be publicly cited. This would usually be the fullest form of the name, including first names.
<titleStmt>
  <title>Capgrave's Life of St. John Norbert: a machine-readable transcription</title>
  <respStmt>
    <resp>compiled by</resp>
    <name>P.J. Lucas</name>
  </respStmt>
</titleStmt>
<titleStmt>
  <title>Two stories by Edgar Allan Poe: electronic version</title>
  <author>Poe, Edgar Allan (1809-1849)</author>
  <respStmt>
    <resp>compiled by</resp>
    <name>James D. Benson</name>
  </respStmt>
</titleStmt>
<titleStmt>
  <title>Yogadarśanam (arthāt yogasūtrapūṭhaḥ): a digital edition.</title>
  <title>The Yogasūtras of Patañjali: a digital edition.</title>
  <funder>Wellcome Institute for the History of Medicine</funder>
  <principal>Dominik Wujastyk</principal>
  <respStmt>
    <name>Wieslaw Mical</name>
    <resp>data entry and proof correction</resp>
  </respStmt>
  <respStmt>
    <name>Jan Hajic</name>
    <resp>conversion to TEI-conformant markup</resp>
  </respStmt>
</titleStmt>
2.2.2 The Edition Statement
- editionStmt (edition statement) groups information relating to one edition of a text.
- edition describes the particularities of one edition of a text.
- respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
- name (name, proper noun) contains a proper noun or noun phrase.
- resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.
For printed texts, the word edition applies to the set of all the identical copies of an item produced from one master copy and issued by a particular publishing agency or a group of such agencies. A change in the identity of the distributing body or bodies does not normally constitute a change of edition, while a change in the master copy does.
For electronic texts, the notion of a ‘master copy’ is not entirely appropriate, since they are far more easily copied and modified than printed ones; nonetheless the term edition may be used for a particular state of a machine-readable text at which substantive changes are made and fixed. Synonymous terms used in these Guidelines are version, level, and release. The words revision and update, by contrast, are used for minor changes to a file which do not amount to a new edition.
No simple rule can specify how ‘substantive’ changes have to be before they are regarded as producing a new edition, rather than a simple update. The general principle proposed here is that the production of a new edition entails a significant change in the intellectual content of the file, rather than its encoding or appearance. The addition of analytic coding to a text would thus constitute a new edition, while automatic conversion from one coded representation to another would not. Changes relating to the character code or physical storage details, corrections of misspellings, simple changes in the arrangement of the contents and changes in the output format do not normally constitute a new edition, whereas the addition of new information (e.g. a linguistic analysis expressed in part-of-speech tagging, sound or graphics, referential links to external data sets) almost always does.
Clearly, there will always be borderline cases and the matter is somewhat arbitrary. The simplest rule is: if you think that your file is a new edition, then call it such. An edition statement is optional for the first release of a computer file; it is mandatory for each later release, though this requirement cannot be enforced by the parser.
Note that all changes in a file, whether or not they are regarded as constituting a new edition or simply a new revision, should be independently noted in the revision description section of the file header (see section 2.5 The Revision Description).
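By way of illustration, a revision description recording both kinds of change might look like the following sketch. The dates and descriptions are invented for the example, and the ordering (most recent change first) is an assumption rather than something stated in this section; the elements themselves are described in section 2.5 The Revision Description.

```xml
<revisionDesc>
  <!-- a substantive change: new information added, so a new edition was declared -->
  <change when="2011-06-14">Added part-of-speech tagging; released as second edition.</change>
  <!-- a minor change: recorded here, but not regarded as a new edition -->
  <change when="2011-03-02">Corrected misspellings in chapter 3.</change>
</revisionDesc>
```

Both entries appear in the revision description regardless of whether the change is also reflected in the edition statement.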
The edition element should contain phrases describing the edition or version, including the word edition, version, or equivalent, together with a number or date, or terms indicating difference from other editions such as new edition, revised edition etc. Any dates that occur within the edition statement should be marked with the date element. The n attribute of the edition element may be used as elsewhere to supply any formal identification (such as a version number) for the edition.
One or more respStmt elements may also be used to supply statements of responsibility for the edition in question. These may refer to individuals or corporate bodies and can indicate functions such as that of a reviser, or can name the person or body responsible for the provision of supplementary matter, of appendices, etc., in a new edition. For further detail on the respStmt element, see section 3.11 Bibliographic Citations and References.
<editionStmt>
  <edition n="P2">Second draft, substantially extended, revised, and corrected.</edition>
</editionStmt>
<editionStmt>
  <edition>Student's edition, <date>June 1987</date>
  </edition>
  <respStmt>
    <resp>New annotations by</resp>
    <name>George Brown</name>
  </respStmt>
</editionStmt>
2.2.3 Type and Extent of File
- extent describes the approximate size of a text stored on some carrier medium, whether digital or non-digital, expressed in any convenient units.
For printed books, information about the carrier, such as the kind of medium used and its size, is of great importance in cataloguing procedures. The print-oriented rules for bibliographic description of an item's medium and extent need some re-interpretation when applied to electronic media. An electronic file exists as a distinct entity quite independently of its carrier and remains the same intellectual object whether it is stored on a magnetic tape, a CD-ROM, a set of floppy disks, or as a file on a mainframe computer. Since, moreover, these Guidelines are specifically aimed at facilitating transparent document storage and interchange, any purely machine-dependent information should be irrelevant as far as the file header is concerned.
This is particularly true of information about file type, although library-oriented rules for cataloguing often distinguish two types of computer file: ‘data’ and ‘programs’. This distinction is quite difficult to draw in some cases, for example, hypermedia or texts with built-in search and retrieval software.
The extent element contains a phrase indicating the size or approximate size of the file, expressed in one of the following ways:
- in bytes of a specified length (e.g. ‘4000 16-bit bytes’)
- as falling within a range of categories, for example:
  - less than 1 MB
  - between 1 MB and 5 MB
  - between 6 MB and 10 MB
  - over 10 MB
- in terms of any convenient logical units (for example, words or sentences, citations, paragraphs)
- in terms of any convenient physical units (for example, blocks, disks, tapes)
The use of standard abbreviations for units of quantity is recommended where applicable, here as elsewhere (see http://physics.nist.gov/cuu/Units/binary.html).
<extent>4.2 MiB</extent>
<extent>4532 bytes</extent>
<extent>3200 sentences</extent>
<extent>Five 90 mm high-density diskettes</extent>
2.2.4 Publication, Distribution, Licencing, etc.
- publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text.
- publisher supplies the name of the organization responsible for the publication or distribution of a bibliographic item.
- distributor supplies the name of a person or other agency responsible for the distribution of a text.
- authority (release authority) supplies the name of a person or other agency responsible for making an electronic file available, other than a publisher or distributor.
The publisher is the person or institution by whose authority a given edition of the file is made public. The distributor is the person or institution from whom copies of the text may be obtained. Where a text is not considered formally published, but is nevertheless made available for circulation by some individual or organization, this person or institution is termed the release authority.
- pubPlace (publication place) contains the name of the place where a bibliographic item was published.
- address contains a postal or other address, for example of a publisher, an organization, or an individual.
- idno (identifier) supplies a standard or non-standard number which can be used to identify a bibliographic item.
  - type categorizes the number, for example as an ISBN or as belonging to some other standard series.
- availability supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, etc.
  - status supplies a code identifying the current availability of the text.
- date contains a date in any format.
Note that the dates, places, etc., given in the publication statement relate to the publisher, distributor, or release authority most recently mentioned. If the text was created at some date other than its date of publication, its date of creation should be given within the profileDesc element, not in the publication statement. Give any other useful dates (e.g., dates of collection of data) in a note.
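The placement of the different kinds of date can be sketched as follows. The title, archive name, dates, and note are invented for the example; the sketch assumes that the creation element is the component of profileDesc used to hold the date of creation.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Interviews: an electronic transcription</title>
    </titleStmt>
    <publicationStmt>
      <publisher>Example Archive</publisher>
      <!-- date of publication; it relates to the publisher named just before it -->
      <date when="1995">1995</date>
    </publicationStmt>
    <notesStmt>
      <!-- other useful dates, such as dates of data collection, go in a note -->
      <note>Data collected between April 1989 and January 1990.</note>
    </notesStmt>
    <sourceDesc>
      <p>Transcribed from audio recordings.</p>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <!-- date of creation is given here, not in the publication statement -->
    <creation>
      <date when="1990">1990</date>
    </creation>
  </profileDesc>
</teiHeader>
```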
Additional detailed elements may be used for the encoding of names, dates, and addresses, as further described in section 3.5 Names, Numbers, Dates, Abbreviations, and Addresses when the module described in chapter 13 Names, Dates, People, and Places is included in a schema.
Where the work is covered by a formal licence (such as a Creative Commons licence), the licence element may be included in availability, with its target attribute pointing to an instance of the licence document. The content of the licence element may include the full text of the licence, or a brief statement of its applicability.
<publicationStmt>
  <publisher>Oxford University Press</publisher>
  <pubPlace>Oxford</pubPlace>
  <date>1989</date>
  <idno type="ISBN">0-19-254705-4</idno>
  <availability>
    <p>Copyright 1989, Oxford University Press</p>
  </availability>
</publicationStmt>
<publicationStmt>
  <authority>James D. Benson</authority>
  <pubPlace>London</pubPlace>
  <date>1984</date>
</publicationStmt>
<publicationStmt>
  <publisher>Sigma Press</publisher>
  <address>
    <addrLine>21 High Street,</addrLine>
    <addrLine>Wilmslow,</addrLine>
    <addrLine>Cheshire M24 3DF</addrLine>
  </address>
  <date>1991</date>
  <distributor>Oxford Text Archive</distributor>
  <idno type="OTA">1256</idno>
  <availability>
    <p>Available with prior consent of depositor for purposes of academic research and teaching only.</p>
  </availability>
</publicationStmt>
<publicationStmt>
  <publisher>University of Victoria Humanities Computing and Media Centre</publisher>
  <pubPlace>Victoria, BC</pubPlace>
  <date>2011</date>
  <availability status="restricted">
    <licence target="http://creativecommons.org/licenses/by-sa/3.0/">Distributed under a Creative Commons Attribution-ShareAlike 3.0 Unported License</licence>
  </availability>
</publicationStmt>
2.2.5 The Series Statement
- seriesStmt (series statement) groups information about the series, if any, to which a publication belongs.
In bibliographic usage, a series may be defined in any of the following ways:
- A group of separate items related to one another by the fact that each item bears, in addition to its own title proper, a collective title applying to the group as a whole. The individual items may or may not be numbered.
- Each of two or more volumes of essays, lectures, articles, or other items, similar in character and issued in sequence.
- A separately numbered sequence of volumes within a series or serial.
The seriesStmt element may contain the following more specific elements:
- title contains the full title of a work of any kind.
- idno (identifier) supplies a standard or non-standard number which can be used to identify a bibliographic item.
- respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
- resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.
- name (name, proper noun) contains a proper noun or noun phrase.
The idno may be used to supply any identifying number associated with the item, including both standard numbers such as an ISSN and particular issue numbers. (Arabic numerals separated by punctuation are recommended for this purpose: 6.19.33, for example, rather than VI/xix:33). Its type attribute is used to categorize the number further, taking the value ISSN for an ISSN for example.
<seriesStmt>
  <title level="s">Machine-Readable Texts for the Study of Indian Literature</title>
  <respStmt>
    <resp>ed. by</resp>
    <name>Jan Gonda</name>
  </respStmt>
  <biblScope type="vol">1.2</biblScope>
  <idno type="ISSN">0 345 6789</idno>
</seriesStmt>
2.2.6 The Notes Statement
Some kinds of information conventionally recorded in the notes area of a bibliographic description have dedicated elements in these Guidelines. The following items should be tagged as indicated, rather than as general notes:
- the nature, scope, artistic form, or purpose of the file; also the genre or other intellectual category to which it may belong: e.g. ‘Text types: newspaper editorials and reportage, science fiction, westerns, and detective stories’. These should be formally described within the profileDesc element (section 2.4 The Profile Description).
- summary description providing a factual, non-evaluative account of the subject content of the file: e.g. ‘Transcribes interviews on general topics with native speakers of English in 17 cities during the spring and summer of 1963.’ These should also be formally described within the profileDesc element (section 2.4 The Profile Description).
- bibliographic details relating to the source or sources of an electronic text: e.g. ‘Transcribed from the Norton facsimile of the 1623 Folio’. These should be formally described in the sourceDesc element (section 2.2.7 The Source Description).
- further information relating to publication, distribution, or release of the text, including sources from which the text may be obtained, any restrictions on its use or formal terms on its availability. These should be placed in the appropriate division of the publicationStmt element (section 2.2.4 Publication, Distribution, Licencing, etc.).
- publicly documented numbers associated with the file: e.g. ‘ICPSR study number 1803’ or ‘Oxford Text Archive text number 1243’. These should be placed in an idno element within the appropriate division of the publicationStmt element. International Standard Serial Numbers (ISSN), International Standard Book Numbers (ISBN), and other internationally agreed upon standard numbers that uniquely identify an item, should be treated in the same way, rather than as specialized bibliographic notes.
- dates, when they are relevant to the content or condition of the computer file: e.g. ‘manual dated 1983’, ‘Interview wave I: Apr. 1989; wave II: Jan. 1990’
- names of persons or bodies connected with the technical production, administration, or consulting functions of the effort which produced the file, if these are not named in statements of responsibility in the title or edition statements of the file description: e.g. ‘Historical commentary provided by Mark Cohen’
- availability of the file in an additional medium or information not already recorded about the availability of documentation: e.g. ‘User manual is loose-leaf in eleven paginated sections’
- language of work and abstract, if not encoded in the langUsage element, e.g. ‘Text in English with summaries in French and German’
- The unique name assigned to a serial by the International Serials Data System (ISDS), if not encoded in an idno
- lists of related publications, either describing the source itself, or concerned with the creation or use of the electronic work, e.g. ‘Texts used in Burrows (1987)’
<notesStmt>
<note>Historical commentary provided by Mark Cohen.</note>
<note>OCR scanning done at University of Toronto.</note>
</notesStmt>
<titleStmt>
<title>…</title>
<respStmt>
<persName>Mark Cohen</persName>
<resp>historical commentary</resp>
</respStmt>
<respStmt>
<orgName>University of Toronto</orgName>
<resp>OCR scanning</resp>
</respStmt>
</titleStmt>
2.2.7 The Source Description
- sourceDesc (source description) describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.
<sourceDesc>
<p>Born digital.</p>
</sourceDesc>
- model.biblLike groups elements containing a bibliographic description.
- model.sourceDescPart groups elements which may be used more than once within the sourceDesc element.
- model.listLike groups list-like elements.
- bibl (bibliographic citation) contains a loosely structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
- biblStruct (structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear, and in a specified order.
- listBibl (citation list) contains a list of bibliographic citations of any kind.
<sourceDesc>
<bibl>The first folio of Shakespeare, prepared by
Charlton Hinman (The Norton Facsimile, 1968)</bibl>
</sourceDesc>
<sourceDesc>
<biblStruct xml:lang="fr">
<monogr>
<author>Eugène Sue</author>
<title>Martin, l'enfant trouvé</title>
<title type="sub">Mémoires d'un valet de chambre</title>
<imprint>
<pubPlace>Bruxelles et Leipzig</pubPlace>
<publisher>C. Muquardt</publisher>
<date when="1846">1846</date>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
- biblFull (fully structured bibliographic citation) contains a fully structured bibliographic citation, in which all components of the TEI file description are present.
- msDesc (manuscript description) contains a description of a single identifiable manuscript.
- scriptStmt (script statement) contains a citation giving details of the script used for a spoken text. [The term ‘script’ is understood broadly here as any text prepared in advance of speech (political speech, sermon, interview, address, lecture, broadcast, etc.).]
- recordingStmt (recording statement) describes a set of recordings used as the basis for the transcription of a spoken text.
A single electronic text may be derived from multiple source documents, in whole or in part. The sourceDesc may therefore contain a listBibl element grouping together bibl, biblStruct, or msDesc elements for each of the sources concerned. It is also possible to repeat the sourceDesc element in such a case. The decls attribute described in section 15.3 Associating Contextual Information with a Text may be used to associate parts of the encoded text with the bibliographic element from which it derives in either case.
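The first approach described above might be sketched as follows. This is only an illustration under stated assumptions: the identifiers srcA and srcB and the placeholder citations are hypothetical and do not come from any real header.

```xml
<sourceDesc>
  <listBibl>
    <!-- placeholder citations; the xml:id values are hypothetical -->
    <bibl xml:id="srcA">First source document</bibl>
    <bibl xml:id="srcB">Second source document</bibl>
  </listBibl>
</sourceDesc>
<!-- elsewhere in the document -->
<div decls="#srcA">
  <p>Text derived from the first source.</p>
</div>
```

Here the decls attribute on the div points at the bibl describing the source from which that division derives.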
- listNym (list of canonical names) contains a list of normalized names for objects of any kind.
- listOrg (list of organizations) contains a list of elements, each of which provides information about an identifiable organization.
- listPerson (list of persons) contains a list of elements, each of which provides information about an identifiable person or a group of people, for example the participants in a language interaction, or the people referred to in a historical source.
- listPlace (list of places) contains a list of places, optionally followed by a list of relationships (other than containment) defined amongst them.
2.2.8 Computer Files Derived from Other Computer Files
- fileDesc
- A's file description should be copied into the sourceDesc section of B's file description, enclosed within a biblFull element
- profileDesc
- A's profileDesc should be copied into B's, in principle unchanged; it may however be expanded by project-specific information relating to B.
- encodingDesc
- A's encoding practice may or (more likely) may not be the same as B's. Since the object of the encoding description is to define the relationship between the current file and its source, in principle only changes in encoding practice between A and B need be documented in B. The relationship between A and its source(s) is then only recoverable from the original header of A. In practice it may be more convenient to create a new complete encodingDesc for B based on A's.
- revisionDesc
- B is a new computer file, and should therefore have a new revision description. If, however, it is felt useful to include some information from A's revisionDesc, for example dates of major updates or versions, such information must be clearly marked as relating to A rather than to B.
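As a rough sketch of the fileDesc rule above, B's source description might wrap a copy of A's file description in a biblFull element. The titles and prose below are placeholders standing in for whatever A's header actually contains.

```xml
<!-- inside the fileDesc of file B -->
<sourceDesc>
  <biblFull>
    <!-- copied from the fileDesc of file A; all content here is placeholder text -->
    <titleStmt>
      <title>Title of file A</title>
    </titleStmt>
    <publicationStmt>
      <p>Publication details of file A.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Source description of file A.</p>
    </sourceDesc>
  </biblFull>
</sourceDesc>
```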
2.3 The Encoding Description
- encodingDesc (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
- projectDesc (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
- samplingDecl (sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
- editorialDecl (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.
- tagsDecl (tagging declaration) provides detailed information about the tagging applied to a document.
- refsDecl (references declaration) specifies how canonical references are constructed for this text.
- classDecl (classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text.
- schemaSpec (schema specification) generates a TEI-conformant schema and its accompanying documentation.
- appInfo (application information) records information about an application which has been used to process the TEI file.
2.3.1 The Project Description
- projectDesc (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
<encodingDesc>
<projectDesc>
<p>Texts collected for use in the
Claremont Shakespeare Clinic, June 1990.</p>
</projectDesc>
</encodingDesc>
2.3.2 The Sampling Declaration
- samplingDecl (sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
- the size of individual samples
- the method or methods by which they were selected
- the underlying population being sampled
- the object of the sampling procedure used
<samplingDecl>
<p>Samples of 2000 words taken from the beginning of the text.</p>
</samplingDecl>
<samplingDecl>
<p>Text of stories only has been transcribed. Pull quotes, captions,
and advertisements have been silently omitted. Any mathematical
expressions requiring symbols not present in the ISOnum or ISOpub
entity sets have been omitted, and their place marked with a GAP
element.</p>
</samplingDecl>
A sampling declaration which applies to more than one text or division of a text need not be repeated in the header of each such text. Instead, the decls attribute of each text (or subdivision of the text) to which the sampling declaration applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
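A minimal sketch of such a cross-reference, assuming a hypothetical identifier samp2000 on the declaration, might look like this:

```xml
<encodingDesc>
  <!-- the xml:id value is hypothetical -->
  <samplingDecl xml:id="samp2000">
    <p>Samples of 2000 words taken from the beginning of the text.</p>
  </samplingDecl>
</encodingDesc>
<!-- elsewhere in the document -->
<text decls="#samp2000">
  <body>
    <p>First sampled text.</p>
  </body>
</text>
```

Each further text sampled in the same way would carry the same decls value rather than repeating the declaration.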
2.3.3 The Editorial Practices Declaration
- editorialDecl (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.
- correction
- correction (correction principles) states how and under what circumstances corrections have been made in the text.
status indicates the degree of correction applied to the text.
method indicates the method adopted to indicate corrections within the text.
Was the text corrected during or after data capture? If so, were corrections made silently or are they marked using the tags described in section 3.4 Simple Editorial Changes? What principles have been adopted with respect to omissions, truncations, dubious corrections, alternate readings, false starts, repetitions, etc.?
- normalization
- normalization (normalization) indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form.
source indicates the authority for any normalization carried out.
method indicates the method adopted to indicate normalizations within the text.
Was the text normalized, for example by regularizing any non-standard spellings, dialect forms, etc.? If so, were normalizations performed silently or are they marked using the tags described in section 3.4 Simple Editorial Changes? What authority was used for the regularization? Also, what principles were used when normalizing numbers to provide the standard values for the value attribute described in section 3.5.3 Numbers and Measures and what format used for them?
- quotation
- quotation (quotation) specifies the editorial practice adopted with respect to quotation marks in the original.
marks (quotation marks) indicates whether or not quotation marks have been retained as content within the text.
form specifies how quotation marks are indicated within the text.
How were quotation marks processed? Are apostrophes and quotation marks distinguished? How? Are quotation marks retained as content in the text or replaced by markup? Are there any special conventions regarding for example the use of single or double quotation marks when nested? Is the file consistent in its practice or has this not been checked?
- hyphenation
- hyphenation (hyphenation) summarizes the way in which end-of-line hyphenation in a source text has been treated in an encoded version of it.
eol (end-of-line) indicates whether or not end-of-line hyphenation has been retained in a text.
Does the encoding distinguish ‘soft’ and ‘hard’ hyphens? What principle has been adopted with respect to end-of-line hyphenation where source lineation has not been retained? Have soft hyphens been silently removed, and if so what is the effect on lineation and pagination?
- segmentation
- segmentation (segmentation) describes the principles according to which the text has been segmented, for example into sentences, tone-units, graphemic strata (superimposed levels of graphic signs), etc.
How is the text segmented? If s or seg segmentation units have been used to divide up the text for analysis, how are they marked and how was the segmentation arrived at?
- stdVals
- stdVals (standard values) specifies the format used when standardized date or number values are supplied.
In most cases, attributes bearing standardized values (such as the when or when-iso attribute on dates) should conform to a defined W3C or ISO datatype. In cases where this is not appropriate, this element may be used to describe the standardization methods underlying the values supplied.
- interpretation
- interpretation (interpretation) describes the scope of any analytic or interpretive information added to the transcription of the text.
Has any analytic or ‘interpretive’ information been provided—that is, information which is felt to be non-obvious, or potentially contentious? If so, how was it generated? How was it encoded? If feature-structure analysis has been used, are fsdDecl elements (section 18.11 Feature System Declaration) present?
<editorialDecl>
<segmentation>
<p>
<gi>s</gi> elements mark orthographic sentences and
are numbered sequentially
within their parent <gi>div</gi> element
</p>
</segmentation>
<interpretation>
<p>The part of speech analysis applied throughout section 4 was
added by hand and has not been checked.</p>
</interpretation>
<correction>
<p>Errors in transcription controlled by using the
WordPerfect spelling checker.</p>
</correction>
<normalization source="http://szotar.sztaki.hu/webster/">
<p>All words converted to Modern American spelling following
Websters 9th Collegiate dictionary.</p>
</normalization>
<quotation marks="all">
<p>All opening quotation marks represented by entity reference
<ident type="ge">odq</ident>; all closing quotation marks
represented by entity reference <ident type="ge">cdq</ident>.</p>
</quotation>
</editorialDecl>
An editorial practices declaration which applies to more than one text or division of a text need not be repeated in the header of each such text. Instead, the decls attribute of each text (or subdivision of the text) to which it applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
2.3.4 The Tagging Declaration
- the namespace to which elements appearing within the transcribed text belong.
- how often particular elements appear within the text, so that a recipient can validate the integrity of a text during interchange.
- any comment relating to the usage of particular elements not specified elsewhere in the header.
- a default rendition applicable to all instances of an element.
- rendition (rendition) supplies information about the rendition or appearance of one or more elements in the source text.
scheme identifies the language used to describe the rendition.
- namespace (namespace) supplies the formal name of the namespace to which the elements documented by its children belong.
- tagUsage (tag usage) supplies information about the usage of a specific element within a text.
The tagsDecl element consists of an optional sequence of rendition elements, each of which must bear a unique identifier, followed by an optional sequence of one or more namespace elements, each of which contains a series of tagUsage elements, one for each distinct element from that namespace occurring within the outermost text element of a TEI document. Note that these tagUsage elements must be nested within a namespace element, and cannot appear directly within the tagsDecl element.
2.3.4.1 Rendition
- using an informal prose description
- using a standard stylesheet language such as CSS or XSL-FO
- using a project-defined formal language
- the render attribute of the appropriate tagUsage element may be used to indicate a default rendition for all occurrences of the named element
- the global rendition attribute may be used on any element to indicate its rendition, over-riding any supplied default value
<tagsDecl>
<rendition xml:id="style1">
... description of one default rendition here ...
</rendition>
<rendition xml:id="style2">
... description of another default rendition here ...
</rendition>
<namespace name="http://www.tei-c.org/ns/1.0">
<tagUsage gi="p" render="#style1"> ... </tagUsage>
<tagUsage gi="hi" render="#style2"> ... </tagUsage>
</namespace>
</tagsDecl>
<!-- elsewhere in the document -->
<p>This paragraph, mostly rendered in style1, contains a few words
<hi>rendered in style2</hi>
</p>
<p rendition="#style2">This paragraph is all rendered in style2</p>
<p>This is back to style1</p>
- free
- Informal free text description
- css
- Cascading Stylesheet Language
- xslfo
- Extensible Stylesheet Language Formatting Objects
- other
- A user-defined formal description language
<tagsDecl>
<rendition xml:id="center" scheme="css">text-align: center;</rendition>
<rendition xml:id="small" scheme="css">font-size: small;</rendition>
<rendition xml:id="large" scheme="css">font-size: large;</rendition>
<rendition xml:id="x-large" scheme="css">font-size: x-large;</rendition>
<rendition xml:id="xx-large" scheme="css">font-size: xx-large</rendition>
<rendition xml:id="expanded" scheme="css">letter-spacing: +3pt;</rendition>
<rendition xml:id="x-space" scheme="css">line-height: 150%;</rendition>
<rendition xml:id="xx-space" scheme="css">line-height: 200%;</rendition>
<rendition xml:id="red" scheme="css">color: red;</rendition>
</tagsDecl>
<titlePage>
<docTitle rendition="#center #x-space">
<titlePart>
<lb/>
<hi rendition="#x-large">THE POEMS</hi>
<lb/>
<hi rendition="#small">OF</hi>
<lb/>
<hi rendition="#red #xx-large">ALGERNON CHARLES SWINBURNE</hi>
<lb/>
<hi rendition="#large #xx-space">IN SIX VOLUMES</hi>
</titlePart>
<titlePart rendition="#xx-space">
<lb/> VOLUME I.
<lb/>
<hi rendition="#red #x-large">POEMS AND BALLADS</hi>
<lb/>
<hi rendition="#x-space">FIRST SERIES</hi>
</titlePart>
</docTitle>
<docImprint rendition="#center">
<lb/>
<pubPlace rendition="#xx-space">LONDON</pubPlace>
<lb/>
<publisher rendition="#red #expanded">CHATTO &amp; WINDUS</publisher>
<lb/>
<docDate when="1904" rendition="#small">1904</docDate>
</docImprint>
</titlePage>
When CSS is used as the underlying language, the scope attribute may be used to specify CSS pseudo-elements. These pseudo-elements are used to target styling for only a portion of the given text. For example, there is a first-letter pseudo-element to target styling of the first letter in the targeted element, while there are the useful before and after pseudo-elements, used often in conjunction with the "content" property to add some styling characters (Unicode provides quite a few) before or after the element content, where these are useful to document the appearance of the source.
<rendition xml:id="quoteBefore" scheme="css" scope="before">content:
'“';</rendition>
<rendition xml:id="quoteAfter" scheme="css" scope="after">content:
'”';</rendition>
The rendition elements above are identified as quoteBefore and quoteAfter. Where a q element is actually rendered in the source with initial and final quotation marks, it may then be encoded as follows:
ago...</q>
2.3.4.2 Tag usage
As noted above, each namespace element, if present, should contain exactly one occurrence of a tagUsage element for each distinct element from the given namespace that occurs within the outermost text element associated with the teiHeader in which it appears.7 The tagUsage element is used to supply a count of the number of occurrences of this element within the text, which is given as the value of its occurs attribute. It may also be used to hold any additional usage information, which is supplied as running prose within the element itself.
</tagUsage>
</tagUsage>
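By way of illustration only, a tagging declaration that records occurrence counts might be sketched as below. The element names chosen, the counts, and the usage notes are all hypothetical.

```xml
<tagsDecl>
  <namespace name="http://www.tei-c.org/ns/1.0">
    <!-- counts and usage notes are invented for illustration -->
    <tagUsage gi="hi" occurs="28">Used only to mark words italicized in the copy text.</tagUsage>
    <tagUsage gi="foreign" occurs="12">Used for words in languages other than that of the main text.</tagUsage>
  </namespace>
</tagsDecl>
```

A recipient could compare each occurs value against the number of matching elements actually found in the text to check that nothing was lost in interchange.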
2.3.5 The Reference System Declaration
- refsDecl (references declaration) specifies how canonical references are constructed for this text.
- cRefPattern (canonical reference pattern) specifies an expression and replacement pattern for transforming a canonical reference into a URI.
- refState (reference state) specifies one component of a canonical reference defined by the milestone method.
- as a prose description
- as a series of pairs of regular expressions and XPaths
- as a concatenation of sequentially organized milestones
More than one refsDecl element can be included in the header if more than one canonical reference scheme is to be used in the same document, but the current proposals do not check for mutual inconsistency.
2.3.5.1 Prose Method
The referencing scheme may be specified within the refsDecl by a simple prose description. Such a description should indicate which elements carry identifying information, and whether this information is represented as attribute values or as content. Any special rules about how the information is to be interpreted when reading or generating a reference string should also be specified here. Such a prose description cannot be processed automatically, and this method of specifying the structure of a canonical reference system is therefore not recommended for automatic processing.
<refsDecl>
<p>The <att>n</att> attribute of each text in this corpus carries a
unique identifying code for the whole text. The title of the text is
held as the content of the first <gi>head</gi> element within each
text. The <att>n</att> attribute on each <gi>div1</gi> and
<gi>div2</gi> contains the canonical reference for each such
division, in the form 'XX.yyy', where XX is the book number in Roman
numerals, and yyy the section number in arabic. Line breaks are
marked by empty <gi>lb</gi> elements, each of which includes the
through line number in Casaubon's edition as the value of its
<att>n</att> attribute.</p>
<p>The through line number and the text identifier uniquely identify
any line. A canonical reference may be made up by concatenating the
<att>n</att> values from the <gi>text</gi>, <gi>div1</gi>, or
<gi>div2</gi> and calculating the line number within each part.</p>
</refsDecl>
2.3.5.2 Search-and-Replace Method
- cRefPattern (canonical reference pattern) specifies an expression and replacement pattern for transforming a canonical reference into a URI.
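A minimal sketch of this method follows, assuming a text whose books, chapters, and verses are encoded as nested div elements carrying type and n attributes. That document structure, and the type values used, are assumptions made for the example.

```xml
<refsDecl>
  <!-- the regular expression captures book, chapter, and verse;
       $1, $2, and $3 in the replacement refer back to those groups -->
  <cRefPattern
    matchPattern="(.+) (.+):(.+)"
    replacementPattern="#xpath(//div[@type='book'][@n='$1']/div[@type='chapter'][@n='$2']/div[@type='verse'][@n='$3'])"/>
</refsDecl>
```

Under these assumptions a reference such as ‘Matthew 12:34’ would be rewritten into a pointer selecting the div for verse 34 within chapter 12 of the book labelled Matthew.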
2.3.5.3 Milestone Method
This method is appropriate when only ‘milestone’ tags (see section 3.10.3 Milestone Elements) are available to provide the required referencing information. It does not provide any abilities which cannot be mimicked by the search-and-replace referencing method discussed in the previous section, but in the cases where it applies, it provides a somewhat simpler notation.
- refState (reference state) specifies one component of a canonical reference defined by the milestone method.
unit indicates what kind of state is changing at this milestone.
delim (delimiter) supplies a delimiting string following the reference component.
length specifies the fixed length of the reference component.
For example, the reference ‘Matthew 12:34’ might be thought of as representing the state of three variables: the book variable is in state ‘Matthew’; the chapter variable is in state ‘12’, and the verse variable is in state ‘34’. If milestone tagging has been used, there should be a tag marking the point in the text at which each of the above ‘variables’ changes its state.8 To find ‘Matthew 12:34’ therefore an application must scan left to right through the text, monitoring changes in the state of each of these three variables as it does so. When all three are simultaneously in the required state, the desired point will have been reached. There may of course be several such points.
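The state changes just described could be marked in the text along the following lines. The unit values book, chapter, and verse and the placeholder verse content are assumptions for this sketch.

```xml
<ab>
  <!-- each milestone marks the point where one variable changes state -->
  <milestone unit="book" n="Matthew"/>
  <milestone unit="chapter" n="12"/>
  <milestone unit="verse" n="33"/>Text of verse 33.
  <milestone unit="verse" n="34"/>Text of verse 34.
</ab>
```

An application scanning this fragment reaches ‘Matthew 12:34’ at the last milestone, where the book, chapter, and verse variables hold the required values at the same time.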
The delim and length attributes are used to specify components of a canonical reference using this method in exactly the same way as for the stepwise method described in the preceding section. The other attributes are used to determine which instances of milestone tags in the text are to be checked for state-changes. A state-change is signalled whenever a new milestone tag is found with unit and, optionally, ed attributes identical to those of the refState element in question. The value for the new state may be given explicitly by the n attribute on the milestone element, or it may be implied, if the n attribute is not specified.
<refsDecl>
<refState
ed="first"
unit="page"
length="2"
delim="."/>
<refState ed="first" unit="line" length="3"/>
</refsDecl>
<milestone ed="first" unit="line"/>
The milestone referencing scheme, though conceptually simple, is not supported by a generic SGML or XML parser. Its use places a correspondingly greater burden of verification and accuracy on the encoder.
A reference system declaration which applies to more than one text or division of a text need not be repeated in the header of each such text. Instead, the decls attribute of each text (or subdivision of the text) to which the declaration applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
2.3.6 The Classification Declaration
- classDecl (classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text.
- taxonomy (taxonomy) defines a typology either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy.
- category (category) contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy.
- catDesc (category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the text description element textDesc.
<taxonomy>
<bibl>
<title>Dewey Decimal Classification</title>
<edition>Abridged Edition 12</edition>
</bibl>
</taxonomy>
<taxonomy>
<bibl>Brown Corpus</bibl>
<category xml:id="b.a">
<catDesc>Press Reportage</catDesc>
<category xml:id="b.a1">
<catDesc>Daily</catDesc>
</category>
<category xml:id="b.a2">
<catDesc>Sunday</catDesc>
</category>
<category xml:id="b.a3">
<catDesc>National</catDesc>
</category>
<category xml:id="b.a4">
<catDesc>Provincial</catDesc>
</category>
<category xml:id="b.a5">
<catDesc>Political</catDesc>
</category>
<category xml:id="b.a6">
<catDesc>Sports</catDesc>
</category>
</category>
<category xml:id="b.d">
<catDesc>Religion</catDesc>
<category xml:id="b.d1">
<catDesc>Books</catDesc>
</category>
<category xml:id="b.d2">
<catDesc>Periodicals and tracts</catDesc>
</category>
</category>
</taxonomy>
<category>
<catDesc xml:lang="pl">literatura piękna</catDesc>
<catDesc xml:lang="en">fiction</catDesc>
<category xml:id="litProza">
<catDesc xml:lang="pl">proza</catDesc>
<catDesc xml:lang="en">prose</catDesc>
</category>
<category xml:id="litPoezja">
<catDesc xml:lang="pl">poezja</catDesc>
<catDesc xml:lang="en">poetry</catDesc>
</category>
<category xml:id="litDramat">
<catDesc xml:lang="pl">dramat</catDesc>
<catDesc xml:lang="en">drama</catDesc>
</category>
</category>
2.3.7 The Schema Specification
The schemaSpec element contains a schema specification. When this element appears inside encodingDesc, it allows embedding of a schema inside a TEI header; alternatively, this element may be used in the body of an ODD document. The use of ODD files, and their relationship to schemas, is described in detail in 22 Documentation Elements.
<encodingDesc>
<!-- Other encoding description elements... -->
<schemaSpec
ident="myTEICustomization"
docLang="en"
prefix="tei_"
xml:lang="en">
<moduleRef key="core"/>
<moduleRef key="tei"/>
<moduleRef key="header"/>
<moduleRef key="textstructure"/>
</schemaSpec>
</encodingDesc>
2.3.8 The Application Information Element
- to allow an application to discover that it has previously opened or edited a file, and what version of itself was used to do that;
- to show (through a date) which application last edited the file to allow for diagnosis of any problems that might have been caused by that application;
- to allow users to discover information about an application used to edit the file
- to allow the application to declare an interest in elements of the file which it has edited, so that other applications or human editors may be more wary of making changes to those sections of the file.
- appInfo (application information) records information about an application which has been used to process the TEI file.
- application provides information about an application which has been used to process the document.
ident supplies an identifier for the application, independent of its version number or display name.
version supplies a version number for the application, independent of its identifier or display name.
Each application element identifies the current state of one software application with regard to the current file. This element is a member of the att.datable class, which provides a variety of attributes for associating this state with a date and time, or a temporal range. The ident and version attributes should be used to uniquely identify the application and its major version number (for example, ImageMarkupTool 1.5). It is not intended that an application should add a new application each time it touches the file.
<appInfo>
<application version="1.5" ident="ImageMarkupTool" notAfter="2006-06-01">
<label>Image Markup Tool</label>
<ptr target="#P1"/>
<ptr target="#P2"/>
</application>
</appInfo>
2.3.9 Module-Specific Declarations
The elements discussed so far are available to any schema. When the schema in use includes some of the more specialized TEI modules, these make available other more module-specific components of the encoding description. These are discussed fully in the documentation for the module in question, but are also noted briefly here for convenience.
The fsdDecl element is available only when the iso-fs module is included in a schema. Its purpose is to document the feature system declaration (as defined in chapter 18.11 Feature System Declaration) underlying any analytic feature structures (as defined in chapter 18 Feature Structures) present in the text documented by this header.
The metDecl element is available only when the verse module is included in a schema. Its purpose is to document any metrical notation scheme used in the text, as further discussed in section 6.3 Rhyme and Metrical Analysis. It consists either of a prose description or a series of metSym elements.
The variantEncoding element is available only when the textcrit module is included in a schema. Its purpose is to document the method used to encode textual variants in the text, as discussed in section 12.2 Linking the Apparatus to the Text.
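For instance, an edition that records its variants inline using parallel segmentation could declare this as follows. The attribute values shown are one possible choice, offered as a sketch; section 12.2 describes the available methods.
<!-- assumed choice: variants encoded inline, by parallel segmentation -->
<variantEncoding method="parallel-segmentation" location="internal"/>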
2.4 The Profile Description
- profileDesc (profile description) provides a detailed description of the non-bibliographic aspects of a text, in particular the languages and language varieties used, the circumstances of its production, and the participants and their status.
- creation contains information about the creation of a text.
- langUsage (language usage) describes the languages, language varieties, registers, dialects, etc. found within a text.
- textClass (text classification) groups information describing the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.
- calendarDesc (calendar description) contains a description of the calendars used in dates written in a manuscript.
- textDesc (text description) describes a text in terms of its situational context.
- particDesc (participant description) describes the identifiable speakers, voices, or other participants in a linguistic interaction.
- settingDesc (setting description) describes the setting or settings in which a linguistic interaction takes place, either as a prose description or as a series of elements describing the setting.
- handNotes contains one or more handNote elements documenting the different hands identified in the source texts.
2.4.1 Creation
- creation contains information about the creation of a text.
<creation>
<date when="1992-08">August 1992</date>
<rs type="city">Taos, New Mexico</rs>
</creation>
2.4.2 Language Usage
- langUsage (language usage) describes the languages, language varieties, registers, dialects, etc. found within a text.
- language characterizes a single language or language variety used within a text.
usage specifies approximately the percentage of the text, by volume, that uses this language. ident (identifier) supplies a language code drawn from RFC 3066 (or its successor), used to identify the language documented by this element and referenced by the global xml:lang attribute on the elements concerned.
A language element may be supplied for each different language used in a document. If used, its ident attribute should specify an appropriate language identifier, as further discussed in section vi.1. Language identification. This is particularly important if extended language identifiers have been used as the value of xml:lang attributes elsewhere in the document.
<langUsage>
<language ident="fr-CA" usage="60">Québecois</language>
<language ident="en-CA" usage="20">Canadian business English</language>
<language ident="en-GB" usage="20">British English</language>
</langUsage>
2.4.3 The Text Classification
- textClass (text classification) groups information describing the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.
- by reference to a recognized international classification such as the Dewey Decimal Classification, the Universal Decimal Classification, the Colon Classification, the Library of Congress Classification, or any other system widely used in library and documentation work
- by providing a set of keywords, as provided for example by British Library or Library of Congress Cataloguing in Publication data
- by referencing any other taxonomy of text categories recognized in the field concerned, or peculiar to the material in hand; this may include one based on recurring sets of values for the situational parameters defined in section 15.2.1 The Text Description, or the demographic elements described in section 15.2.2 The Participant Description
- keywords contains a list of keywords or phrases describing the nature or topic of a text.
scheme identifies the closed vocabulary within which the set of keywords concerned is defined.
- classCode (classification code) contains the classification code assigned to this text by reference to a standard classification system.
scheme identifies the classification system or taxonomy in use.
- catRef (category reference) specifies one or more categories defined within a taxonomy or text typology.
The keywords element simply categorizes an individual text by supplying a list of keywords which may describe its topic or subject matter, its form, date, etc. In some schemes, the order of items in the list is significant, for example, from major topic to minor; in others, the list has an organized substructure of its own. No recommendations are made here as to which method is to be preferred. Wherever possible, such keywords should be taken from a recognized source, such as the British Library/Library of Congress Cataloguing in Publication data in the case of printed books, or a published thesaurus appropriate to the field.
<term>Babbage, Charles</term>
<term>Mathematicians - Great Britain - Biography</term>
</keywords>
<keywords scheme="http://id.loc.gov/authorities/about.html#lcsh">
<term>English literature -- History and criticism -- Data processing.</term>
<term>English literature -- History and criticism -- Theory, etc.</term>
<term>English language -- Style -- Data processing.</term>
<term>Style, Literary -- Data processing.</term>
</keywords>
<term>ceremonials</term>
<term>fairs</term>
<term>street life</term>
</keywords>
<!-- elsewhere in the document -->
<taxonomy xml:id="welch">
<bibl>
<title>Notes on London Municipal Literature, and a Suggested
Scheme for Its Classification</title>
<author>Charles Welch</author>
<edition>1895</edition>
</bibl>
</taxonomy>
Alternatively, if the keyword vocabulary itself is locally defined, the scheme attribute will point to the local definition, which will typically be held in a taxonomy element within the classDecl part of the encoding description (see section 2.3.6 The Classification Declaration).
<classCode scheme="http://www.udcc.org/udcsummary/php/index.php">005.756</classCode>
The catRef element categorizes an individual text by pointing to one or more category elements using the target attribute, which it inherits from the att.pointing class. The category element (which is fully described in section 2.3.6 The Classification Declaration) holds information about a particular classification or category within a given taxonomy. Each such category must have a unique identifier, which may be supplied as the value of the target attribute for catRef elements which are regarded as falling within the category indicated.
<catRef target="#b.a4 #b.d2"
scheme="http://www.example.com/browncorpus"/>
<catRef target="http://www.example.com/SUC/#A45"/>
In general, it is a matter of style whether to use a single catRef with multiple identifiers in the value of target or multiple catRef elements, each with a single identifier in the value of target. However, note that maintenance of a TEI document with a large number of values within a single target can be cumbersome.
The distinction between the catRef and classCode elements is that the values used as identifying codes are exhaustively enumerated for the former, typically within the TEI header. In the latter case, however, the values use any externally-defined scheme, and therefore may be taken from a more open-ended descriptive classification system.
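The contrast can be sketched as follows. The taxonomy identifier genre, the category identifiers, and their descriptions are hypothetical; the classCode value and scheme repeat the UDC example given earlier in this section. Here the catRef points at a category enumerated in the header's own classDecl, whereas the classCode borrows a code from an externally defined scheme.
<!-- in the encoding description: locally enumerated categories (hypothetical) -->
<classDecl>
<taxonomy xml:id="genre">
<category xml:id="genre.letter">
<catDesc>Private correspondence</catDesc>
</category>
<category xml:id="genre.sermon">
<catDesc>Sermons</catDesc>
</category>
</taxonomy>
</classDecl>
<!-- in the profile description: one pointer to a local category, one external code -->
<textClass>
<catRef scheme="#genre" target="#genre.letter"/>
<classCode scheme="http://www.udcc.org/udcsummary/php/index.php">005.756</classCode>
</textClass>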
2.4.4 Calendar Description
- calendarDesc (calendar description) contains a description of the calendars used in dates written in a manuscript.
- calendar describes a calendar or dating system used in a dating formula in the text.
<calendarDesc>
<calendar xml:id="Gregorian">
<p>Gregorian calendar</p>
</calendar>
<calendar xml:id="Stardate">
<p>Fictional Stardate (from Star Trek series)</p>
</calendar>
<calendar xml:id="BP">
<p>Calendar years before present (measured from 1950)</p>
</calendar>
</calendarDesc>
to the nearest decimal point</date>...</p>
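A date in the transcription can then be tied to one of these declarations. The sketch below assumes the calendar attribute on date, pointing at the xml:id of a calendar element declared in the header; the sentence and the date value are invented for illustration.
<!-- hypothetical sentence; #BP refers to the calendar declared with xml:id="BP" -->
<p>Charcoal from the hearth was dated to <date calendar="#BP">4200 BP</date>.</p>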
2.5 The Revision Description
- revisionDesc (revision description) provides a summary of the revision history of a file.
- change summarizes a change or correction made to a particular version of an electronic text shared between several researchers.
The main purpose of the revision description is to record changes in the text to which a header is prefixed. However, it is recommended TEI practice to include entries also for significant changes in the header itself (other than the revision description itself, of course). At the very least, an entry should be supplied indicating the date of creation of the header.
The log consists of a list of entries, one for each change. This may be encoded using either the regular list element, as described in section 3.7 Lists or as a series of special purpose change elements, each of which contains a more detailed description of the changes made. The attributes when and who are used to indicate the date of the change and the person responsible for it respectively. The description of the change itself can range from a simple phrase to a series of paragraphs. If a number is to be associated with one or more changes (for example, a revision number), the global n attribute may be used to indicate it.
It is recommended to give changes in reverse chronological order, most recent first.
<!-- ... --><revisionDesc>
<change n="RCS:1.39" when="2007-08-08" who="#jwernimo.lrv">Changed <val>drama.verse</val>
<gi>lg</gi>s to <gi>p</gi>s. <note>we have opened a discussion about the need for a new
value for <att>type</att> of <gi>lg</gi>, <val>drama.free.verse</val>, in order to address
the verse of Behn which is not in regular iambic pentameter. For the time being these
instances are marked with a comment note until we are able to fully consider the best way
to encode these instances.</note>
</change>
<change n="RCS:1.33" when="2007-06-28" who="#pcaton.xzc">Added <att>key</att> and <att>reg</att>
to <gi>name</gi>s.</change>
<change n="RCS:1.31" when="2006-12-04" who="#wgui.ner">Completed renovation. Validated.</change>
</revisionDesc>
<titleStmt>
<title>The Amorous Prince, or, the Curious Husband, 1671</title>
<author>
<persName ref="#abehn.aeh">Behn, Aphra</persName>
</author>
<respStmt xml:id="pcaton.xzc">
<persName>Caton, Paul</persName>
<resp>electronic publication editor</resp>
</respStmt>
<respStmt xml:id="wgui.ner">
<persName>Gui, Weihsin</persName>
<resp>encoder</resp>
</respStmt>
<respStmt xml:id="jwernimo.lrv">
<persName>Wernimont, Jacqueline</persName>
<resp>encoder</resp>
</respStmt>
</titleStmt>
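Where the regular list element is preferred to a series of change elements, the same kind of log might be sketched as below. The dates and wording are invented for illustration; the entries are still given most recent first.
<revisionDesc>
<!-- one item per change, most recent first (illustrative entries) -->
<list>
<item><date when="2007-06-28">28 June 2007</date>: added regularized forms to names.</item>
<item><date when="2006-12-04">4 December 2006</date>: header created; text validated.</item>
</list>
</revisionDesc>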
2.6 Minimal and Recommended Headers
The TEI header allows for the provision of a very large amount of information concerning the text itself, its source, its encodings, and revisions of it, as well as a wealth of descriptive information such as the languages it uses and the situation(s) in which it was produced, together with the setting and identity of participants within it. This diversity and richness reflects the diversity of uses to which it is envisaged that electronic texts conforming to these Guidelines will be put. It is emphatically not intended that all of the elements described above should be present in every TEI Header.
The amount of encoding in a header will depend both on the nature and the intended use of the text. At one extreme, an encoder may expect that the header will be needed only to provide a bibliographic identification of the text adequate to local needs. At the other, wishing to ensure that their texts can be used for the widest range of applications, encoders will want to document as explicitly as possible both bibliographic and descriptive information, in such a way that no prior or ancillary knowledge about the text is needed in order to process it. The header in such a case will be very full, approximating to the kind of documentation often supplied in the form of a manual. Most texts will lie somewhere between these extremes; textual corpora in particular will tend more to the latter extreme. In the remainder of this section we demonstrate first the minimal, and next a commonly recommended, level of encoding for the bibliographic information held by the TEI header.
<teiHeader>
<fileDesc>
<titleStmt>
<title>Thomas Paine: Common sense, a
machine-readable transcript</title>
<respStmt>
<resp>compiled by</resp>
<name>Jon K Adams</name>
</respStmt>
</titleStmt>
<publicationStmt>
<distributor>Oxford Text Archive</distributor>
</publicationStmt>
<sourceDesc>
<bibl>The complete writings of Thomas Paine, collected and edited
by Phillip S. Foner (New York, Citadel Press, 1945)</bibl>
</sourceDesc>
</fileDesc>
</teiHeader>
The only mandatory component of the TEI Header is the fileDesc element. Within this, titleStmt, publicationStmt, and sourceDesc are all required constituents. Within the title statement, a title is required, and an author should be specified, even if it is unknown, as should some additional statement of responsibility, here given by the respStmt element. Within the publicationStmt, a publisher, distributor, or other agency responsible for the file must be specified. Finally, the source description should contain at the least a loosely structured bibliographic citation identifying the source of the electronic text if (as is usually the case) there is one.
<teiHeader>
<fileDesc>
<titleStmt>
<title>Common sense, a machine-readable transcript</title>
<author>Paine, Thomas (1737-1809)</author>
<respStmt>
<resp>compiled by</resp>
<name>Jon K Adams</name>
</respStmt>
</titleStmt>
<editionStmt>
<edition>
<date>1986</date>
</edition>
</editionStmt>
<publicationStmt>
<distributor>Oxford Text Archive.</distributor>
<address>
<addrLine>Oxford University Computing Services,</addrLine>
<addrLine>13 Banbury Road,</addrLine>
<addrLine>Oxford OX2 6RB,</addrLine>
<addrLine>UK</addrLine>
</address>
</publicationStmt>
<notesStmt>
<note>Brief notes on the text are in a
supplementary file.</note>
</notesStmt>
<sourceDesc>
<biblStruct>
<monogr>
<editor>Foner, Philip S.</editor>
<title>The collected writings of Thomas Paine</title>
<imprint>
<pubPlace>New York</pubPlace>
<publisher>Citadel Press</publisher>
<date>1945</date>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
</fileDesc>
<encodingDesc>
<samplingDecl>
<p>Editorial notes in the Foner edition have not
been reproduced. </p>
<p>Blank lines and multiple blank spaces, including paragraph
indents, have not been preserved. </p>
</samplingDecl>
<editorialDecl>
<correction status="high" method="silent">
<p>The following errors
in the Foner edition have been corrected:
<list>
<item>p. 13 l. 7 cotemporaries contemporaries </item>
<item>p. 28 l. 26 [comma] [period] </item>
<item>p. 84 l. 4 kin kind </item>
<item>p. 95 l. 1 stuggle struggle </item>
<item>p. 101 l. 4 certainy certainty </item>
<item>p. 167 l. 6 than that </item>
<item>p. 209 l. 24 publshed published </item>
</list>
</p>
</correction>
<normalization>
<p>No normalization beyond that performed
by Foner, if any. </p>
</normalization>
<quotation marks="all">
<p>All double quotation marks
rendered with ", all single quotation marks with
apostrophe. </p>
</quotation>
<hyphenation eol="none">
<p>Hyphenated words that appear at the
end of the line in the Foner edition have been reformed.</p>
</hyphenation>
<stdVals>
<p>The values of <att>when-iso</att> on the <gi>time</gi>
element always end in the format <val>HH:MM</val> or
<val>HH</val>; i.e., seconds, fractions thereof, and time
zone designators are not present.</p>
</stdVals>
<interpretation>
<p>Compound proper names are marked. </p>
<p>Dates are marked. </p>
<p>Italics are recorded without interpretation. </p>
</interpretation>
</editorialDecl>
<classDecl>
<taxonomy xml:id="lcsh">
<bibl>Library of Congress Subject Headings</bibl>
</taxonomy>
<taxonomy xml:id="lc">
<bibl>Library of Congress Classification</bibl>
</taxonomy>
</classDecl>
</encodingDesc>
<profileDesc>
<creation>
<date>1774</date>
</creation>
<langUsage>
<language ident="en" usage="100">English.</language>
</langUsage>
<textClass>
<keywords scheme="#lcsh">
<term>Political science</term>
<term>United States -- Politics and government —
Revolution, 1775-1783</term>
</keywords>
<classCode scheme="#lc">JC 177</classCode>
</textClass>
</profileDesc>
<revisionDesc>
<change when="1996-01-22" who="#MSM"> finished proofreading </change>
<change when="1995-10-30" who="#LB"> finished proofreading </change>
<change notBefore="1995-07-04" who="#RG"> finished data entry at end of term </change>
<change notAfter="1995-01-01" who="#RG"> began data entry before New Year 1995 </change>
</revisionDesc>
</teiHeader>
Many other examples of recommended usage for the elements discussed in this chapter are provided here, in the reference index and in the associated tutorials.
2.7 Note for Library Cataloguers
- ISBD(G)
- General International Standard Bibliographic Description is an international standard setting out what information should be recorded in a description of a bibliographical item. There are also separate ISBDs covering different types of material, e.g. ISBD(M) for monographs, ISBD(ER) for electronic resources. These separate ISBDs follow the same general scheme as the main ISBD(G), but provide appropriate interpretations for the specific materials under consideration.
- AACR2
- The Anglo-American Cataloguing Rules (second edition) were published in 1978, with revisions appearing periodically through 2005. AACR2 provides guidelines for the construction of catalogues in general libraries in the English-speaking world. AACR2 is explicitly based on the general framework of the ISBD(G) and the subsidiary ISBDs: it gives a description of how to describe bibliographic items and how to create access points such as subject or name headings and uniform titles. Other national cataloguing codes exist as well, including the Z44 series of standards issued by the Association française de normalisation (AFNOR), Regeln für die alphabetische Katalogisierung in wissenschaftlichen Bibliotheken (RAK-WB), Regole italiane di catalogazione per autore (RICA), and Система стандартов по информации, библиотечному и издательскому делу. Библиографическая запись. Библиографическое описание. Общие требования и правила составления (ГОСТ 7.1).
- ANSI Z39.29
- The American National Standard for Bibliographic References was an American national standard governing bibliographic references for use in bibliographies, end-of-work lists, references in abstracting and indexing publications, and outputs from computerized bibliographic data bases. A revised version is maintained by the National Information Standards Organization (NISO) and called ANSI/NISO Z39.29: Bibliographic References. The related ISO standard is ISO 690. Other relevant national standards include BS 5605:1990, BS 6371:1983, DIN 1505-2, and ГОСТ 7.0.5.
- The TEI title statement may not categorise constituent titles in the same way as recommended by AACR2.
- The TEI title statement contains authors, editors, and other responsible parties in separate elements, with names which may not have been normalized; it does not necessarily contain a single statement of responsibility from the chief source of information.
- There is no place in a TEI header to specify the main entry or added entries for the catalogue record (name or title headings under which a catalogue record is filed).
- The TEI header does not require use of a particular vocabulary for subject headings or mandate the use of subject headings.
2.8 The TEI Header Module
- Module header: The TEI Header
- Elements defined: appInfo application authority availability biblFull cRefPattern calendar calendarDesc catDesc catRef category change classCode classDecl correction creation distributor edition editionStmt editorialDecl encodingDesc extent fileDesc funder geoDecl handNote hyphenation idno interpretation keywords langUsage language licence listChange namespace normalization notesStmt principal profileDesc projectDesc publicationStmt quotation refState refsDecl rendition revisionDesc samplingDecl scriptNote segmentation seriesStmt sourceDesc sponsor stdVals tagUsage tagsDecl taxonomy teiHeader textClass titleStmt typeNote