18 Feature Structures
Contents
- 18.1 Organization of this Chapter
- 18.2 Elementary Feature Structures and the Binary Feature Value
- 18.3 Other Atomic Feature Values
- 18.4 Feature Libraries and Feature-Value Libraries
- 18.5 Feature Structures as Complex Feature Values
- 18.6 Re-entrant Feature Structures
- 18.7 Collections as Complex Feature Values
- 18.8 Feature Value Expressions
- 18.9 Default Values
- 18.10 Linking Text and Analysis
- 18.11 Feature System Declaration
- 18.12 Formal Definition and Implementation
A feature structure is a general purpose data structure which identifies and groups together individual features, each of which associates a name with one or more values. Because of the generality of feature structures, they can be used to represent many different kinds of information, but they are of particular usefulness in the representation of linguistic analyses, especially where such analyses are partial, or underspecified. Feature structures represent the interrelations among various pieces of information, and their instantiation in markup provides a metalanguage for the generic representation of analyses and interpretations. Moreover, this instantiation allows feature values to be of specific types, and for restrictions to be placed on the values for particular features, by means of feature system declarations.
18.1 Organization of this Chapter
This chapter is organized as follows. Following this introduction, section 18.2 Elementary Feature Structures and the Binary Feature Value introduces the elements fs and f, used to represent feature structures and features respectively, together with the elementary binary feature value. Section 18.3 Other Atomic Feature Values introduces elements for representing other kinds of atomic feature values such as symbolic, numeric, and string values. Section 18.4 Feature Libraries and Feature-Value Libraries introduces the notion of predefined libraries or groups of features or feature values along with methods for referencing their components. Section 18.5 Feature Structures as Complex Feature Values introduces complex values, in particular feature structures as values, thus enabling feature structures to be recursively defined. Section 18.6 Re-entrant Feature Structures introduces the vLabel element, used to represent structure sharing within a feature structure. Section 18.7 Collections as Complex Feature Values discusses other complex values, in particular values which are collections, organized as sets, bags, and lists. Section 18.8 Feature Value Expressions discusses how the operations of alternation, negation, and collection of feature values may be represented. Section 18.9 Default Values discusses ways of representing underspecified, default, or uncertain values. Section 18.10 Linking Text and Analysis discusses how analyses may be linked to other parts of an encoded text. Section 18.11 Feature System Declaration describes the feature system declaration, a construct which provides for the validation of typed feature structures. Formal definitions for all the elements introduced in this chapter are provided in section 18.12 Formal Definition and Implementation.
18.2 Elementary Feature Structures and the Binary Feature Value
The fundamental elements used to represent a feature structure analysis are f (for feature), which represents a feature-value pair, and fs (for feature structure), which represents a structure made up of such feature-value pairs. The fs element has an optional type attribute which may be used to represent typed feature structures, and may contain any number of f elements. An f element has a required name attribute and an associated value. The value may be simple: that is, a single binary, numeric, symbolic (i.e. taken from a restricted set of legal values), or string value, or a collection of such values, organized in various ways, for example, as a list; or it may be complex, that is, it may itself be a feature structure, thus providing a degree of recursion. Values may be under-specified or defaulted in various ways. These possibilities are all described in more detail in this and the following sections.
Feature and feature-value representations (including feature structure representations) may be embedded directly at any point in an XML document, or they may be collected together in special-purpose feature or feature-value libraries. The components of such libraries may then be referenced from other feature or feature-value representations, using the feats or fVal attribute as appropriate.
- fs (feature structure) represents a feature structure, that is, a collection of feature-value pairs organized as a structural unit.
  - type specifies the type of the feature structure.
  - feats (features) references the feature-value specifications making up this feature structure.
- f (feature) represents a feature-value specification, that is, the association of a name with a value of any of several different types.
  - name provides a name for the feature.
  - fVal (feature value) references any element which can be used to represent the value of a feature.
- binary (binary value) represents the value part of a feature-value specification which can contain either of exactly two possible values.
<f name="consonantal">
<binary value="true"/>
</f>
<f name="vocalic">
<binary value="false"/>
</f>
<f name="voiced">
<binary value="false"/>
</f>
<f name="anterior">
<binary value="true"/>
</f>
<f name="coronal">
<binary value="true"/>
</f>
<f name="continuant">
<binary value="true"/>
</f>
<f name="strident">
<binary value="true"/>
</f>
</fs>
The restriction of specific features to specific types of values (e.g. the restriction of the feature strident to a binary value) requires additional validation, as does any restriction on the features available within a feature structure of a particular type (e.g. whether a feature structure of type phonological segment necessarily contains a feature voiced). Such validation may be carried out at the document level, using special purpose processing, at the schema level using additional validation rules, or at the declarative level, using an additional mechanism such as the feature-system declaration discussed in 18.11 Feature System Declaration.
Although we have used the term binary for this kind of value, and its representation in XML uses values such as true and false (or, equivalently, 1 and 0), it should be noted that such values are not restricted to propositional assertions. As this example shows, this kind of value is intended for use with any binary-valued feature.
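The document-level validation mentioned above ("special purpose processing") can be illustrated with a minimal Python sketch. It is not part of the TEI scheme: the rule table `BINARY_FEATURES` and the function `check_binary` are hypothetical names, and the sample markup omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: feature names that must carry a <binary> value.
BINARY_FEATURES = {"consonantal", "vocalic", "voiced", "strident"}

def check_binary(fs_xml):
    """Return a list of problems found in one <fs> element."""
    problems = []
    for f in ET.fromstring(fs_xml).findall("f"):
        name = f.get("name")
        if name not in BINARY_FEATURES:
            continue  # no rule declared for this feature
        value = f.find("binary")
        if value is None:
            problems.append(f"{name}: expected a <binary> value")
        elif value.get("value") not in {"true", "false", "1", "0"}:
            problems.append(f"{name}: illegal binary value {value.get('value')!r}")
    return problems

good = '<fs><f name="voiced"><binary value="false"/></f></fs>'
bad = '<fs><f name="strident"><symbol value="feminine"/></f></fs>'
print(check_binary(good))  # an empty list means no problems were found
print(check_binary(bad))
```

A check of this kind only covers rules hard-coded by the processor; a declarative mechanism keeps the rules with the data instead.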
18.3 Other Atomic Feature Values
- symbol (symbolic value) represents the value part of a feature-value specification which contains one of a finite list of symbols.
  - value supplies a symbolic value for the feature, one of a finite list that may be specified in a feature declaration.
- numeric (numeric value) represents the value part of a feature-value specification which contains a numeric value or range.
- string (string value) represents the value part of a feature-value specification which contains a string.
<f name="case">
<symbol value="accusative"/>
</f>
<f name="gender">
<symbol value="feminine"/>
</f>
<f name="number">
<symbol value="plural"/>
</f>
</fs>
nominative, genitive, dative, accusative, etc.) and it is therefore appropriate to represent the values taken in this instance as symbol elements. Note that, instead of using a symbolic value for grammatical number, one could have named the feature singular or plural and given it an appropriate binary value, as in the following example:
<f name="case">
<symbol value="accusative"/>
</f>
<f name="gender">
<symbol value="feminine"/>
</f>
<f name="singular">
<binary value="false"/>
</f>
</fs>
<f name="address">
<string>3418 East Third Street</string>
</f>
</fs>
<f name="houseNumber">
<numeric value="3418"/>
</f>
<f name="streetName">
<string>East Third Street</string>
</f>
</fs>
<f name="houseNumber">
<numeric value="3418" max="3440"/>
</f>
<f name="streetName">
<string>East Third Street</string>
</f>
</fs>
<f name="dailyRainFall">
<numeric value="0.0" max="1.3" trunc="false"/>
</f>
</fs>
<f name="dailyRainFall">
<numeric value="0.0" max="1.3" trunc="true"/>
</f>
</fs>
<f name="voice">active</f>
<f name="tense">SimPre</f>
</fs>
As noted above, additional processing is necessary to ensure that appropriate values are supplied for particular features, for example to ensure that the feature singular is not given a value such as <symbol value="feminine"/>. There are two ways of attempting to ensure that only certain combinations of feature names and values are used. First, if the total number of legal combinations is relatively small, one can predefine all of them in a construct known as a feature library, and then reference the combination required using the feats attribute in the enclosing fs element, rather than give it explicitly. This method is suitable in the situation described above, since it requires specifying a total of only ten (5 + 3 + 2) combinations of features and values. Similarly, to ensure that only feature structures containing valid combinations of feature values are used, one can put definitions for all valid feature structures inside a feature value library (so called, since a feature structure may be the value of a feature). A total of 30 feature structures (5 × 3 × 2) is required to enumerate all the possible combinations of individual case, gender and number values in the preceding illustration. We discuss the use of such libraries and their representation in XML further in section 18.4 Feature Libraries and Feature-Value Libraries below.
However, the most general method of attempting to ensure that only legal combinations of feature names and values are used is to provide a feature-system declaration, as discussed in 18.11 Feature System Declaration.
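The two counts quoted above (ten feature-value pairs for a feature library, thirty feature structures for a feature-value library) can be checked with a short Python sketch. The value inventories are assumptions made for illustration: the text names only four cases followed by "etc.", so the fifth case name used here is hypothetical, as are the gender and number inventories.

```python
from itertools import product

# Hypothetical value inventories; the fifth case name is an assumption,
# since the text lists only four cases followed by "etc.".
VALUES = {
    "case": ["nominative", "genitive", "dative", "accusative", "ablative"],
    "gender": ["feminine", "masculine", "neuter"],
    "number": ["singular", "plural"],
}

# Feature library: one entry per feature-value pair (5 + 3 + 2).
feature_library = [(name, v) for name, vals in VALUES.items() for v in vals]

# Feature-value library: one entry per complete feature structure (5 x 3 x 2).
feature_value_library = [
    dict(zip(VALUES, combo)) for combo in product(*VALUES.values())
]

print(len(feature_library), len(feature_value_library))
```

The sum grows with each added value while the product grows multiplicatively, which is why enumerating whole feature structures is practical only for small systems.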
<!--...-->
<f
name="part_of_speech"
dcr:datcat="http://www.isocat.org/datcat/DC-1345"
fVal="common noun"
dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
<!-- ... -->
</fs>
18.4 Feature Libraries and Feature-Value Libraries
<f xml:id="CNS1" name="consonantal">
<binary value="true"/>
</f>
<f xml:id="CNS0" name="consonantal">
<binary value="false"/>
</f>
<f xml:id="VOC1" name="vocalic">
<binary value="true"/>
</f>
<f xml:id="VOC0" name="vocalic">
<binary value="false"/>
</f>
<f xml:id="VOI1" name="voiced">
<binary value="true"/>
</f>
<f xml:id="VOI0" name="voiced">
<binary value="false"/>
</f>
<f xml:id="ANT1" name="anterior">
<binary value="true"/>
</f>
<f xml:id="ANT0" name="anterior">
<binary value="false"/>
</f>
<f xml:id="COR1" name="coronal">
<binary value="true"/>
</f>
<f xml:id="COR0" name="coronal">
<binary value="false"/>
</f>
<f xml:id="CNT1" name="continuant">
<binary value="true"/>
</f>
<f xml:id="CNT0" name="continuant">
<binary value="false"/>
</f>
<f xml:id="STR1" name="strident">
<binary value="true"/>
</f>
<f xml:id="STR0" name="strident">
<binary value="false"/>
</f>
<!-- ... -->
</fLib>
/t/, /d/, /s/, and /z/ may be defined as follows.
<fs
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT0 #STR0"/>
<fs
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT0 #STR0"/>
<fs
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT1 #STR1"/>
<fs
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT1 #STR1"/>
<!-- ... -->
<fs
xml:id="T.DF"
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT0 #STR0"/>
<fs
xml:id="D.DF"
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT0 #STR0"/>
<fs
xml:id="S.DF"
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT1 #STR1"/>
<fs
xml:id="Z.DF"
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT1 #STR1"/>
<!-- ... -->
</fvLib>
Feature structures stored in this way may also be associated with the text which they are intended to annotate, either by a link from the text (for example, using the TEI global ana attribute), or by means of stand-off annotation techniques (for example, using the TEI link element): see further section 18.10 Linking Text and Analysis below.
Note that when features or feature structures are linked to in this way, the effect is as if the item linked to were copied into the place from which it is linked. This form of linking should be distinguished from the phenomenon of structure-sharing, where it is desired to indicate that some part of an annotation structure appears simultaneously in two or more places within the structure. This kind of annotation should be represented using the vLabel element, as discussed in 18.6 Re-entrant Feature Structures below.
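The copy semantics of the feats attribute can be sketched in Python as follows. This is an illustrative sketch only: `expand_feats` is a hypothetical helper, the library is a three-entry excerpt, and the TEI namespace is omitted.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

LIBRARY = """
<fLib>
  <f xml:id="VOI1" name="voiced"><binary value="true"/></f>
  <f xml:id="VOI0" name="voiced"><binary value="false"/></f>
  <f xml:id="STR1" name="strident"><binary value="true"/></f>
</fLib>
"""

def expand_feats(fs, library):
    """Replace the feats pointers on an <fs> with copies of the <f> elements."""
    by_id = {f.get(XML_ID): f for f in library.iter("f")}
    for pointer in fs.attrib.pop("feats", "").split():
        target = by_id[pointer.lstrip("#")]   # KeyError signals a dangling pointer
        clone = copy.deepcopy(target)         # a copy, not a shared value
        clone.attrib.pop(XML_ID, None)        # avoid duplicate identifiers
        fs.append(clone)
    return fs

fs = expand_feats(ET.fromstring('<fs feats="#VOI1 #STR1"/>'), ET.fromstring(LIBRARY))
print(ET.tostring(fs, encoding="unicode"))
```

Because each pointer yields an independent copy, changing one expanded feature does not affect any other structure that references the same library entry.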
18.5 Feature Structures as Complex Feature Values
Features may have complex values as well as atomic ones; the simplest such complex value is represented by supplying an fs element as the content of an f element, or (equivalently) by supplying the identifier of an fs element as the value for the fVal attribute on the f element. Structures may be nested as deeply as appropriate, using this mechanism. For example, an fs element may contain or point to an f element, which may contain or point to an fs element, which may contain or point to an f element, and so on.
surface, syntax, and semantics. The first of these has an atomic string value, but the other two have complex values, represented as nested feature structures of types category and act respectively:
<f name="surface">
<string>love</string>
</f>
<f name="syntax">
<fs type="category">
<f name="pos">
<symbol value="verb"/>
</f>
<f name="val">
<symbol value="transitive"/>
</f>
</fs>
</f>
<f name="semantics">
<fs type="act">
<f name="rel">
<symbol value="LOVE"/>
</f>
</fs>
</f>
</fs>
verb or transitive. It might be preferable to replace these atomic feature values by feature structures. Suppose therefore that we maintain a feature-value library for each of the major syntactic categories (N, V, ADJ, PREP):
<!-- ... -->
<fs xml:id="N" type="noun">
<!-- noun features defined here -->
</fs>
<fs xml:id="V" type="verb">
<!-- verb features defined here -->
</fs>
</fvLib>
N, V, etc.) to reference a complete definition for the corresponding feature structure. Each definition may be explicitly contained within the fs element, as a number of f elements. Alternatively, the relevant features may be referenced by their identifiers, supplied as the value of the feats attribute, as in these examples:
<fs xml:id="ADJ" type="adjective" feats="#F1 #F2"/>
<fs xml:id="PREP" type="preposition" feats="#F1 #F3"/>
<!-- ... -->
<f xml:id="NN-1" name="nominal">
<binary value="true"/>
</f>
<f xml:id="NN-0" name="nominal">
<binary value="false"/>
</f>
<f xml:id="VV-1" name="verbal">
<binary value="true"/>
</f>
<f xml:id="VV-0" name="verbal">
<binary value="false"/>
</f>
<!-- ... -->
</fLib>
<f name="surface">
<string>love</string>
</f>
<f name="syntax">
<fs type="category">
<f name="pos" fVal="#V"/>
<f name="val" fVal="#TRNS"/>
</fs>
</f>
<f name="semantics">
<fs type="act">
<f name="rel" fVal="#LOVE"/>
</fs>
</f>
</fs>
Although in principle the fVal attribute could point to any kind of feature value, its use is not recommended for simple atomic values.
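The recursive nesting described in this section maps naturally onto nested dictionaries. The following Python sketch assumes that every f element has exactly one child value and that no fVal pointers are used; `fs_to_dict` is a hypothetical name, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

def fs_to_dict(fs):
    """Convert an <fs> element to a dict, recursing into nested <fs> values."""
    result = {}
    for f in fs.findall("f"):
        value = f[0]                       # this sketch assumes exactly one child value
        if value.tag == "fs":
            result[f.get("name")] = fs_to_dict(value)
        elif value.tag == "string":
            result[f.get("name")] = value.text
        else:                              # symbol, binary, numeric: use @value
            result[f.get("name")] = value.get("value")
    return result

entry = ET.fromstring("""
<fs>
  <f name="surface"><string>love</string></f>
  <f name="syntax">
    <fs type="category">
      <f name="pos"><symbol value="verb"/></f>
      <f name="val"><symbol value="transitive"/></f>
    </fs>
  </f>
</fs>
""")
print(fs_to_dict(entry))
```

Supporting fVal would require a lookup by identifier before the recursion step, along the same lines as resolving feats pointers.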
18.6 Re-entrant Feature Structures
- vLabel (value label) represents the value part of a feature-value specification which appears at more than one point in a feature structure.
<f name="nominal">
<fs>
<f name="nm-num">
<vLabel name="L1">
<symbol value="singular"/>
</vLabel>
</f>
<!-- other nominal features -->
</fs>
</f>
<f name="verbal">
<fs>
<f name="vb-num">
<vLabel name="L1"/>
</f>
</fs>
<!-- other verbal features -->
</f>
</fs>
In the above encoding, the features named vb-num and nm-num exhibit structure sharing. Their values, given as vLabel elements, are understood to be references to the same point in the feature structure, which is labelled by their name attribute.
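One way a processor might honour structure sharing is to resolve every vLabel with a given name to a single object, rather than to copies. The Python sketch below assumes that exactly one vLabel per name carries content; `resolve_labels` and `value_of` are hypothetical helpers, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

def resolve_labels(fs):
    """Map each vLabel name to the single value element it labels."""
    labelled = {}
    for label in fs.iter("vLabel"):
        if len(label):                     # a vLabel with content defines the value
            labelled[label.get("name")] = label[0]
    return labelled

def value_of(f, labelled):
    """Return the value element of an <f>, following vLabel references."""
    value = f[0]
    return labelled[value.get("name")] if value.tag == "vLabel" else value

doc = ET.fromstring("""
<fs>
  <f name="nominal"><fs><f name="nm-num">
    <vLabel name="L1"><symbol value="singular"/></vLabel></f></fs></f>
  <f name="verbal"><fs><f name="vb-num"><vLabel name="L1"/></f></fs></f>
</fs>
""")
labels = resolve_labels(doc)
nm, vb = (doc.find(f".//f[@name='{n}']") for n in ("nm-num", "vb-num"))
# Identity, not mere equality: both features see the very same value object.
print(value_of(nm, labels) is value_of(vb, labels))
```

The identity test is the point of the sketch: a change made through one feature is visible through the other, which is what distinguishes sharing from copying.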
NVAL1.
18.7 Collections as Complex Feature Values
- vColl (collection of values) represents the value part of a feature-value specification which contains multiple values organized as a set, bag, or list.
A feature whose value is regarded as a set, bag, or list may have any positive number of values as its content, or none at all (thus allowing for representation of the empty set, bag, or list). The items in a list are ordered, and need not be distinct. The items in a set are not ordered, and must be distinct. The items in a bag are not ordered, and need not be distinct. Sets and bags are thus distinguished from lists in that the order in which the values are specified does not matter for the former, but does matter for the latter, while sets are distinguished from bags and lists in that repetitions of values do not count for the former but do count for the latter.
If no value is specified for the org attribute, the assumption is that the vColl defines a list of values. If the vColl element is empty, the assumption is that it represents the null list, set, or bag.
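The three organizations differ only in how two collections are compared. The following Python sketch makes the distinction concrete; `same_collection` is a hypothetical helper, with the default organization taken to be a list as stated above.

```python
from collections import Counter

def same_collection(a, b, org="list"):
    """Compare two sequences of values under the given organization."""
    if org == "list":      # order and repetition both matter
        return list(a) == list(b)
    if org == "bag":       # repetition matters, order does not
        return Counter(a) == Counter(b)
    if org == "set":       # neither order nor repetition matters
        return set(a) == set(b)
    raise ValueError(f"unknown org value: {org!r}")

x = ["Daniel", "Edouard", "Daniel"]
y = ["Edouard", "Daniel", "Daniel"]
z = ["Edouard", "Daniel"]
print(same_collection(x, y))            # compared as lists (the default org)
print(same_collection(x, y, "bag"))
print(same_collection(x, z, "set"))
```

Under these definitions x and y differ as lists but agree as bags, while x and z agree only as sets.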
<f name="forenames">
<vColl>
<string>Daniel</string>
<string>Edouard</string>
</vColl>
</f>
<f name="mother" fVal="#p002"/>
<f name="father" fVal="#p009"/>
<f name="birthDate">
<fs type="date" feats="#y1988 #m04 #d17"/>
</f>
<f name="birthPlace" fVal="#austintx"/>
<f name="siblings">
<vColl org="set">
<fs copyOf="#pnb005"/>
<fs copyOf="#prb001"/>
</vColl>
</f>
</fs>
In this example, the vColl element is first used to supply a list of ‘name’ feature values, which together constitute the ‘forenames’ feature. Other features are defined by reference to values which we assume are held in some external feature value library (not shown here). For example, the vColl element is used a second time to indicate that the person's siblings should be regarded as constituting a set rather than a list. Each sibling is represented by a feature structure: in this example, each feature structure is a copy of one specified in the feature value library.
<f name="category">
<symbol value="verb"/>
</f>
<f name="tense">
<symbol value="present"/>
</f>
<f name="agreement">
<fs>
<f name="person">
<symbol value="third"/>
</f>
<f name="number">
<symbol value="singular"/>
</f>
</fs>
</f>
</fs>
<f name="category">
<symbol value="verb"/>
</f>
<f name="tense">
<symbol value="present"/>
</f>
<f name="agreement">
<vColl org="set">
<symbol value="third"/>
<symbol value="singular"/>
</vColl>
</f>
</fs>
<f name="lex">
<symbol value="auxquels"/>
</f>
<f name="maf">
<vColl org="list">
<fs>
<f name="cat">
<symbol value="prep"/>
</f>
</fs>
<fs>
<f name="cat">
<symbol value="pronoun"/>
</f>
<f name="kind">
<symbol value="rel"/>
</f>
<f name="num">
<symbol value="pl"/>
</f>
<f name="gender">
<symbol value="masc"/>
</f>
</fs>
</vColl>
</f>
</fs>
The set, bag, or list which has no members is known as the null (or empty) set, bag, or list. A vColl element with no content and with no value for its feats attribute is interpreted as referring to the null set, bag, or list, depending on the value of its org attribute.
p027 (above) had no siblings, we might specify the siblings feature as follows.
<f name="siblings">
<vColl org="set"/>
</f>
A vColl element may also collect together one or more other vColl elements, if, for example, one of the members of a set is itself a set, or if two lists are concatenated together. Note that such collections pay no attention to the contents of the nested vColl elements: if it is desired to produce the union of two sets, the vMerge element discussed below should be used to make a new collection from the two sets.
18.8 Feature Value Expressions
- vAlt (value alternation) represents the value part of a feature-value specification which contains a set of values, only one of which can be valid.
- vNot (value negation) represents a feature value which is the negation of its content.
- vMerge (merged collection of values) represents a feature value which is the result of merging together the feature values contained by its children, using the organization specified by the org attribute.
18.8.1 Alternation
<numeric value="2" max="3"/>
</f>
<vAlt>
<numeric value="2"/>
<numeric value="3"/>
</vAlt>
</f>
<vAlt>
<fs>
<f name="number.of.bathrooms">
<numeric value="2"/>
</f>
</fs>
<fs>
<f name="number.of.bedrooms">
<numeric value="2"/>
</f>
</fs>
</vAlt>
</f>
<vAlt>
<fs>
<f name="number.of.bathrooms">
<numeric value="2"/>
</f>
</fs>
<fs>
<f name="number.of.bedrooms">
<numeric value="2"/>
</f>
</fs>
<vColl>
<fs>
<f name="number.of.bathrooms">
<numeric value="2"/>
</f>
</fs>
<fs>
<f name="number.of.bedrooms">
<numeric value="2"/>
</f>
</fs>
</vColl>
</vAlt>
</f>
selling.points to describe items that are mentioned to enhance a property's sales value, such as whether it has a pool or a good view. Now suppose for a particular listing, the selling points include an alarm system and a good view, and either a pool or a jacuzzi (but not both). This situation could be represented, using the vAlt element, as follows.
<f name="selling.points">
<vColl org="set">
<string>alarm system</string>
<string>good view</string>
<vAlt>
<string>pool</string>
<string>jacuzzi</string>
</vAlt>
</vColl>
</f>
</fs>
<f name="selling.points">
<vColl org="set">
<vAlt>
<string>alarm system</string>
<string>good view</string>
</vAlt>
<vAlt>
<string>pool</string>
<string>jacuzzi</string>
</vAlt>
</vColl>
</f>
</fs>
If a large number of ambiguities or uncertainties need to be represented, involving a relatively small number of features and values, it is recommended that a stand-off technique, for example using the general-purpose alt element discussed in section 16.8 Alternation, be used, rather than the special-purpose vAlt element.
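Alternation inside a collection can be read as a compact encoding of several fully specified collections. The Python sketch below enumerates those readings for the selling-points examples; the modelling is an assumption made for illustration (tuples stand in for vAlt, plain strings for string values) and `readings` is a hypothetical helper.

```python
from itertools import product

# A collection is modelled as a list whose members are either plain values
# or tuples of alternatives (standing in for vAlt); names are hypothetical.
selling_points = ["alarm system", "good view", ("pool", "jacuzzi")]

def readings(collection):
    """Enumerate every fully specified set obtainable by picking one alternative."""
    choices = [m if isinstance(m, tuple) else (m,) for m in collection]
    return [frozenset(combo) for combo in product(*choices)]

for reading in readings(selling_points):
    print(sorted(reading))
```

The number of readings is the product of the sizes of the alternations, so it grows quickly as ambiguities accumulate; this is one practical motivation for the stand-off approach recommended above.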
18.8.2 Negation
<vNot>
<numeric value="2"/>
</vNot>
</f>
case are declared to be nominative, genitive, dative, or accusative, whether in a TEI feature system declaration or by some other means. Then the following two specifications are equivalent:
(i)
<f name="case">
<vNot>
<symbol value="genitive"/>
</vNot>
</f>
(ii)
<f name="case">
<vAlt>
<symbol value="nominative"/>
<symbol value="dative"/>
<symbol value="accusative"/>
</vAlt>
</f>
If however no such system declaration is available, all that one can say about a feature specified via negation is that its value is something other than the negated value.
Negation is always applied to a feature value, rather than to a feature-value pair. The negation of an atomic value is the set of all other values which are possible for the feature.
Any kind of value can be negated, including collections (represented by vColl elements) or feature structures (represented by fs elements). The negation of any complex value is understood to be the set of values which cannot be unified with it. Thus, for example, the negation of the feature structure F is understood to be the set of feature structures which are not unifiable with F. In the absence of a constraint mechanism such as the Feature System Declaration, the negation of a collection is anything that is not unifiable with it, including collections of different types and atomic values. It will generally be more useful to require that the organization of the negated value be the same as that of the original value, for example that a negated set is understood to mean the set which is a complement of the set, but such a requirement cannot be enforced in the absence of a constraint mechanism.
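For atomic values, the interpretation of negation reduces to a set difference against the declared range, when one is available. The Python sketch below uses the four case values given in this section; the table `DECLARED` and the function `negate` are hypothetical names, and returning `None` is simply this sketch's way of signalling that no range is known.

```python
# Declared range for the feature, as in the case example in this section;
# in practice this would come from a feature system declaration or some
# other source.
DECLARED = {"case": {"nominative", "genitive", "dative", "accusative"}}

def negate(feature, value, declared=DECLARED):
    """Return the values a negated atomic value may stand for, or None if unknown."""
    if feature not in declared:
        return None                      # all we know: "something other than value"
    return declared[feature] - {value}

print(sorted(negate("case", "genitive")))
print(negate("gender", "neuter"))
```

The sketch does not attempt the complex-value case, where negation is defined in terms of unifiability rather than simple set membership.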
18.8.3 Collection of Values
The vMerge element can be used wherever a feature value can appear. It contains two or more feature values, all of which are to be collected together. The organization of the resulting collection is specified by the value of the org attribute, which need not necessarily be the same as that of its constituent values if these are collections. For example, one can change a list to a set, or vice versa.
<f name="genders">
<vColl org="set">
<symbol value="masculine"/>
<symbol value="feminine"/>
</vColl>
</f>
</fs>
<f name="genders">
<vMerge org="list">
<vColl org="set">
<symbol value="masculine"/>
<symbol value="feminine"/>
</vColl>
<symbol value="neuter"/>
</vMerge>
</f>
</fs>
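The effect of vMerge can be sketched in Python as a flatten-then-reorganize step. Several modelling choices here are assumptions rather than anything the text prescribes: collections are represented as (org, items) pairs, set members are kept in document order when converted to a list, and `merge` is a hypothetical name.

```python
def merge(values, org):
    """Flatten member collections one level and organize the result as org."""
    flat = []
    for v in values:
        # Collections are modelled as (org, items) pairs; anything else is atomic.
        flat.extend(v[1] if isinstance(v, tuple) else [v])
    if org == "set":
        return ("set", list(dict.fromkeys(flat)))   # drop repeated values
    return (org, flat)                              # list or bag: keep repeats

genders = ("set", ["masculine", "feminine"])
print(merge([genders, "neuter"], "list"))
print(merge([genders, "feminine"], "set"))
```

Unlike nesting one vColl inside another, the merge looks inside its member collections, which is what makes it suitable for forming unions.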
18.9 Default Values
- default (default feature value) represents the value part of a feature-value specification which contains a defaulted value.
<f name="gender">
<vAlt>
<symbol value="feminine"/>
<symbol value="masculine"/>
<symbol value="neuter"/>
</vAlt>
</f>
<default/>
</f>
<symbol value="neuter"/>
</f>
<vNot>
<default/>
</vNot>
</f>
<vAlt>
<symbol value="feminine"/>
<symbol value="masculine"/>
</vAlt>
</f>
18.10 Linking Text and Analysis
<w ana="#at0">The</w>
<w ana="#ajs">closest</w>
<w ana="#pnp">he</w>
<w ana="#vvd">came</w>
<w ana="#prp">to</w>
<w ana="#nn1">exercise</w>
<w ana="#vbd">was</w>
<w ana="#to0">to</w>
<w ana="#vvi">open</w>
<w ana="#crd">one</w>
<w ana="#nn1">eye</w>
<phr ana="#av0">
<w>every</w>
<w>so</w>
<w>often</w>
</phr>
<c ana="#pun">,</c>
<w ana="#cjs">if</w>
<w ana="#pni">someone</w>
<w ana="#vvd">entered</w>
<w ana="#at0">the</w>
<w ana="#nn1">room</w>
<!-- ... -->
</s>
<!-- ... -->
<fs xml:id="ajs" type="grammatical_structure" feats="#wj #ds"/>
<fs xml:id="at0" type="grammatical_structure" feats="#wl"/>
<fs xml:id="pnp" type="grammatical_structure" feats="#wr #rp"/>
<fs xml:id="vvd" type="grammatical_structure" feats="#wv #bv #fd"/>
<fs xml:id="prp" type="grammatical_structure" feats="#wp #bp"/>
<fs xml:id="nn1" type="grammatical_structure" feats="#wn #tc #ns"/>
<!-- ... -->
</fvLib>
<!-- ... -->
<f xml:id="bv" name="verbbase">
<symbol value="main"/>
</f>
<f xml:id="bp" name="prepbase">
<symbol value="lexical"/>
</f>
<f xml:id="ds" name="degree">
<symbol value="superlative"/>
</f>
<f xml:id="fd" name="verbform">
<symbol value="ed"/>
</f>
<f xml:id="ns" name="number">
<symbol value="singular"/>
</f>
<f xml:id="rp" name="prontype">
<symbol value="personal"/>
</f>
<f xml:id="tc" name="nountype">
<symbol value="common"/>
</f>
<f xml:id="wj" name="class">
<symbol value="adjective"/>
</f>
<f xml:id="wl" name="class">
<symbol value="article"/>
</f>
<f xml:id="wn" name="class">
<symbol value="noun"/>
</f>
<f xml:id="wp" name="class">
<symbol value="preposition"/>
</f>
<f xml:id="wr" name="class">
<symbol value="pronoun"/>
</f>
<f xml:id="wv" name="class">
<symbol value="verb"/>
</f>
<!-- ... -->
</fLib>
<w xml:id="S1W1">
<c xml:id="S1W1C1">C</c>ae<c xml:id="S1W1C2">s</c>ar</w>
<w xml:id="S1W2">
<c xml:id="S1W2C1">s</c>ei<c xml:id="S1W2C2">z</c>e<c xml:id="S1W2C3">d</c>
</w>
<w xml:id="S1W3">con<c xml:id="S1W3C1">t</c>rol</w>.
</s>
<fvLib xml:id="FSL1" n="phonological segment definitions">
<!-- as in previous example -->
</fvLib>
<linkGrp type="phonology">
<!-- ... -->
<link target="#T.DF #S1W3C1"/>
<link target="#D.DF #S1W2C3"/>
<link target="#S.DF #S1W2C1"/>
<link target="#Z.DF #S1W2C2"/>
<!-- ... -->
</linkGrp>
<w xml:id="mds0901">The</w>
<w xml:id="mds0902">closest</w>
<w xml:id="mds0903">he</w>
<w xml:id="mds0904">came</w>
<w xml:id="mds0905">to</w>
<w xml:id="mds0906">exercise</w>
<!-- ... -->
</s>
<!-- ... -->
<link target="#mds0901 #at0"/>
<link target="#mds0902 #ajs"/>
<link target="#mds0903 #pnp"/>
<link target="#mds0904 #vvd"/>
<link target="#mds0905 #prp"/>
<link target="#mds0906 #nn1"/>
<link target="#mds0907 #vbd"/>
<link target="#mds0908 #to0"/>
<link target="#mds0909 #vvi"/>
<link target="#mds0910 #crd"/>
<!-- ... -->
</linkGrp>
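Stand-off links of this kind can be turned into a lookup table from text identifiers to analysis identifiers. The Python sketch below assumes that each link carries exactly two pointers, text first and analysis second, as in the word-level example above (the phonology example lists them in the opposite order). The function `annotations` and the type value on linkGrp are hypothetical, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

def annotations(link_grp_xml):
    """Map each text identifier to the analysis identifier it is linked to."""
    pairs = {}
    for link in ET.fromstring(link_grp_xml).iter("link"):
        # Assumes exactly two pointers per link: text first, analysis second.
        text_ptr, analysis_ptr = link.get("target").split()
        pairs[text_ptr.lstrip("#")] = analysis_ptr.lstrip("#")
    return pairs

links = """
<linkGrp type="analysis">
  <link target="#mds0901 #at0"/>
  <link target="#mds0902 #ajs"/>
</linkGrp>
"""
print(annotations(links))
```

Each analysis identifier can then be looked up in the feature-value library, exactly as an ana pointer would be.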
18.11 Feature System Declaration
- It provides a mechanism by which the encoder can list all of the feature names and feature values and give a prose description as to what each represents.
- It provides a mechanism by which the encoder can define constraints that determine not only what it means to be a well-formed feature structure, but also what it means to be a valid feature structure, relative to a given theory stated in typed feature logic. These constraints may involve constraints on the range of a feature value, constraints on what features are valid within certain types of feature structures, or constraints that prevent the co-occurrence of certain feature-value pairs.
- It provides a mechanism by which the encoder can define the intended interpretation of underspecified feature structures. This involves defining default values (whether literal or computed) for missing features.
The scheme described in this chapter may be used to document any feature structure system, but is primarily intended for use with the feature structure representation defined by the ISO 24610-1:2006 standard, which corresponds with the recommendations presented in these Guidelines, 18 Feature Structures. This chapter relies upon, but does not reproduce, formal definitions and descriptions presented more thoroughly in the ISO standard, which should be consulted in case of ambiguity or uncertainty.
The FSD serves an important function in documenting precisely what the encoder intended by the system of feature structure markup used in an XML-encoded text. The FSD is also an important resource which standardizes the rules of inference used by software to validate the feature structure markup in a text, and to infer the full interpretation of underspecified feature structures.
The reader should be aware that the terminology used in this document does not always closely follow conventional practice in formal logic, and may also diverge from practice in some linguistic applications of typed feature structures. In particular, the term ‘interpretation’ when applied to a feature structure is not an interpretation in the model-theoretic sense, but is instead a minimally informative (or equivalently, most general) extension of that feature structure that is consistent with a set of constraints declared by an FSD. In linguistic applications, such a system of constraints is the principal means by which the grammar of some natural language is expressed. There is a great deal of disagreement as to what, if any, model-theoretic interpretation feature structures have in such applications, but the status of this formal kind of interpretation is not germane to the present document. Similarly, the term ‘valid’ is used here as elsewhere in these Guidelines to identify the syntactic state of well-formedness in the sense defined by the logic of typed feature structures itself, as distinct from and in addition to the ‘well-formedness’ that pertains at the level of this encoding standard. No appeal to any notion from formal semantics should be inferred.
We begin by describing how an encoded text is associated with one or more feature system declarations. The second, third, and fourth sections describe the overall structure of a feature system declaration and give details of how to encode its components. The final section offers a full example; fuller discussion of the reasoning behind FSDs and another complete example are provided in Langendoen and Simons (1995).
18.11.1 Linking a TEI Text to Feature System Declarations
In order for application software to use feature system declarations to aid in the automatic interpretation of encoded texts, or even for human readers to find the appropriate declarations which document the feature system used in markup, there must be a formal link from the encoded texts to the declarations. However, the schema which declares the syntax of the Feature System itself should be kept distinct from the feature structure schema, which is an application of that system.
A document containing typed feature structures may simply include a feature system declaration documenting those feature structures. A more usual scenario, however, is that the same feature system declaration (or parts of it) will be shared by many documents. In either case, an fsDecl element for each distinct type of feature structure used must be provided and associated with the type, which is the value used within each feature structure for its type attribute.
- fsdDecl (feature system declaration (FSD)) provides a feature system declaration comprising one or more feature structure declarations or links to a feature structure declaration.
- fsdLink/ (feature structure declaration link) associates the name of a feature structure type with its feature structure declaration.
- fsDecl (feature structure declaration) declares one type of feature structure.
<TEI>
<teiHeader>
<fileDesc>
<!-- doc1 -->
</fileDesc>
<encodingDesc>
<!-- ... -->
<fsdDecl>
<fsDecl type="gpsg">
<!-- information about this type -->
</fsDecl>
<fsDecl type="lex">
<!-- information about this type -->
</fsDecl>
</fsdDecl>
<!-- ... -->
</encodingDesc>
</teiHeader>
<text>
<body>
<!-- ... -->
<fs type="lex">
<!-- an instance of the typed feature structure "lex" -->
</fs>
<!-- ... -->
</body>
</text>
</TEI>
In this case there is an implicit link between the fs element and the corresponding fsDecl element because they share the same value for their type attribute and appear within the same document. This is a short cut for the more general case which requires a more explicit link provided by means of the fsdLink element, as demonstrated below.
<!-- ... --><fsdDecl>
<fsDecl type="gpsg" xml:id="GPSG">
<!-- information about this type -->
</fsDecl>
<fsDecl type="lex" xml:id="LEX">
<!-- information about this type -->
</fsDecl>
</fsdDecl>
<TEI>
<teiHeader>
<fileDesc>
<!-- doc2 -->
</fileDesc>
<encodingDesc>
<!-- ... -->
<fsdDecl>
<fsdLink type="gpsg" target="doc1.xml#GPSG"/>
<fsdLink type="lexx" target="doc1.xml#LEX"/>
</fsdDecl>
<!-- ... -->
</encodingDesc>
</teiHeader>
<text>
<body>
<!-- ... -->
<fs type="lexx">
<!-- an instance of the typed feature structure "lex" -->
</fs>
<!-- ... -->
</body>
</text>
</TEI>
An fsdDecl may be given, as above, within the encoding description of the teiHeader element of a TEI document containing typed feature structures. Alternatively, it may appear independently of any feature structures, as a document in its own right, possibly with its own teiHeader. These options are both possible because the element is a member of both the model.encodingDescPart class and the model.resourceLike class.
The current recommendations provide no way of enforcing uniqueness of the type values among fsDecl elements, nor of requiring that every type value specified on an fs element also be declared on an fsDecl element. Encoders requiring such constraints (which might have some obvious utility in assisting the consistency and accuracy of tagging) are recommended to develop tools to enforce them, using such mechanisms as Schematron assertions.
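The two checks mentioned above can also be prototyped outside the schema. The following is a minimal sketch in Python rather than Schematron, under several assumptions: the sample document is hypothetical, the TEI namespace is omitted for brevity, and declarations reached through fsdLink are not followed. The function name `check_types` is illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; the TEI namespace is omitted for brevity.
SAMPLE = """
<TEI>
  <teiHeader>
    <encodingDesc>
      <fsdDecl>
        <fsDecl type="gpsg"/>
        <fsDecl type="lex"/>
      </fsdDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <fs type="lex"/>
      <fs type="lexx"/>
    </body>
  </text>
</TEI>
"""

def check_types(xml_text):
    """Return (duplicated fsDecl types, fs types lacking an fsDecl)."""
    root = ET.fromstring(xml_text)
    # Count how often each type is declared, to detect duplicates.
    declared = Counter(d.get("type") for d in root.iter("fsDecl"))
    duplicated = sorted(t for t, n in declared.items() if n > 1)
    # Collect every type used on an fs element (untyped fs are skipped).
    used = {fs.get("type") for fs in root.iter("fs") if fs.get("type")}
    undeclared = sorted(used - set(declared))
    return duplicated, undeclared

duplicated, undeclared = check_types(SAMPLE)
print("duplicated:", duplicated)
print("undeclared:", undeclared)
```

In this sample the type lexx is used on an fs element but never declared, so it is reported as undeclared; a fuller tool would also resolve fsdLink targets before reporting.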
18.11.2 The Overall Structure of a Feature System Declaration
- fsDescr (feature system description (in FSD)) describes in prose what is represented by the type of feature structure declared in the enclosing fsDecl.
- fDecl (feature declaration) declares a single feature, specifying its name, organization, range of allowed values, and optionally its default value.
- fsConstraints (feature-structure constraints) specifies constraints on the content of well-formed feature structures.
Feature declarations and feature structure constraints are described in the next two sections. Note that the specification of similar fsDecl elements can be simplified by devising an inheritance hierarchy for the feature structure types. Each fsDecl element may name one or more ‘basetypes’ from which it inherits feature declarations and constraints (these are often called ‘supertypes’). For instance, suppose that <fsDecl type="Basic"> contains <fDecl name="One"> and <fDecl name="Two">, and that <fsDecl type="Derived" baseTypes="Basic"> contains just <fDecl name="Three">. Then any instance of <fs type="Derived"> must include all three features. This is because <fsDecl type="Derived"> inherits the two feature declarations from <fsDecl type="Basic"> when it specifies a base type of Basic.
<fsDecl type="SomeName">
<fsDescr>Describes what this type of fs represents</fsDescr>
<fDecl name="featureOne">
<!-- The declaration for featureOne -->
</fDecl>
<fDecl name="featureTwo">
<!-- The declaration for featureTwo -->
</fDecl>
<fsConstraints>
<!-- The feature structure constraints go here -->
</fsConstraints>
</fsDecl>
The attribute baseTypes gives the name of one or more types from which this type inherits feature specifications and constraints; if this type includes a feature specification with the same name as one inherited from any of the types specified by this attribute, or if more than one specification of the same name is inherited, then the possible values of that feature are determined by unification. Similarly, the set of constraints applicable is derived by conjoining those specified explicitly within this element with those implied by the baseTypes attribute. When no base type is specified, no feature specification or constraint is inherited.
Although the present standard does provide for default feature values, feature inheritance is defined to be monotonic.
The process of combining constraints may result in a contradiction, for example if two specifications for the same feature specify disjoint ranges of values, and at least one such specification is mandatory. In such a case, there is no valid feature structure of the type being defined.
Every type specified by baseTypes must be a single word which is a legal XML name; for example, it cannot include whitespace or begin with a digit. Multiple base types are separated with spaces, e.g. <fsDecl type="Sub" baseTypes="Super1 Super2">.
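The inheritance behaviour described in this section can be illustrated with a minimal Python sketch. It rests on strong simplifying assumptions: each value range is modelled as a plain set of atomic values, unification of two ranges is approximated by set intersection, and the distinction between optional and mandatory features is ignored. The dictionary layout and the names `inherited_features`, `contradictory`, and `DECLS` are hypothetical.

```python
def inherited_features(type_name, decls, _seen=()):
    """Collect feature ranges for a type, including those of its base types.

    decls maps a type name to a dict with an optional "baseTypes" list and
    a "features" dict of {feature name: set of allowed atomic values}.
    Same-named features are combined by set intersection, a simplified
    stand-in for unification that only covers atomic value ranges.
    """
    if type_name in _seen:
        raise ValueError(f"cyclic baseTypes involving {type_name!r}")
    decl = decls[type_name]
    # Inherited specifications first, then the type's own specifications.
    sources = [inherited_features(base, decls, _seen + (type_name,))
               for base in decl.get("baseTypes", [])]
    sources.append(decl["features"])
    merged = {}
    for features in sources:
        for name, values in features.items():
            merged[name] = merged[name] & values if name in merged else set(values)
    return merged

def contradictory(features):
    """Names of features whose combined range is empty (no possible value)."""
    return sorted(name for name, values in features.items() if not values)

# Hypothetical declarations echoing the Basic/Derived example in the text.
DECLS = {
    "Basic": {"features": {"One": {"a", "b"}, "Two": {"x"}}},
    "Derived": {"baseTypes": ["Basic"],
                "features": {"Three": {"y"}, "One": {"b", "c"}}},
    "Broken": {"baseTypes": ["Basic"], "features": {"Two": {"z"}}},
}

derived = inherited_features("Derived", DECLS)
print(sorted(derived))                                     # all three features
print(derived["One"])                                      # narrowed range
print(contradictory(inherited_features("Broken", DECLS)))  # disjoint ranges
```

Because ranges can only shrink as specifications are combined, the sketch is monotonic in the sense used above; the Broken type shows the disjoint-range case, in which no value remains for the feature Two.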
18.11.3 Feature Declarations
- is not optional (i.e., is obligatory),
- has no value provided, or the value default is provided (see ISO 24610-1, Subclause 5.10, Default Values), and
- either has no default specified, or has conditional defaults, none of the conditions on which is met,
- is optional,
- has no value provided, or the value default is provided, and
- either has a default specified, or has conditional defaults, one of the conditions on which is met,
It is possible that a feature structure will not have a valid extension because the default value that pertains to a feature is not consistent with that feature's declared range. Additional tools are required for the enforcement of such criteria.
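As an illustration of how a default might be resolved and then checked against the declared range, here is a minimal Python sketch. It assumes flat feature structures modelled as dictionaries, ranges modelled as sets of atomic values, and conditions that are met when every feature they name has the same value in the feature structure (a simplification of subsumption). The names `default_value`, `default_in_range`, and the NUM declaration with a "dual" default are hypothetical; the COMP and INV values are borrowed from the GPSG example used in this chapter.

```python
def default_value(fdecl, fs):
    """Return the default for a feature missing from fs, or None if none applies.

    fdecl["default"] is either absent, a plain value (unconditional default),
    or a list of (condition, value) pairs mirroring <if>...<then/>...</if>.
    """
    default = fdecl.get("default")
    if isinstance(default, list):
        for condition, value in default:
            # Simplified subsumption test over flat feature structures.
            if all(fs.get(name) == val for name, val in condition.items()):
                return value
        return None  # conditional defaults, but no condition was met
    return default

def default_in_range(fdecl, fs):
    """True if no default applies, or the applicable default lies in the range."""
    value = default_value(fdecl, fs)
    return value is None or value in fdecl["range"]

COMP = {"range": {"for", "that", "whether", "if", "NIL"},
        "default": [({"VFORM": "INF", "SUBJ": True}, "for")]}
INV = {"range": {True, False}, "default": False}
# Hypothetical declaration whose default falls outside its own range.
NUM = {"range": {"sg", "pl"}, "default": "dual"}

print(default_value(COMP, {"VFORM": "INF", "SUBJ": True}))
print(default_value(COMP, {"VFORM": "FIN"}))
print(default_in_range(INV, {}), default_in_range(NUM, {}))
```

The last line shows the situation described above: the hypothetical NUM declaration supplies a default that its own range does not admit, so a structure relying on that default would have no valid extension.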
- fDecl (feature declaration) declares a single feature, specifying its name, organization, range of allowed values, and optionally its default value.
  name: indicates the name of the feature being declared; matches the name attribute of f elements in the text.
  optional: indicates whether or not the value of this feature may be present.
- fDescr (feature description (in FSD)) describes in prose the feature being declared and its values.
- vRange (value range) defines the range of allowed values for a feature, in the form of an fs, vAlt, or primitive value; for the value of an f element to be valid, it must be subsumed by the specified range. If the f contains multiple values (as sanctioned by the org attribute), each value must be subsumed by the vRange element.
- vDefault (value default) declares the default value to be supplied when a feature structure contains no instance of f for this name; if unconditional, it is given as one fs element (or several, depending on the value of the org attribute of the enclosing fDecl); if conditional, it is given as one or more if elements; if no default is specified, or no condition matches, the null value is assumed.
- if defines a conditional default value for a feature; the condition is specified as a feature structure, and is met if it subsumes the feature structure in the text for which a default value is sought.
- then/ separates the condition from the default in an if, or the antecedent from the consequent in a cond element.
The logic for validating feature values and for matching the conditions for supplying default values is based on the operation of subsumption. Subsumption is a standard operation in feature-structure-based formalisms. Informally, a feature structure FS subsumes all feature structures that are at least as informative as itself; that is, all feature structures that specify all of the feature values that FS does with values that are subsumed by the values that FS has, and that have all of the re-entrancies (see 18.6 Re-entrant Feature Structures) that FS does (Carpenter (1992); see also Pereira (1987) and Shieber (1986)). A more formal definition is provided in ISO 24610-1:2006.
Following the spirit of the informal definition above, we can extend subsumption in a straightforward way to cover alternation, negation, special primitive values, and the use of attributes in the markup. For instance, a vAlt containing the value v subsumes v. The negation of a value v (represented by means of the vNot element discussed in section 18.8.2 Negation) subsumes any value that is not v; for example <vNot><numeric value='0'/></vNot> subsumes any numeric value other than zero. The value <fs type="X"/> subsumes any feature structure of type X, even if it is not valid.
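The informal definition of subsumption, extended to alternation and negation, can be sketched in Python as follows. This is an illustration under stated assumptions, not the formal definition of ISO 24610-1: feature structures are modelled as dictionaries, atomic values as Python values, re-entrancy and the type attribute are ignored, and the more specific argument is assumed to be fully resolved (it contains no alternations or negations). The classes `Alt` and `Not` are hypothetical stand-ins for vAlt and vNot.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alt:
    """Alternation of values; a stand-in for vAlt."""
    options: tuple

@dataclass(frozen=True)
class Not:
    """Negation of a value; a stand-in for vNot."""
    value: object

def subsumes(general, specific):
    """True if `specific` is at least as informative as `general`."""
    if isinstance(general, Alt):
        # An alternation subsumes whatever any of its options subsumes.
        return any(subsumes(option, specific) for option in general.options)
    if isinstance(general, Not):
        # A negation subsumes any value that the negated value does not.
        return not subsumes(general.value, specific)
    if isinstance(general, dict):
        # Every feature of `general` must recur in `specific` with a
        # value subsumed by the one given in `general`.
        return (isinstance(specific, dict) and
                all(name in specific and subsumes(value, specific[name])
                    for name, value in general.items()))
    # Atomic values: same kind and same value (keeps True distinct from 1).
    return type(general) is type(specific) and general == specific

print(subsumes(Alt((True, False)), False))
print(subsumes(Not(0), 3), subsumes(Not(0), 0))
print(subsumes({"INV": True}, {"INV": True, "AUX": True}))
print(subsumes({"INV": True, "AUX": True}, {"INV": True}))
```

Note that the empty dictionary subsumes every dictionary in this model, matching the idea that the least informative feature structure is the most general one.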
INV {+, -}
CONJ {and, both, but, either, neither, nor, or, NIL}
COMP {for, that, whether, if, NIL}
AGR CAT
PFORM {to, by, for, ...}
FSD 1: [-INV]
FSD 2: ~[CONJ]
FSD 9: [INF, +SUBJ] --> [COMP for]
<fDecl name="INV">
<fDescr>inverted sentence</fDescr>
<vRange>
<vAlt>
<binary value="true"/>
<binary value="false"/>
</vAlt>
</vRange>
<vDefault>
<binary value="false"/>
</vDefault>
</fDecl>
The value range is specified as an alternation (more precisely, an exclusive disjunction), which can be represented by the binary feature value. That is, the value must be either true or false, but cannot be both or neither.
<fDecl name="CONJ">
<fDescr>surface form of the conjunction</fDescr>
<vRange>
<vAlt>
<symbol value="and"/>
<symbol value="both"/>
<symbol value="but"/>
<symbol value="either"/>
<symbol value="neither"/>
<symbol value="nor"/>
<symbol value="or"/>
<symbol value="NIL"/>
</vAlt>
</vRange>
<vDefault>
<binary value="false"/>
</vDefault>
</fDecl>
<fDecl name="COMP">
<fDescr>surface form of the complementizer</fDescr>
<vRange>
<vAlt>
<symbol value="for"/>
<symbol value="that"/>
<symbol value="whether"/>
<symbol value="if"/>
<symbol value="NIL"/>
</vAlt>
</vRange>
<vDefault>
<if>
<fs>
<f name="VFORM">
<symbol value="INF"/>
</f>
<f name="SUBJ">
<binary value="true"/>
</f>
</fs>
<then/>
<symbol value="for"/>
</if>
</vDefault>
</fDecl>
<fDecl name="AGR">
<fDescr>agreement for person and number</fDescr>
<vRange>
<fs type="Agreement"/>
</vRange>
</fDecl>
<fDecl name="PFORM">
<fDescr>word form of a preposition</fDescr>
<vRange>
<vNot>
<string/>
</vNot>
</vRange>
</fDecl>
<vNot><string/></vNot> subsumes any string that is not the empty string.
Note that the class model.featureVal includes all possible single feature values, including feature structures, alternations (vAlt) and complex collections (vColl).
18.11.4 Feature Structure Constraints
Ensuring the validity of feature structures may require much more than simply specifying the range of allowed values for each feature. There may be constraints on the co-occurrence of one feature value with the value of another feature in the same feature structure or in an embedded feature structure.
Such constraints on valid feature structures are expressed as a series of conditional and biconditional tests in the fsConstraints part of an fsDecl. A particular feature structure is valid only if it meets all the constraints. The cond element encodes the conventional if-then conditional of boolean logic which succeeds when both the antecedent and consequent are true, or whenever the antecedent is false. The bicond element encodes the biconditional (if and only if) operation of boolean logic. It succeeds only when the corresponding if-then conditionals in both directions are true. In feature structure constraints the antecedent and consequent are expressed as feature structures; they are considered true if they subsume (see section 18.11.3 Feature Declarations) the feature structure in question, but in the case of consequents, this truth is asserted rather than simply tested. That is to say, a conditional is enforced by determining that the antecedent does not (and will never) subsume the given feature structure, or by determining that the antecedent does subsume the given feature structure, and then unifying the consequent with it (the result of which, if successful, will be subsumed by the consequent). In practice, the enforcement of such constraints can result in periods in which the truth of a constraint with respect to a given feature structure is simply not known; in this case, the constraint must be persistently monitored as the feature structure becomes more informative until either its truth value is determined or computation fails for some other reason.
- fsConstraints (feature-structure constraints) specifies constraints on the content of well-formed feature structures.
- cond (conditional feature-structure constraint) defines a conditional feature-structure constraint; the consequent and the antecedent are specified as feature structures or groups of feature structures; the constraint is satisfied if both the antecedent and the consequent subsume a given feature structure, or if the antecedent does not.
- bicond (bi-conditional feature-structure constraint) defines a biconditional feature-structure constraint; both the consequent and the antecedent are specified as feature structures or groups of feature structures; the constraint is satisfied if both subsume a given feature structure, or if neither does.
- then/ separates the condition from the default in an if, or the antecedent from the consequent in a cond element.
- iff/ (if and only if) separates the condition from the consequent in a bicond element.
<cond>
<fs>
<f name="INV">
<binary value="true"/>
</f>
</fs>
<then/>
<fs>
<f name="AUX">
<binary value="true"/>
</f>
<f name="VFORM">
<symbol value="FIN"/>
</f>
</fs>
</cond>
<bicond>
<fs>
<f name="BAR">
<symbol value="0"/>
</f>
</fs>
<iff/>
<fs>
<f name="N">
<binary value="true"/>
</f>
<f name="V">
<binary value="true"/>
</f>
<f name="SUBCAT">
<binary value="true"/>
</f>
</fs>
</bicond>
<cond>
<fs>
<f name="BAR">
<symbol value="1"/>
</f>
</fs>
<then/>
<fs>
<f name="SUBCAT">
<binary value="false"/>
</f>
</fs>
</cond>
Note that cond and bicond use the empty tags then and iff, respectively, to separate the antecedent and consequent. These are primarily for the sake of enhancing human readability.
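The three constraints shown in this section can be tested against fully specified feature structures with a short Python sketch. It is deliberately limited: feature structures are flat dictionaries, subsumption is reduced to checking that every named feature has the same value, and constraints are only tested, not enforced by unifying the consequent or by monitoring underspecified structures as described above. The function names and the `CONSTRAINTS` table are hypothetical.

```python
def subsumes(general, specific):
    """Flat subsumption: every feature value in `general` recurs in `specific`."""
    return all(name in specific and specific[name] == value
               for name, value in general.items())

def cond_holds(antecedent, consequent, fs):
    """if-then: fails only when the antecedent applies and the consequent does not."""
    return (not subsumes(antecedent, fs)) or subsumes(consequent, fs)

def bicond_holds(left, right, fs):
    """if-and-only-if: both sides apply, or neither does."""
    return subsumes(left, fs) == subsumes(right, fs)

# The three constraints from the examples in this section.
CONSTRAINTS = [
    (cond_holds, {"INV": True}, {"AUX": True, "VFORM": "FIN"}),
    (bicond_holds, {"BAR": "0"}, {"N": True, "V": True, "SUBCAT": True}),
    (cond_holds, {"BAR": "1"}, {"SUBCAT": False}),
]

def is_valid(fs):
    """A feature structure is valid only if it meets all the constraints."""
    return all(test(first, second, fs) for test, first, second in CONSTRAINTS)

print(is_valid({"INV": True, "AUX": True, "VFORM": "FIN"}))
print(is_valid({"INV": True, "AUX": False, "VFORM": "FIN"}))
print(is_valid({"BAR": "1", "SUBCAT": True}))
```

The second structure fails the first conditional (INV is true but AUX is not), and the third fails the last one (BAR is 1 but SUBCAT is not false).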
18.11.5 A Complete Example
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title>A sample FSD based on an extract from Gazdar
et al.'s GPSG feature system for English</title>
<respStmt>
<resp>encoded by</resp>
<name>Gary F. Simons</name>
</respStmt>
</titleStmt>
<publicationStmt>
<p>This sample was first encoded by Gary F. Simons (Summer
Institute of Linguistics, Dallas, TX) on January 28, 1991.
Revised April 8, 1993 to match the specification of FSDs
in version P2 of the TEI Guidelines. Revised again December 2004 to
be consistent with the feature structure representation standard
jointly developed with ISO TC37/SC4.
</p>
</publicationStmt>
<sourceDesc>
<p>This sample FSD does not describe a complete feature
system. It is based on extracts from the feature system
for English presented in the appendix (pages 245–247) of
Generalized Phrase Structure Grammar, by Gazdar, Klein,
Pullum, and Sag (Harvard University Press, 1985).</p>
</sourceDesc>
</fileDesc>
</teiHeader>
<fsdDecl>
<fsDecl type="GPSG">
<fsDescr>Encodes a feature structure for the GPSG analysis
of English (after Gazdar, Klein, Pullum, and Sag)</fsDescr>
<fDecl name="INV">
<fDescr>inverted sentence</fDescr>
<vRange>
<vAlt>
<binary value="true"/>
<binary value="false"/>
</vAlt>
</vRange>
<vDefault>
<binary value="false"/>
</vDefault>
</fDecl>
<fDecl name="CONJ">
<fDescr>surface form of the conjunction</fDescr>
<vRange>
<vAlt>
<symbol value="and"/>
<symbol value="both"/>
<symbol value="but"/>
<symbol value="either"/>
<symbol value="neither"/>
<symbol value="nor"/>
<symbol value="or"/>
<symbol value="NIL"/>
</vAlt>
</vRange>
<vDefault>
<binary value="false"/>
</vDefault>
</fDecl>
<fDecl name="COMP">
<fDescr>surface form of the complementizer</fDescr>
<vRange>
<vAlt>
<symbol value="for"/>
<symbol value="that"/>
<symbol value="whether"/>
<symbol value="if"/>
<symbol value="NIL"/>
</vAlt>
</vRange>
<vDefault>
<if>
<fs>
<f name="VFORM">
<symbol value="INF"/>
</f>
<f name="SUBJ">
<binary value="true"/>
</f>
</fs>
<then/>
<symbol value="for"/>
</if>
</vDefault>
</fDecl>
<fDecl name="AGR">
<fDescr>agreement for person and number</fDescr>
<vRange>
<fs type="Agreement"/>
</vRange>
</fDecl>
<fDecl name="PFORM">
<fDescr>word form of a preposition</fDescr>
<vRange>
<vNot>
<string/>
</vNot>
</vRange>
</fDecl>
<fsConstraints>
<cond>
<fs>
<f name="INV">
<binary value="true"/>
</f>
</fs>
<then/>
<fs>
<f name="AUX">
<binary value="true"/>
</f>
<f name="VFORM">
<symbol value="FIN"/>
</f>
</fs>
</cond>
<bicond>
<fs>
<f name="BAR">
<symbol value="0"/>
</f>
</fs>
<iff/>
<fs>
<f name="N">
<binary value="true"/>
</f>
<f name="V">
<binary value="true"/>
</f>
<f name="SUBCAT">
<binary value="true"/>
</f>
</fs>
</bicond>
<cond>
<fs>
<f name="BAR">
<symbol value="1"/>
</f>
</fs>
<then/>
<fs>
<f name="SUBCAT">
<binary value="false"/>
</f>
</fs>
</cond>
</fsConstraints>
</fsDecl>
<fsDecl type="Agreement">
<fsDescr>This type of feature structure encodes the features
for subject-verb agreement in English</fsDescr>
<fDecl name="PERS">
<fDescr>person (first, second, or third)</fDescr>
<vRange>
<vAlt>
<symbol value="1"/>
<symbol value="2"/>
<symbol value="3"/>
</vAlt>
</vRange>
</fDecl>
<fDecl name="NUM">
<fDescr>number (singular or plural)</fDescr>
<vRange>
<vAlt>
<symbol value="sg"/>
<symbol value="pl"/>
</vAlt>
</vRange>
</fDecl>
</fsDecl>
</fsdDecl>
</TEI>
18.12 Formal Definition and Implementation
- Module iso-fs: Feature Structures