Plant Ontology Consortium


Plant Ontology: Principles and Rationales

Objectives

The main objective of the Plant Ontology (PO) project is to create a set of defined terms that can be uniformly applied to describe the anatomy and morphology of all plants, providing a semantic framework for meaningful cross-species queries across databases. In order to make meaningful queries, the terms themselves must be organized in a way that reflects their known biological relationships. The purpose of such structure is to integrate existing species-specific vocabulary terms into a unified ontology for all plants that will facilitate functional annotation efforts, such as the annotation of gene expression data and phenotypes.

The original task of the Plant Ontology Consortium (POC) was to efficiently integrate the diverse vocabularies used to describe Arabidopsis, maize and rice anatomy and morphology. Thus, the first version of PO spanned two major taxonomic divisions: monocots and dicots. Current revisions of the PO are extending this controlled vocabulary to encompass not only other angiosperm families (such as Fabaceae and Solanaceae), but also groups such as gymnosperms, pteridophytes, bryophytes, and even algae.

It is important to emphasize that the ontologies are not an extensive collection of botanical terms, but rather a complex hierarchical structure in which botanical concepts are described by their meaning and by their relationships to each other. While the educational aspects of earlier versions of the PO were to some extent limited by the software available, the current AmiGO web browser allows easy access to the PO by novice users. Work is underway to include links to images for many of the terms, making the PO a valuable educational tool.

Organizing principles and rationales

The current version of the Plant Ontology represents the next step toward a unified vocabulary for all plants. Plant phenotypic descriptors (e.g., gynoecium, leaf) are often common English words that have been applied with varying degrees of precision; the same term can be applied to quite different structures (e.g., floret in Compositae and in Poaceae), or conversely different terms can be applied to similar structures (e.g., leaf, needle). Current tools for constructing ontologies are strictly hierarchical, whereas description of a phenotype is fundamentally non-hierarchical, creating a tension between the lack of hierarchy in the terms and the formal hierarchy of the ontology. We recognize that this problem will become more complicated as more taxa are added. The current version of the PO is thus meant to be a stepping stone, so that annotation of genes can proceed, but is far from a final product.

Because the PO is a candidate ontology in the Open Biomedical Ontologies (OBO) Foundry, the POC is following OBO Foundry principles in ontology design.

I. General considerations

The following principles were adopted by POC ontology developers:

II. Ontology structure rationale

What constitutes a term in the PO?

The following four criteria are considered when creating terms: morphology, anatomy, derivation, and position. Generic terms describing anatomical parts, spanning from organs to tissues to cell types, are generally included in PO. Also, a number of 'grouping' higher-node terms are created with the purpose of classifying main branches of the ontology (terms such as collective plant structure, infructescence or phyllome).

Subcellular structures are excluded from PO (terms like filiform apparatus, sieve plate, primary endosperm nucleus); these terms belong to the cellular components of the Gene Ontology. The following terms are exceptions: pollen tube (PO:0006345) and polar nucleus (PO:0020095). As a general rule, we consider as an exception any botanical term depicting a subcellular structure that is currently missing from the Gene Ontology and is required for annotation purposes.

Attributes of the anatomical parts are to a large extent avoided in the PO. Examples are the obsoleted terms 'lacunar collenchyma' and 'lamellar collenchyma', both attributes of the term 'collenchyma'. Users wishing to include attributes in their annotations should refer to the Phenotypic Trait Ontology (PATO).

Is_a completeness and part_of relationships

The POC is working to ensure that every term has an is_a parent, that is, every term in the PO should be an instance of some higher-level term. Ultimately, each term's ancestry should be traceable to one of the top-level terms: plant structure (PO:0009011) or plant growth and developmental stage (PO:0009012). Is_a completeness is a best practice for ontology development, because, logically, every entity must be an instance of a more general class of entities (notice that the ultimate root of the PO is the entity 'all').
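A completeness check of this kind can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the deliberately broken 'orphan term' entry are assumptions made for the example, not part of the PO or of any PO tooling.

```python
# Hypothetical miniature ontology: term -> list of is_a parents.
# Term names come from the text; the data layout is an assumption.
IS_A = {
    "plant structure": [],
    "plant cell": ["plant structure"],
    "meristematic cell": ["plant cell"],
    "initial cell": ["meristematic cell"],
    "cambial initial": ["initial cell"],
    "orphan term": [],  # invented entry that deliberately lacks an is_a parent
}
ROOTS = {"plant structure", "plant growth and developmental stage"}


def reaches_root(term, is_a=IS_A, roots=ROOTS):
    """Return True if some chain of is_a links leads from term to a root."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current in roots:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a.get(current, []))
    return False


def incomplete_terms(is_a=IS_A, roots=ROOTS):
    """List terms whose is_a ancestry never reaches a top-level term."""
    return sorted(t for t in is_a if not reaches_root(t, is_a, roots))


print(incomplete_terms())  # ['orphan term']
```

Any term reported by `incomplete_terms` would need an is_a parent before the ontology could be called is_a complete.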

Wherever possible, terms in the plant structure branch of the PO are classified as part_of another structure. Since all plant structures are part of a plant (by definition), part_of relationships are ultimately rooted in whole plant (PO:0000003).

True path rule

The true path rule states that 'the pathway from a child term all the way up to its top-level parent(s) must always be true'. Consider the following example:

% plant cell
     % meristematic cell
         % initial cell
             % cambial initial

since cambial initial is_a initial cell, and initial cell is_a meristematic cell, and meristematic cell is_a plant cell, it must be true that any cambial initial is_a meristematic cell and is_a plant cell.

The true path rule only works in one direction, though, so it is not true that any meristematic cell is_a cambial initial. The same holds for the part_of relationship. For example, the PO contains the relationships petal part_of corolla and corolla part_of perianth. This means that every petal is part_of some corolla and therefore part_of some perianth, but it does not necessarily mean that all perianths have_part petal.
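The one-directional reading of the rule can be illustrated with a small Python sketch. The term names and relationships are the ones given in the text; storing each relationship as a single-parent dictionary is a simplifying assumption, since real PO terms may have several parents.

```python
# child -> parent links taken from the examples in the text.
IS_A = {
    "cambial initial": "initial cell",
    "initial cell": "meristematic cell",
    "meristematic cell": "plant cell",
}
PART_OF = {"petal": "corolla", "corolla": "perianth"}


def ancestors(term, parent_of):
    """Follow single-parent links upward and collect every ancestor in order."""
    found = []
    while term in parent_of:
        term = parent_of[term]
        found.append(term)
    return found


print(ancestors("cambial initial", IS_A))
# ['initial cell', 'meristematic cell', 'plant cell']
print(ancestors("petal", PART_OF))
# ['corolla', 'perianth']

# The rule is not symmetric: the general term does not inherit from the specific one.
print("cambial initial" in ancestors("meristematic cell", IS_A))  # False
```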

The develops_from relationship type is an example of a possible violation of the true path rule. For example, axial cell (PO:0000081) is not an instance of a meristematic cell; rather, it is a differentiated cell of the vascular system. However, it occurs as a descendant of meristematic cell through the develops_from relationship:

% cell
     % meristematic cell
         % initial cell
             % cambial initial
                 % fusiform initial
                    ~ axial cell

To be consistent with the true path rule, the develops_from relationship (and consequently gene annotations associated with terms that have this relationship) is not propagated beyond the first parental node (in this case, fusiform initial).
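One way to picture this restriction is the following Python sketch, which computes the terms that would inherit an annotation made directly to a given term. The edge list mirrors the example tree; the function and its stopping behavior are our reading of the rule, not the actual implementation used in AmiGO or any other PO tool.

```python
# term -> list of (relationship, parent) edges, mirroring the example tree.
EDGES = {
    "axial cell": [("develops_from", "fusiform initial")],
    "fusiform initial": [("is_a", "cambial initial")],
    "cambial initial": [("is_a", "initial cell")],
    "initial cell": [("is_a", "meristematic cell")],
    "meristematic cell": [("is_a", "cell")],
}


def propagation_targets(term, edges=EDGES):
    """Terms that inherit an annotation made directly to `term`."""
    targets = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for relation, parent in edges.get(current, []):
            if parent in targets:
                continue
            targets.append(parent)
            # is_a and part_of links are followed upward transitively;
            # a develops_from link stops at its first parent.
            if relation != "develops_from":
                frontier.append(parent)
    return targets


print(propagation_targets("axial cell"))
# ['fusiform initial']
print(propagation_targets("fusiform initial"))
# ['cambial initial', 'initial cell', 'meristematic cell', 'cell']
```

Under this reading, a gene annotated to axial cell is visible from fusiform initial but never appears under meristematic cell, so no false is_a statement is implied.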

All relationships in the PO are defined according to the OBO Foundry registered Relation Ontology.

Issue of granularity (synonyms, instantiations, species-specific terms, 'sensu' terms)

Species-specific terms are included as separate entities only when they are required for annotation purposes. In many cases, granular terms are included as synonyms of the generic term; that is, instances were converted into synonyms. The best example is the term 'inflorescence' (PO:0009049), which currently has 14 synonyms (cob, cyme, panicle, raceme, etc.). The same rationale is used for species-specific terms. We have therefore taken full advantage of the different concepts of synonymy in the ontology, previously described by GO.

However, in some cases, species-specific terms are necessary to accommodate gene annotation. In such cases, extensive instantiation was required. Since a node should never be more species-specific than any of its children, when creating more granular terms, special care was taken to make sure that a generic parent exists. The current GO structure prohibits use of the same generic term under multiple instances of general terms. Instead, each use of the generic term must be specified as a particular instance such that the hierarchy above it is embedded in the term itself (i.e., child nodes can be at the same level of specificity as the parent node(s), or more specific).

'Sensu' terms

Taxon-specific high nodes are generally avoided. To avoid massive and unnecessary proliferation of sensu terms, the decision was made to include 'sensu' terms in very few special cases. The best examples are the terms 'floret' and 'floret (sensu Poaceae)'. By our current convention, any generic term applicable to a broader range of flowering plants, or any term common to the three original model species (Arabidopsis, maize and rice), can be included in PO (if such a term is required). However, there are cases where a term has different meanings when applied to different taxa. Such terms are distinguished from one another by their definitions and by the sensu designation (sensu means 'in the sense of'), for instance, the term floret (sensu Poaceae). Using the 'sensu' reference makes the node available to other species that use the same term. A node should be divided into sensu sub-trees where the children are or are likely to be different. Since the grass floret differs from that of Asteraceae, the term floret (PO:0009082) was instantiated to floret (sensu Poaceae). Furthermore, a sub-tree is generated by creating another, more granular child term, ear floret, because the ear floret in maize (with two instances, upper and lower florets) is different from florets in other grasses. Consequently, all the child terms for the three instances of florets had to be instantiated as well.

% flower
     % floret
         % floret (sensu Poaceae)
             % ear floret

Coverage extent

Granular botanical terms that are not used/needed for gene annotations and for gene expression data were excluded (examples are obsoleted terms: pyrene, contact cell, haustorial root).

Cell type terms (grouped under a separate cell node) were not propagated to the respective tissue nodes, to avoid redundancy and allow for easier browsing of the ontology. However, cell types may have a part_of relationship to a tissue type. Exceptions were made when necessary to accommodate gene annotations (terms: guard cell, PO:0000256; root hair cell, PO:0000293).

III. The basic structures of the plant ontology (3 patterns)

We realized that a single 'pattern' for ontology structure could not be followed consistently throughout the ontology without proliferating a large number of terms and running into absurd situations (due to the nature of the subjects we are dealing with, i.e., the morphology and anatomy of all plants). Therefore, we adopted 3 structures that have been used interchangeably, as needed. In 'Structure 2' (to be used as the default), we decided to make extensive use of synonymy and the species-specific filtering options provided in the AmiGO browser.

Structure 1:
In this model, instances of higher-node terms are added as needed (if they cannot be merged as synonyms, which is 'dictated' by annotation requirements). Therefore, a 'part_of' child is added under the specific instance of the term, as well as an instance under the generic term, with all the children (not shown).

% fruit (synonyms: achene, capsule...)
         < seed (generic term)
                 < berry seed
         % berry
                 < berry seed

Structure 2 (default):
All the instances of fruit were made synonyms of fruit (creating multiple synonyms of a single higher-node term). This eliminates or largely reduces the need for term proliferation (i.e., instantiations of the granular 'part_of' child terms under each instance of the term). Children are added only to the generic term 'seed' (not shown).

% fruit (synonyms: achene, capsule)
         < seed (generic term)

The best example is indeed the fruit node, which now has multiple synonyms and no instances.
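The effect of Structure 2 on searching can be sketched as a simple synonym index. The terms and synonyms below are the ones shown in the example tree; the record layout and the `resolve` helper are hypothetical and only illustrate the idea of mapping a granular label back to its generic term.

```python
# Generic terms and the granular labels folded into them as synonyms.
# The record layout is an assumption made for this sketch.
TERMS = {
    "fruit": {"synonyms": ["achene", "capsule"]},
    "seed": {"synonyms": []},
}


def build_index(terms):
    """Map every term name and every synonym (lower-cased) to its generic term."""
    index = {}
    for name, record in terms.items():
        index[name.lower()] = name
        for synonym in record["synonyms"]:
            index[synonym.lower()] = name
    return index


INDEX = build_index(TERMS)


def resolve(label):
    """Return the generic term for a label, or None if the label is unknown."""
    return INDEX.get(label.strip().lower())


print(resolve("Capsule"))  # fruit
print(resolve("seed"))     # seed
print(resolve("drupe"))    # None
```

Because 'capsule' resolves to 'fruit', no separate 'capsule seed' term is needed: annotations attach to the generic 'seed' child of 'fruit'.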

Structure 3:
This model implies full instantiation, i.e., having instances under the generic term seed and also adding new, more granular terms under each instance of fruit. To avoid a massive proliferation of terms, only instances that are needed (for annotation purposes) are included.

% fruit
         < seed (generic term)
                 % berry seed
         % berry
                 < berry seed
         % capsule
                 < capsule seed

The best use-case where this structure was necessary is the root node with elongation zone, where a generic term (a) was instantiated with two additional terms (b and c):

% root
         < elongation zone (PO:0020125) (a)
                % primary root elongation zone
                % lateral root elongation zone
        % primary root
                 < primary root elongation zone (b)
        % lateral root
                 < lateral root elongation zone (c)

IV. Organization of the top nodes of the Plant Structure Ontology (PSO)

In the previous version of the PSO, the top level was organized around the traditional characterization of plant anatomical structures (cell, tissue, organ, and whole plant), plus the classes gametophyte, sporophyte, and in vitro cultured cell, tissue and organ. In the current version, the terms gametophyte (PO:0009004) and sporophyte (PO:0009003) were subsumed within whole plant. This was achieved by making gametophyte and sporophyte obsolete, with a suggestion to consider whole plant as a replacement term. A new top-level term was added to the Plant Growth and Developmental Stage Ontology: plant life cycle phase (PO:0028001). This term has two new children: sporophytic phase (PO:0028002) and gametophytic phase (PO:0028003). For future releases, plant structures that are specific to the gametophytic or sporophytic phase will have a participates_in relationship to the appropriate phase. The terms whole plant (PO:0000003) and plant cell (PO:0009002) were retained. Organ, tissue, and in vitro cultured cell, tissue and organ were also retained, but renamed plant organ (PO:0009008), portion of plant tissue (PO:0009007), and in vitro plant structure (PO:0000004), respectively. Two new top-level terms were added: cardinal organ part (PO:0025001) and collective plant structure (PO:0025007). Although these new terms are not traditional in plant anatomy, they were necessary as grouping terms, because there are many commonly-referenced plant structures that are not a cell, organ, or tissue (e.g., petiole, lamina, flower, corolla).

In vitro plant structures are still considered a class of plant structure, but should be kept separate from plant structures that occur in vivo. Therefore, in vitro structures may have a derives_from relationship to other plant structures, but not an is_a or part_of relationship.

Discussions are underway to have all plant cell terms mirrored in the Cell Ontology (CL) and all in vitro plant structure terms mirrored in the Ontology for Biomedical Investigations (OBI).

Rationale for inflorescence and fruit terms

The current structure of GO requires that each term used be unique. Because of the many terms for fruits and inflorescences, we found that when we tried to include even a carefully edited list of inflorescence and fruit terms in the Plant Ontology, terms proliferated rapidly. For example, if we listed cyme, panicle, and raceme as instances of inflorescence, then we needed to create three additional terms, flower of cyme, flower of panicle, and flower of raceme, as parts of each inflorescence type. Because in all cases the term flower has its own children (androecium, gynoecium, petals, stamens), special terms then had to be created for each of these (androecium of flower of cyme, androecium of flower of raceme, etc.). In other words, each use of a generic term must be specified as a particular instance such that the hierarchy above it is embedded in the term itself. This process of carrying the hierarchy into the terms themselves then propagates downward, leading potentially to terms such as "microsporangium of theca of anther of androecium of flower of cyme".
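The growth described here is easy to reproduce in a short Python sketch. The part chain and the inflorescence types are taken from the paragraph above; the helper function is purely illustrative, and it follows a single chain of parts, so a real ontology with branching children would proliferate even faster.

```python
# One chain of nested parts, from outermost to innermost, as named in the text.
PARTS = ["flower", "androecium", "anther", "theca", "microsporangium"]
INFLORESCENCE_TYPES = ["cyme", "panicle", "raceme"]


def instantiated_terms(parts, kinds):
    """Build every term name needed when the hierarchy is embedded in the names."""
    names = []
    for kind in kinds:
        context = kind
        for part in parts:
            context = f"{part} of {context}"
            names.append(context)
    return names


names = instantiated_terms(PARTS, INFLORESCENCE_TYPES)
print(len(names))  # 15
print(names[4])
# microsporangium of theca of anther of androecium of flower of cyme
```

With the three inflorescence types treated as instances, this one chain of five parts already needs 15 terms; if the types are instead synonyms of a single 'inflorescence' term, the same five generic part terms suffice.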

To mitigate this problem, we decided to make use of synonymy as much as possible. Thus the terms "cyme," "raceme," "panicle," "spike" all become synonyms of "inflorescence," and fruit types all become synonyms of "fruit", effectively removing one hierarchical level from the ontology. This will still allow the user to find genes that affect cymes, because a search on "cyme" will pull up all inflorescence genes. For cross-species comparisons, generic searches, and coarsely annotated genes, the synonymy will be helpful. In addition, we have deliberately limited our list of inflorescence (and fruit?) terms to the grasses, Arabidopsis, and tomato, ignoring all other plants for the time being.

Synonymy, however, loses information for more detailed searches. Using the PO alone, for example, it would be impossible to find genes expressed only in cymes, or only in spikes. For the taxa currently incorporated into the PO, specificity can be achieved at the moment using a taxonomic filter, which is available in the current Gene Ontology AmiGO browser. Genes from "spikes sensu Triticeae" could be found by searching only among Triticeae genes, genes from "panicles" by searching rice, genes from "racemes" by searching Arabidopsis. The only current genus for which a taxonomic filter will not work is Zea, which has physically separate and morphologically distinct inflorescences. The two sorts of inflorescence often have different phenotypes in single-gene mutants, and identical genes are often deployed differently in each. Maize geneticists thus often want to be able to distinguish these two. Therefore, the maize ear and tassel are the only two inflorescence types that are treated as instances of "inflorescence". This permits annotation of genes and phenotypes that differ between the two inflorescence types.
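The combination of synonymy and a taxonomic filter can be sketched as follows. Everything concrete in this example is an assumption made for illustration: the gene identifiers are placeholders, the term labels 'ear' and 'tassel' are simplified, and the `search` function is not the AmiGO interface.

```python
# Granular inflorescence labels folded into the generic term as synonyms.
SYNONYMS = {
    "cyme": "inflorescence",
    "panicle": "inflorescence",
    "raceme": "inflorescence",
    "spike": "inflorescence",
}

# (gene, annotated term, taxon); gene identifiers are placeholders.
ANNOTATIONS = [
    ("gene_A", "inflorescence", "Oryza sativa"),
    ("gene_B", "inflorescence", "Arabidopsis thaliana"),
    ("gene_C", "ear", "Zea mays"),
    ("gene_D", "tassel", "Zea mays"),
]


def search(label, taxon=None):
    """Find genes annotated to a label, optionally restricted to one taxon."""
    term = SYNONYMS.get(label, label)
    return [
        gene
        for gene, annotated, species in ANNOTATIONS
        if annotated == term and (taxon is None or species == taxon)
    ]


print(search("panicle"))                        # ['gene_A', 'gene_B']
print(search("panicle", taxon="Oryza sativa"))  # ['gene_A']
print(search("ear"))                            # ['gene_C']
```

Without the filter, a search on "panicle" returns every inflorescence gene; restricting the taxon recovers the specificity that synonymy removed. For maize, the separate 'ear' and 'tassel' terms do the job that a taxon filter cannot.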


  

Last modified: Fri Mar 16 16:36:17 2012

