Schemata for structural formulae (SF) such as C n H2n.
Chemical classes can also be defined based on where the chemical came from in synthetic or natural pathways. Chemicals of natural metabolic origin are called natural products. As our ability to determine molecular structure by such methods as crystallography, NMR, CASE has improved over the past century, so too has our ability to describe what is in a particular structural class. For example the klymollins , extracted from the coral Klyxum molle, are all produced by reactions from a common core molecule and have very similar connectivities and compositions. This is a common pattern for recently-discovered natural product molecules. Contrast this with alkaloids, one of the earliest classes of natural products to be identified, for which the best formal definition we have for the class reads (from ChEBI) 'Any of basic nitrogen compounds (mostly heterocyclic) occurring mostly in the plant kingdom (but not excluding those of animal origin). Amino acids, peptides, proteins, nucleotides, nucleic acids, amino sugars and antibiotics are not normally regarded as alkaloids. By extension, certain neutral compounds biogenetically related to basic alkaloids are included.' A flexible and expressive language is needed to fully do justice to the wide range of class names that are intuitive to chemists and can be found in natural language in electronic lab notebooks (such as are used in industry) and indeed in more traditional scientific publications.
Jon Soderholm, Mike Uehara-Bingen, Karsten Weis & Rebecca Heald
Many interesting classes of chemicals are defined based on what the chemical does (its function or activity) in a biological or chemical context. Included in this group are drug usage classes such as antidepressant and antifungal; chemical reactivity classes such as solvent, acid and base; and biological activities such as hormone . These are included in ChEBI under the 'role' ontology.
Interesting classes in chemistry can be grouped into those which are structure-based and those which are not. Structure-based classes are defined based on the presence of some shared structural feature across all members of the class. This feature, however, may be crisply defined or vaguely defined. Crisply defined structural classes will form the focus in this paper, and are discussed further in the section sec:resultsclasses below. Vaguely defined structural classes, by comparison, are those based on a family resemblance between a group of molecules, that are often of natural origin or have biological relevance. For example, steroids are defined in ChEBI as 'Any of naturally occurring compounds and synthetic analogues, based on the cyclopenta[a]phenanthrene carbon skeleton, partially or completely hydrogenated; there are usually methyl groups at C-10 and C-13, and often an alkyl group at C-17. By extension, one or more bond scissions, ring expansions and/or ring contractions of the skeleton may have occurred.' The vagueness is indicated by terms and phrases such as 'usually', 'one or more' and 'may have'. The approaches to chemical class definition that we will discuss in this paper are not able to represent such vagueness, although extensions such as fuzzy logic or logic enhanced with probability constraints may in the future be able to support this use case.
A methyl group on a pyridine skeleton
A further benefit of a formalisation of class definitions is that this would allow disambiguation of different class definitions that are used by different communities in reference to the same entities. For example, some communities may use the term 'hydrocarbons' as encompassing derivatives such as chlorohydrocarbons, while other communities may use the term in a stricter sense. The use of different definitions for the same class may lead to different chemical hierarchies as produced by classification tools implementing the same algorithms (structure-based and/or logic-based). Standardisation of class definitions across disparate communities requires communication between cheminformaticians/logicians and chemists. Formalisation of class definitions in support of automatic classification allows explicit disambiguation of these different senses; this can be achieved through convergence on a community-wide shared ontology which assigns different labels to classes that are defined differently, but which provides both of the disputed versions of the definition, thus allowing different user communities of user to select their preferred version.
Paul V Murphy & Peter J Rutledge
Logic lies at the heart of modern knowledge representation (KR) technologies. Logic-based representation employs formal methods developed in the context of mathematical logic in order to encode knowledge about the world. The key advantage of these methods is that the knowledge is stored in a machine-processable form. A core feature that the vast majority of KR formalisms share is the use of a well-defined syntax and semantics. The syntax serves as the alphabet of the language: it provides a set of symbols and a set of rules that regulate the arrangement of the symbols in valid expressions. The semantics enriches the syntactic objects with a meaning so that expressions complying with certain syntactic forms, known as axioms, have a universal and predefined interpretation. It is their semantics that enables machine processing. A set of valid syntactic expressions, known as axioms, constitutes an ontology in the computer science sense.
A group or atom present in any position within the molecule
The amenability of KR languages to automated reasoning is of crucial importance. A reasoning algorithm - relying on principles of logical deduction - detects possible inconsistencies and computes the inferences that follow from a set of formally defined axioms; note that a reasoning algorithm is tied uniquely to the specific syntax and semantics of the given KR language. A reasoning engine can be used to check the logical consistency of a set of logical axioms. For instance, if a knowledge base (i) defines organic and inorganic compounds as disjoint chemical classes (ii) contains the fact that cobalamin is an organic compound and (iii) also classifies cobalamin as inorganic, then a contradiction will be detected. Another standard reasoning task is the discovery of information that is not explicitly stated in the ontology. For example, if an ontology categorises cobalamin as a B vitamin and also asserts that B vitamins participate in cell metabolism, then the fact that cobalamin participates in cell metabolism is derived. The automation of the above tasks - traditionally performed by humans - has a clear advantage as it permits the allocation of research resources to more intellectually demanding activities.