Over the years, we have had the pleasure of collaborating with a number of superb human beings. Any list presented here would necessarily be incomplete, so we instead list a number of projects that should cover most of the scientific, infrastructural and political themes that drive our collaborations. In alphabetical order:
- The Blue Obelisk Movement (http://www.blueobelisk.net)
- The ChEBI database project (http://www.ebi.ac.uk/chebi)
- The Chemistry Development Kit Project (http://cdk.github.io)
- ChemBioSys (http://www.chembiosys.de/)
- Coordination of Standards in Metabolomics, COSMOS, (http://www.cosmos-fp7.eu)
- The MetaboLights database project (http://www.metabolights.org)
- PhenoMeNal, cloud computing with big metabolomics data (http://phenomenal-h2020.eu)
In the following, we outline further involvements in international collaborative projects:
Computer-Assisted Structure Elucidation
With Prof. Emma Schymanski at the University of Luxembourg and our joint PhD student Adelene Lai, we are investigating cheminformatics approaches to identify unknowns in mixtures and biological systems.
Standards
NMReData Initiative
We are a member of the NMReData Initiative, which supports a lightweight data format for reporting NMR data, structures and assignments for small molecules in articles.
Ontologies
We are actively involved in several aspects of the development, adoption and dissemination of ontologies as standards for the annotation of life-science data. Ontologies are structured controlled vocabularies that have several features that make them ideal for the standardisation of annotations, including hierarchical organisation for flexible aggregation, semantics-free stable identifiers, and a plug-in architecture without dependence on a fixed database schema. We developed the ChEBI database and ontology for chemical entities of biological interest [1]. ChEBI is the chemical ontology of choice for many life science data annotation projects, and has been adopted by the OBO Foundry as the reference ontology for chemical entities. ChEBI is also used by the Gene Ontology to identify chemicals in chemical-involving processes and functions.
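The "hierarchical organisation for flexible aggregation" mentioned above can be illustrated with a minimal sketch. It assumes a toy is_a hierarchy with made-up identifiers (`EX:0001` and so on) and labels; these are hypothetical and are not real ChEBI identifiers or any ChEBI API. Annotations made against specific terms are counted at every ancestor term as well, so the same data can be summarised at any level of the hierarchy.

```python
from collections import Counter

# Toy is_a hierarchy: child identifier -> list of parent identifiers.
# All identifiers and labels are hypothetical, chosen for illustration only.
IS_A = {
    "EX:0003": ["EX:0002"],  # glucose is_a monosaccharide
    "EX:0004": ["EX:0002"],  # fructose is_a monosaccharide
    "EX:0002": ["EX:0001"],  # monosaccharide is_a carbohydrate
    "EX:0001": [],           # carbohydrate (root of this toy hierarchy)
}

# Labels live separately from the identifiers, so a label can change
# without breaking existing annotations (semantics-free stable identifiers).
LABELS = {
    "EX:0001": "carbohydrate",
    "EX:0002": "monosaccharide",
    "EX:0003": "glucose",
    "EX:0004": "fructose",
}


def ancestors(term):
    """Return the term itself plus all of its transitive is_a ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def aggregate(annotations):
    """Count annotated records per term, propagating each count to ancestors."""
    counts = Counter()
    for term in annotations:
        for ancestor in ancestors(term):
            counts[ancestor] += 1
    return counts


# Three data records annotated with specific terms.
counts = aggregate(["EX:0003", "EX:0003", "EX:0004"])
for term_id in sorted(counts):
    print(term_id, LABELS[term_id], counts[term_id])
```

Querying at the level of "monosaccharide" or "carbohydrate" returns all three records, although none of them was annotated with those terms directly.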
We have developed the CHEMINF ontology for chemical information entities [2], such as descriptors, algorithms and toolkits. It provides provenance and disambiguation for the properties of chemical entities that are made available as open data in the context of the Semantic Web.
Metabolomics
Our group led the COSMOS effort, Coordination of Standards in MetabOlomicS [3], aiming to drive forward the definition and adoption of standards for data exchange and annotation in the field of metabolomics. Metabolomics is an important phenotyping technique for molecular biology and medicine. It assesses the molecular state of an organism or collections of organisms through the comprehensive quantitative and qualitative analysis of all small molecules in cells, tissues, and body fluids. Metabolic processes are at the core of physiology. Consequently, metabolomics is ideally suited both as a medical tool to characterise disease states in organisms and as a tool for assessing the suitability of organisms for, for example, renewable energy production or biotechnological applications in general.
We are now seeing the emergence of metabolomics databases and repositories in various subareas of metabolomics, as well as the emergence of large general e-infrastructures in the life sciences. In particular, the BioMedBridges project is set to link a variety of European Strategy Forum on Research Infrastructures (ESFRI) projects, such as ELIXIR and BBMRI. Metabolomics generates large and diverse sets of analytical data and therefore imposes significant challenges for the above-mentioned e-infrastructures. The COSMOS effort is designed to develop standards and policies to ensure that metabolomics data are:
- Encoded in open standards to allow barrier-free and wide-spread analysis.
- Tagged with a community-agreed, complete set of metadata (minimum information standard).
- Supported by a communally developed set of open source data management and capturing tools.
- Disseminated in open-access databases adhering to the above standards.
- Supported by vendors and publishers, who require deposition upon publication.
- Properly interfaced with data in other biomedical and life-science e-infrastructures (such as ELIXIR, BioMedBridges, EU-Openscreen).
COSMOS brought together leading European groups in metabolomics and interfaced with all interested players in metabolomics and beyond, world-wide.
Standards in Chemical Biology
Our group was a partner in the EU-OPENSCREEN effort, the European Infrastructure of Open Screening Platforms for Chemical Biology, which aims to integrate high-throughput screening platforms, chemical libraries, chemical resources for hit discovery and optimisation, bio- and cheminformatics support, and a database containing screening results, assay protocols, and chemical information. We led the Standardisation work package, tasked with defining a core set of representational and transfer data standards for open data sharing and reproducible analysis in European chemical biology. As a part of this effort we are collaborating closely with the PubChem team for chemical data standardisation and the BioAssay Ontology team for biological assay description standardisation. We have also contributed to the development of the Minimum Information to Annotate a Bioactive Entity (MIABE) project.
Standards for Chemical Markup — CML and CMLSpect
Chemical Markup Language (CML) is an XML language designed to facilitate the creation, interchange, and deposition of chemical information. CML covers many areas of mainstream chemistry including:
- Molecules – structures and properties
- Reactions, including properties and reaction schemes
- Spectra, especially as found in chemical publications (CMLSpect)
- Crystallography, especially the interplay of structure and chemistry
- Computational chemistry
The Steinbeck group has been closely involved in the development of CML, especially CMLSpect. CMLSpect is heavily used in Bioclipse to handle spectral information.
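To give an impression of what CML looks like in practice, the following minimal sketch parses a small hand-written CML molecule with Python's standard library. The document (a carbon and an oxygen atom joined by a single bond, hydrogens omitted) and the helper name `read_molecule` are illustrative assumptions; the sketch covers only atoms and bonds and does not touch CMLSpect.

```python
import xml.etree.ElementTree as ET

# Namespace used by CML schema documents.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# A small hand-written example document (illustrative, not from a real dataset).
CML_DOC = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
  </bondArray>
</molecule>"""


def read_molecule(text):
    """Return (atoms, bonds) from a CML molecule given as a string.

    atoms maps atom id -> element symbol;
    bonds is a list of (atom id, atom id, bond order) tuples.
    """
    root = ET.fromstring(text)
    atoms = {
        atom.get("id"): atom.get("elementType")
        for atom in root.findall("cml:atomArray/cml:atom", CML_NS)
    }
    bonds = [
        tuple(bond.get("atomRefs2").split()) + (bond.get("order"),)
        for bond in root.findall("cml:bondArray/cml:bond", CML_NS)
    ]
    return atoms, bonds


atoms, bonds = read_molecule(CML_DOC)
print(atoms)
print(bonds)
```

Because CML is plain XML, any generic XML tooling can read it; dedicated toolkits are only needed for chemistry-aware operations.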
References
- (2015): ChEBI in 2016: Improved services and an expanding collection of metabolites. In: Nucleic Acids Research, vol. 44, no. D1, pp. gkv1031–D1219, 2015.
- (2011): The chemical information ontology: provenance and disambiguation for chemical data on the biological semantic web. In: PLoS ONE, vol. 6, no. 10, pp. e25513, 2011.
- (2015): COordination of Standards in MetabOlomicS (COSMOS): facilitating integrated metabolomics data access. In: Metabolomics, vol. 11, no. 6, pp. 1–11, 2015.