Introduction

Preface

In the past two decades, technological developments have made it feasible to generate large volumes of heterogeneous biological data. Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding and mining these large-scale and complex biological data. A major challenge in bioinformatics is the integration of data from different sources. Several efforts have been made to aid in this process, including: standard metadata annotation formats to share and describe the experiments (ISA-Tab), minimal information guidelines to report biological and biomedical science (MIBBI project) and biological and biomedical ontology-based controlled vocabulary (OBO Foundry and Bioontology). However, the bioinformatics community still struggles to bring these standards, guidelines, and ontologies into the wet laboratory, the workbench where the biological experiments are carried out.

Why Annot?

Wet lab scientists use their lab books or traditional laboratory information management systems to keep track of samples, reagents, assay workflows, and the provenance of results, and to record data. These records are usually aimed at human readers, not machines. Consequently, as every bioinformatician knows, when sophisticated data analysis or data integration is to be done, the data usually have to be “scrubbed” to be usable by computers. Unfortunately, this process is time-consuming and error-prone.

A major part of the problem is the lack of adherence to a controlled vocabulary when annotating data from biological experiments. Even though annotating experiments with a controlled vocabulary would reduce common data integration errors, constant compliance is difficult to achieve using standard operating procedures.

Here we introduce Annot, a web application that bridges the gap between wet lab scientists and data analysts. Annot captures metadata and links it to raw and processed data from biological studies in standardized computational formats, so that the data are ready for analysis and sharing.

Implementation

The foundation of Annot lies in the controlled vocabulary. Whenever possible, we derive controlled vocabulary from established ontologies that can be easily updated to the latest ontology version and extended to include missing terms. This way, internal terms never interfere with official ontology terms. Each controlled vocabulary is handled as a Django application.
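One way to picture how internal terms can sit next to official ontology terms is to tag every term with its origin. The following is a minimal sketch under stated assumptions, not Annot's actual code: the classes `Term` and `Vocabulary`, the `origin` field, the rule that an official term wins over an internal one with the same identifier, and all identifiers and labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    identifier: str  # made-up accession, not taken from a real ontology
    label: str
    origin: str      # "official" (from an ontology release) or "internal"


@dataclass
class Vocabulary:
    name: str
    version: str
    terms: dict = field(default_factory=dict)

    def add_internal(self, identifier, label):
        """Add a lab-specific term; refuse to shadow an official term."""
        existing = self.terms.get(identifier)
        if existing is not None and existing.origin == "official":
            raise ValueError(f"{identifier} is an official term")
        self.terms[identifier] = Term(identifier, label, "internal")

    def update_release(self, version, official_terms):
        """Swap in a new ontology release while keeping internal terms."""
        internal = {k: t for k, t in self.terms.items() if t.origin == "internal"}
        official = {k: Term(k, label, "official") for k, label in official_terms.items()}
        # Assumption: if a release now defines an identifier that was used
        # internally, the official term replaces the internal one.
        self.terms = {**internal, **official}
        self.version = version


vocab = Vocabulary("cell_type", "none")
vocab.update_release("2015-01", {"EX_0001": "fibroblast"})
vocab.add_internal("LAB_0001", "in-house cell line")
vocab.update_release("2015-06", {"EX_0001": "fibroblast", "EX_0002": "keratinocyte"})
```

After the second release update, the internal term `LAB_0001` is still present next to the two official terms, which is the behavior the paragraph above describes.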

Wet lab scientists had input on the structure of the assay building bricks (symbolized by the jigsaw pieces). Whenever possible, we adhered to the current LINCS metadata standard and the ISA-Tab specification.

These assay building bricks can be bridged to connect samples and reagents to protocols, the executing person, and the execution date.

Through implementation, assay workflows can be connected by the bridged assay building bricks, analogous to a Unix pipeline, to track data provenance. This Lego-like system makes it easy to keep the workflow up to date with actual assay development in the wet lab or to add new assay workflows.
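The pipeline analogy can be sketched roughly as follows. This is an illustration only, assuming a simple linked chain of steps; the names `BridgedBrick`, `AssayStep`, and `provenance`, as well as all sample, reagent, protocol, and person values, are hypothetical and do not come from the Annot code base.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BridgedBrick:
    """A sample or reagent brick bridged to protocol, person, and date."""
    brick: str
    protocol: str
    person: str
    executed: date


@dataclass(frozen=True)
class AssayStep:
    name: str
    inputs: tuple                            # bridged bricks used in this step
    upstream: Optional["AssayStep"] = None   # previous step, like the left side of a pipe


def provenance(step):
    """Walk from a step back to the first one; return step names in run order."""
    chain = []
    while step is not None:
        chain.append(step.name)
        step = step.upstream
    return list(reversed(chain))


cells = BridgedBrick("sample-A", "thaw-protocol", "person-1", date(2015, 11, 2))
stain = BridgedBrick("reagent-B", "staining-protocol", "person-2", date(2015, 11, 3))

seeding = AssayStep("seeding", (cells,))
staining = AssayStep("staining", (stain,), upstream=seeding)
imaging = AssayStep("imaging", (), upstream=staining)

print(provenance(imaging))  # ['seeding', 'staining', 'imaging']
```

Because each step only knows its upstream neighbor, a new step can be appended or swapped in without touching the rest of the chain, which is the Lego-like property described above.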

Assays can be connected to studies, and studies can be connected to implementations.

Annot can be shaped to the particular needs of a lab by the choice or implementation of sample types, reagent types, and assay types (symbolized by the three dots), which might dictate new controlled vocabulary to be used.

Overall, the system’s goal is to be modular, adaptable, lean and agile but focused on the major task: capturing biological study metadata and data annotations so that data are ready for analysis and sharing.

_images/20151123architecture.png

Figure: Blueprint for the Annot architecture v0.3. The Annot Django stack consists, bottom up, of: tool layer (maroon), ontology layer (green), brick layer (cyan), bridge layer (blue), arch layer (navy and purple), and the sysadmin layer (yellow), which keeps all other layers together. The colors are identical to the coloring of the graphical user interface. The architecture inside these layers is modular. Each tool, vocabulary, sample type, reagent type, and assay type, as well as protocol, publication, person, study, and investigation, is handled by a separate Django application. Each module depends only on modules from the layers below or from the same layer, never on modules from the layer above. Consequently, by enabling only the lower stack layers, Annot can be run just for controlled vocabulary maintenance (ontology layer) or for sample and reagent tracking without the complexity of assay data provenance tracking (ontology and brick layers). It is always possible to enable higher layers when needed.
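The dependency rule in the caption (a module may depend only on its own layer or on layers below) can be checked mechanically. The sketch below is a hedged illustration, not part of Annot: the module names, the dependency table, and the helper functions `violations` and `enabled_modules` are hypothetical, and the sysadmin layer is left out because the caption describes it as holding the other layers together rather than as a rung in the stack.

```python
# Layers listed bottom up, as in the figure caption (sysadmin layer omitted).
LAYERS = ["tool", "ontology", "brick", "bridge", "arch"]
RANK = {name: index for index, name in enumerate(LAYERS)}


def violations(module_layer, depends_on):
    """Return (module, dependency) pairs where a module reaches into a higher layer."""
    bad = []
    for module, dependencies in depends_on.items():
        for dependency in dependencies:
            if RANK[module_layer[dependency]] > RANK[module_layer[module]]:
                bad.append((module, dependency))
    return bad


def enabled_modules(module_layer, top_layer):
    """Modules that remain when only layers up to top_layer are enabled."""
    limit = RANK[top_layer]
    return sorted(m for m, layer in module_layer.items() if RANK[layer] <= limit)


# Hypothetical modules and dependencies, for illustration only.
module_layer = {
    "vocab_cell_type": "ontology",
    "sample_cell_line": "brick",
    "assay_run": "bridge",
    "study": "arch",
}
depends_on = {
    "vocab_cell_type": [],
    "sample_cell_line": ["vocab_cell_type"],
    "assay_run": ["sample_cell_line", "vocab_cell_type"],
    "study": ["assay_run"],
}

print(violations(module_layer, depends_on))      # []
print(enabled_modules(module_layer, "brick"))    # ['sample_cell_line', 'vocab_cell_type']
```

Since no module points upward, cutting the stack at any layer leaves a set of modules whose dependencies are all still present, which is why the lower layers can run on their own.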

Input and Output

At the ontology level, Annot automatically keeps track of new releases from the plugged-in controlled vocabulary modules. At the brick level, data input and output via tab-separated value spreadsheets is possible, as Excel is one of the standard tools of any scientist. However, the standard input and output format is structured text in JSON format.

In brief, if you have never heard of JSON:

  • JSON is a machine- and human-readable structured text format for data exchange.
  • JSON handles one-to-many relationships in particular better than spreadsheets.
  • JSON is not super powerful (e.g. it cannot handle complex numbers), but it is powerful enough for our case.
  • A JSON library for your favourite programming language most probably already exists.
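To make the points above concrete, here is a small sketch using only the Python standard library. The record layout (a sample with a list of reagents) and all names and values are invented for illustration and are not Annot's actual JSON schema.

```python
import csv
import io
import json

# One sample with several reagents: a one-to-many relationship.
sample = {
    "sample": "sample-A",
    "reagents": [
        {"name": "reagent-B", "concentration": 1.5, "unit": "uM"},
        {"name": "reagent-C", "concentration": 10, "unit": "ng/mL"},
    ],
}

# JSON keeps the nesting and survives a round trip unchanged.
text = json.dumps(sample, indent=2, sort_keys=True)
restored = json.loads(text)

# A flat tab-separated table has to repeat the sample on every reagent row.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["sample", "reagent", "concentration", "unit"])
for reagent in restored["reagents"]:
    writer.writerow(
        [restored["sample"], reagent["name"], reagent["concentration"], reagent["unit"]]
    )
tsv = buffer.getvalue()

# The limitation mentioned above: complex numbers are not a JSON type.
try:
    json.dumps(1 + 2j)
    complex_supported = True
except TypeError:
    complex_supported = False
```

In the tab-separated output, `sample-A` appears once per reagent row, whereas the JSON text states it once and nests the reagents beneath it.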

Source Code

The source code is distributed under the free and open source AGPLv3 license through https://gitlab.com/biotransistor/annot .

Annot is written in Python > 3.4, utilizing the Django > 1.8 web framework with PostgreSQL > 9.4 and Nginx > 1.9. Annot is deployed with the Docker > 1.6 distribution platform. The Docker implementation is based on the official Python, Postgres, Nginx, and Debian > 8 Jessie images.

Early development and deployment evolved by running Apache 2.4 on a FreeBSD 10 operating system in a VirtualBox 4.3 machine.

Project References

PyCon 2015 Montreal: https://us.pycon.org/2015/schedule/presentation/461/