Tutorial

This tutorial will guide you step by step through the process of

  1. populating annot with backed up controlled vocabulary and updating it with the latest ontology versions available online.
  2. populating annot with backed up sample and reagent bricks.
  3. populating annot with backed up study and investigation information.
  4. populating annot with backed up experiment layouts and laying out one acjson file yourself.
  5. populating annot with backed up tracking information.
  6. backing up the work done.

Preparation


  1. Before you follow this tutorial you have to install the development version of annot as described in HowTo install annot.
  2. run git clone https://gitlab.com/biotransistor/annotTutorial.git to clone the tutorial material to your machine.
  3. run cp -r annotTutorial annot/web/ to copy the cloned annotTutorial folder into the annot/web/ folder from your annot installation.
  4. run rm -fr annot/web/annotTutorial/.git to remove the .git folder in the copied annotTutorial folder.

Controlled Vocabulary


  1. Enter annot
    1. docker-machine ls lists all the installed docker machines.
    2. docker-machine start an0 starts the an0 docker machine, if needed.
    3. eval $(docker-machine env an0) loads the an0 environment variables.
  2. run docker exec -ti annot_webdev_1 /bin/bash to enter the annot_webdev_1 docker container.
    1. ls should list, among others, the annotTutorial folder.
  3. Load the backed up vocabulary.
    1. cp annotTutorial/vocabulary_json/* ../media/vocabulary/backup/ copies the backed up vocabularies to the right place inside annot.
    2. python manage.py vocabulary_loadbackup will populate each vocabulary app first with the latest backup found at /usr/src/media/vocabulary/backup/. Then it will download the latest ontology version, if the online version is newer than the version already in the database, and update the database content with it.

If you get a urllib.error.HTTPError: HTTP Error 401: Unauthorized error, then your APIKEY_BIOONTOLOGY credential inside annot/web/prjannot/crowbar.py is most probably wrong.

Now, let’s find out which ontologies and versions annot’s vocabularies were populated with.

  1. point your browser to http://192.168.99.100/admin/ and log in with your credentials.
  2. click the red colored Sys admin ctrl vocabularies link. A table should pop up which lists all vocabularies and the information we are interested in.

Bricks


  1. python manage.py loaddata annotTutorial/brick_tsv/person_brick_20180331_235444_oo.json loads the person brick into the database. The person brick is a special kind of brick that doesn’t annotate data. Instead, it is used to annotate the responsible person for each sample and reagent brick. This is the reason it is reloaded a bit differently than the other bricks.

  2. let’s have a look at the uploaded person bricks.

    1. point your browser to http://192.168.99.100/admin/ and log in with your credentials.
    2. click the yellow colored Staff link. A table should pop up, displaying the uploaded bricks.
  3. python manage.py antibody1_tsv2db annotTutorial/brick_tsv/antibody1_brick_20181024_003732_human.txt loads the primary antibody bricks.

    1. let’s have a look at the uploaded primary antibody bricks. Click the orange colored Endpoint Primary Antibody link to retrieve the database table.
  4. python manage.py antibody2_tsv2db annotTutorial/brick_tsv/antibody2_brick_20180510_020008_human.txt loads the secondary antibody bricks.

  5. python manage.py cstain_tsv2db annotTutorial/brick_tsv/cstain_brick_20180503_020012_human.txt loads the compound stain bricks.

  6. python manage.py compound_tsv2db annotTutorial/brick_tsv/compound_brick_20180511_020009_human.txt loads the compound bricks.

  7. python manage.py proteinset_tsv2db annotTutorial/brick_tsv/proteinset_brick_20180502_020026_human.txt loads the protein complex bricks.

  8. python manage.py protein_tsv2db annotTutorial/brick_tsv/protein_brick_20180502_020024_human.txt loads the protein bricks.

  9. The sample bricks are a bit special because of the sample_parent field, which means sample bricks might relate to other sample bricks. Because of that, we first have to manually generate the not_available-notavailable_notavailable_notavailable and not_yet_specified-notyetspecified_notyetspecified_notyetspecified samples before we can upload the sample bricks with the usual command. For the same reason, we might have to call the human_tsv2db command several times until all sample bricks are loaded into the database.

    1. click the orange colored Sample Homo Sapiens Add Human Brick link.
    2. click Save. An error message: “Please correct the errors below.” will appear.
    3. Scroll down. For every “This field is required.” error choose not_yet_specified.
    4. when you reach the bottom, click Save. This should bring you back to the database table. Sample not_yet_specified-notyetspecified_notyetspecified_notyetspecified should now exist.
    5. click not_yet_specified-notyetspecified_notyetspecified_notyetspecified
    6. change Sample, Provider, Provider_catalog_id, Provider_batch_id to not_available.
    7. click Save. Sample not_available-notavailable_notavailable_notavailable should now exist too.
    8. now run the command python manage.py human_tsv2db annotTutorial/brick_tsv/human_brick_20180615_020026_human.txt, if needed several times, until all sample bricks are loaded.

    Now, all backed up bricks should be loaded.

  10. Let’s have a look at how this brick information can be downloaded. All brick information can be downloaded in json and tsv format.

    1. click the orange colored Perturbation_Protein link.
    2. In the Action drop down list choose Download protein page as json file and click Go.
    3. In the Action drop down list choose Download protein page as tsv file and click Go.

This will download all protein bricks stored in the database once in json and once in tsv format. The download procedure is the same for any brick type (primary antibody, secondary antibody, compound stain, compound, protein, proteinset and homo sapiens sample).

Additionally, you can download related bricks from the investigation, study, assay run and acaxis layer. From there, the bricks are downloadable in annot’s json and tsv standards, and additionally in the lincs data standard. Please note: the lincs data standard does not contain all stored brick information, only the fields compatible with the lincs standard. Further, the bricks are re-grouped into an antibody, a small molecule, a protein and a cell line file, this too because of the lincs standard.

Upload the Bricks to make them accessible in the Experiment Layout Layer!

Before any brick is accessible for experiment layout, it must be uploaded into the corresponding Uploaded endpoint reagent bricks, Uploaded perturbation reagent bricks or Uploaded sample bricks table. The first time after you install annot, you have to do this via the command line, because the database tables which the GUI relies on have to be initialized. After that, you can populate the brick tables via command line or GUI.

On the command line:

  1. python manage.py brick_load will upload all bricks.

On the GUI:

  1. scroll down to the red colored Appsabrick block.
  2. click the Sys admin bricks link. You will find a table with all the brick types and some information about their bricking. This is the place where, from now on, you can brick bricks via the GUI.
    1. choose the brick types you would like to brick by clicking the box in front of them.
    2. in the Action drop down list choose Upload brick and click Go.
  3. go back to Home Appsabrick
  4. click the Uploaded endpoint reagent bricks, Uploaded perturbation reagent bricks or Uploaded sample bricks link. These are the tables containing the uploaded bricks. Those are the bricks accessible for layout.

Investigations and Studies


Let’s load from the backup what was stored in the Investigation and Study tables under the black colored links.

  1. run docker exec -ti annot_webdev_1 /bin/bash to enter annot by command line.
  2. run python manage.py loaddata annotTutorial/experiment_json/investigation_investigation_20181024_oo.json
  3. run python manage.py loaddata annotTutorial/experiment_json/study_study_20181024_oo.json
  4. click on the black colored Investigation link to see the reloaded content.
  5. click on the black colored Study link to see the reloaded content.

Experiment Layouts


Annot provides you with a super flexible way to lay out any biological experiment. In a first step, the three major axes of each biological experiment - sample, perturbation, endpoint - are laid out on the Acaxis layer. Then, these axes are pulled together on the Assay run layer.

For example, let’s lay out a lapatinib perturbation:

  1. click the cyan Set_of_Perturbation Add_perturbation_set link.
  2. enter the Setname: ds-lapatinib_750nM_v1
  3. click inside the field next to Brick: this is a searchable drop down list. Type lapatinib into the field; the list will filter immediately as you type. Click compound-lapatinib_chebi49603-SelleckOwn_S1028_notavailable. If you cannot find lapatinib because the list is still empty, remember that you first have to upload the bricks so that they become accessible. Do so; the brick should then appear in the drop down list. In this example we will only lay out a lapatinib perturbation. If you had to lay out a 384 well plate with dozens of different perturbations, you would keep selecting all the reagent bricks you need.
  4. click Save

Now let’s lay out the plate design. For this we will generate and edit the acjson template script code.

  1. click the cyan Set_of_Perturbation link.
  2. click in the box in front of the ds-lapatinib750nmv1 row. The box will turn blue and get a tick.
  3. in the Action drop down list choose Download selected set's as python3 acpipe template script and click Go.
  4. open the downloaded acpipeTemplateCode_ds-lapatinib_750nM_v1.py file in a plain text editor

Let’s have a look at the generated template file in detail.

The first part is the so-called header.

###
# title:  acpipeTemplateCode_ds-lapatinib_750nM_v1.py
#
# date: 2018-04-02
# license: GPL>=3
# author: bue
#
# description:
#   acpipe script to generate a perturbationset related acjson file.
#   template automatically generated by annot software.
#   check out: https://gitlab.com/biotransistor/annot
###

The header gives some basic information about the code in a comment section defined by hashtags (#).

The second part loads the libraries needed to interpret the program beside the python3 standard library. Those libraries are copy, json and acpipe_acjson.

# python library
import copy
import json

# acpipe library
# check out: https://gitlab.com/biotransistor/acpipe_acjson
import acpipe_acjson.acjson as ac

The third part builds an acjson object. Acjson stands for assay coordinate JavaScript Object Notation. Acjson is the format we will use to lay out the entire experiment. The acjson format is based on, and fully compatible with, the json syntax.

# build acjson
d_acjson = ac.acbuild(
    s_runid='ds-lapatinib_750nM_v1',
    s_runtype='annot_acaxis',
    s_welllayout='?x?',
    ti_spotlayout=(1,),
    s_log='from the bricks'
)

Notably, in the template code, annot was able to specify s_runid, s_runtype and s_log. The question marks (?) in the code reflect that the wellplate layout has not yet been specified. You could, for example, specify a 384 wellplate (16x24), a 96 wellplate (8x12), a petri dish (1x1) or a tube (1x1). We set s_welllayout to '1x1' because we will treat all our wells with the same lapatinib concentration. Annot can handle arrays with multiple spots per well. However, we will leave ti_spotlayout at its default, which is (1,), since lapatinib is not spotted onto the array.
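For comparison, if you were laying out an axis on a 384 wellplate instead, the same call would look like this (a sketch; es-1layout2v3 is the 384 wellplate layout from the tutorial material, and only s_welllayout differs from the build above):

# sketch: build an acjson for a 384 wellplate (16 rows x 24 columns)
d_acjson384 = ac.acbuild(
    s_runid='es-1layout2v3',
    s_runtype='annot_acaxis',
    s_welllayout='16x24',
    ti_spotlayout=(1,),
    s_log='from the bricks'
)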

The fourth part describes the experimental layout.

# reagent: compound-lapatinib_chebi49603-SelleckOwn_S1028_notavailable
s_gent = 'lapatinib_chebi49603'
d_record = {s_gent: copy.deepcopy(ac.d_RECORDLONG)}
d_record[s_gent].update(copy.deepcopy(ac.d_SET))
d_record[s_gent].update(copy.deepcopy(ac.d_EXTERNALID))
d_record[s_gent].update({'manufacture': 'Selleck_Own'})
d_record[s_gent].update({'catalogNu': 'S1028'})
d_record[s_gent].update({'batch': 'not_available'})
d_record[s_gent].update({'conc': ?})
d_record[s_gent].update({'concUnit': 'nmolar_uo0000065'})
d_record[s_gent].update({'cultType': '?'})
d_record[s_gent].update({'timeBegin': ?})
d_record[s_gent].update({'timeEnd': ?})
d_record[s_gent].update({'timeUnit': 'hour_uo0000032'})
d_record[s_gent].update({'recordSet': ['ds-lapatinib_750nM_v1']})
d_record[s_gent].update({'externalId': 'LSM-1051'})
d_acjson = ac.acfuseaxisrecord(
    d_acjson,
    s_coor='?',
    s_axis='perturbation',
    d_record=d_record
)

As you can see, annot pulled as much information as it could from the setname (recordSet) and the bricks (manufacture, catalogNu, batch, concUnit, timeUnit, externalId). However, we have not yet specified the actual concentration and time, or how lapatinib will be provided to the cells (e.g. batch, fed or continuous). Please set conc to 750 nmol, timeBegin to 0 hours, timeEnd to 72 hours and cultType to batch. Now, as we said, this is a petri dish. Naturally, the s_coor coordinate will be '1'. Notice that the coordinate, even though it is an integer, is given as a string. This is because coor is a json dictionary key, and json dictionary keys have to be strings to be compatible with the json syntax. The filled-in lines are shown below.
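For reference, after filling in the placeholders, the changed lines of this fourth part read:

# filled-in placeholders: 750 nmolar lapatinib, batch culture, 0 to 72 hours
d_record[s_gent].update({'conc': 750})
d_record[s_gent].update({'cultType': 'batch'})
d_record[s_gent].update({'timeBegin': 0})
d_record[s_gent].update({'timeEnd': 72})

# petri dish: the single well gets coordinate '1'
d_acjson = ac.acfuseaxisrecord(
    d_acjson,
    s_coor='1',
    s_axis='perturbation',
    d_record=d_record
)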

In the acjson format, the coordinate numbering always has to start at '1' and increase by 1, beginning at the upper left corner, tracking every spot inside a well, well by well, from left to right and from top to bottom.
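For example, with one spot per well on a 16x24 wellplate, this numbering can be computed like so (an illustrative sketch, not part of the generated template; the variable names are made up):

# illustration: acjson coordinate numbering on a 16x24 wellplate, 1 spot per well
i_row, i_col = 0, 1                    # second well in the top row (zero-based)
s_coor = str(i_row * 24 + i_col + 1)   # wells count left to right, top to bottom
print(s_coor)                          # prints: 2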

In the last part the acjson object is written into a json file.

# write to json file
print('write file: {}'.format(d_acjson['acid']))
with open(d_acjson['acid'], 'w') as f_acjson:
    json.dump(d_acjson, f_acjson, sort_keys=True)

Now let’s generate and upload the acjson file.

This is python3 code. You will have to install python3 on your computer and an additional python3 library called acpipe_acjson to run this code.

  1. How to install python3 depends very much on the operating system you are running. Install per your system.

  2. After you have installed python3, you can install acpipe_acjson by running pip3 install acpipe_acjson from the command line.

  3. run python3 acpipeTemplateCode_ds-lapatinib_750nM_v1.py to run the modified template code from the command line. The resulting file’s name, annot_acaxis-ds-lapatinib_750nM_v1_ac.json, is constructed out of the runtype annot_acaxis and the runid ds-lapatinib_750nM_v1. To study the json file, open it in a web browser or text editor. You can change the last line of the code from json.dump(d_acjson, f_acjson, sort_keys=True) to json.dump(d_acjson, f_acjson, indent=4, sort_keys=True) to get more human readable output. However, the resulting file will also take more disk space than a file without indent. The resulting acjson file will look like this:

    {
        "1": {
            "endpoint": null,
            "iSpot": 1,
            "iWell": 1,
            "iixii": "1_1x1_1",
            "ixi": "1x1",
            "perturbation": {
                "lapatinib_chebi49603": {
                    "batch": "not_available",
                    "catalogNu": "S1028",
                    "conc": 750,
                    "concUnit": "nmolar_uo0000065",
                    "cultType": "batch",
                    "externalId": "LSM-1051",
                    "manufacture": "Selleck_Own",
                    "recordSet": ["ds-lapatinib_750nM_v1"],
                    "timeBegin": 18,
                    "timeEnd": 90,
                    "timeUnit": "hour_uo0000032"
                }
            },
            "sample": null,
            "sxi": "Ax1"
        },
        "acid": "annot_acaxis-ds-lapatinib_750nM_v1_ac.json",
        "log": "from the bricks",
        "runid": "ds-lapatinib_750nM_v1",
        "runtype": "annot_acaxis",
        "spotlayout": [
            1
        ],
        "welllayout": "1x1"
    }
    
  4. to upload the generated acjson file

    1. click the cyan Set_of_Perturbation link.
    2. then click the ds-lapatinib750nmv1 link.
    3. then click the Browse… button.
    4. search and choose the annot_acaxis-ds-lapatinib_750nM_v1_ac.json file and click Open.
    5. then click Save.
    6. a link upload/acaxis/annot_acaxis-ds-lapatinib_750nM_v1_ac.json should now appear in the Acjson file column. Click this link.
    7. the uploaded json file should open in the browser. Use your browser’s back arrow to go back to the perturbation set table.
    8. optional: install a json viewer addon in your browser, as described in HowTo json files and your web browser. Then click again the upload/acaxis/annot_acaxis-ds-lapatinib_750nM_v1_ac.json link in the Acjson file column. The file should now appear nicely rendered.
  5. now let’s check the uploaded acjson against the brick content it was generated from.

    1. click in the box in front of the ds-lapatinib750nmv1 row. The box will turn blue and get a tick.
    2. in the Action drop down list choose Check selected set's acjson file against brick content and click Go. You should receive the message “Ds-lapatinib750nmv1 # successfully checked.”, or a “Warning” or “Error” message when something about the uploaded acjson file is not entirely as annot expects.

When everything works as expected, it is time to store the modified python3 template code in your own source code repository, in case you later have to modify the layout and regenerate the acjson. Follow the instructions described in HowTo backup acpipeTemplateCode_.py*.

As a reference, all modified template scripts used to generate the acjson files in this tutorial can be found inside the same folder as the acjson files. For example, have a look at annotTutorial/experiment_acjson/acaxis/acpipeTemplateCode_es-1layout2v3.py, which lays out a 384 wellplate. Or have a look at annotTutorial/experiment_acjson/runset/acpipeTemplateCode_mema-LI8C00201.py, the script that generates the assay run acjson out of the sample, perturbation, endpoint and superset acjsons for mema assay LI8C00201.

We will now restore the rest of the acjson files used in this tutorial from the backup.

  1. run docker exec -ti annot_webdev_1 /bin/bash to enter annot by command line.
  2. run cp -r annotTutorial/experiment_acjson/* ../media/upload/ to copy the acjson files to the expected place.
  3. run python manage.py loaddata annotTutorial/experiment_json/acaxis_sampleset_20181024_oo.json
  4. run python manage.py loaddata annotTutorial/experiment_json/acaxis_perturbationset_20181024_oo.json
  5. run python manage.py loaddata annotTutorial/experiment_json/acaxis_endpointset_20181024_oo.json
  6. run python manage.py loaddata annotTutorial/experiment_json/superset_acpipe_20181024_oo.json
  7. run python manage.py loaddata annotTutorial/experiment_json/superset_supersetfile_20181024_oo.json
  8. run python manage.py loaddata annotTutorial/experiment_json/superset_superset_20181024_oo.json
  9. run python manage.py loaddata annotTutorial/experiment_json/runset_runset_20181024_oo.json to load back the table content.

Now let’s have a look at what we can do with these acjson files.

  1. The Acjson_to_tsv_setting. The acjson file stores information about each sample and reagent that was used in the screen. In practice, when we download layout files or dataframes - which are generated straight out of the acjson files - we might not always be interested in all the details stored in the acjson files. We can tweak such downloads in the Acjson to tsv setting.

    1. click the cyan Acjson_to_tsv_setting Add_acjson_to_tsv link.
    2. click Save. An entry should now be generated for the user you are logged in as.
    3. click on your username.
    4. You will see a list of all informative fields stored in an acjson. By clicking the boxes you can choose which of those you would like to write out when you download a tsv_long_layout file, tsv_tidy_dataframes, or a tsv_unstacked_dataframe.
  2. Download acjsons and layouts. We described acjson files, and how to generate and upload them, in the section above. Layout files are similar to the Excel sheets commonly used for experimental layout description.

    1. click on the dark blue Assay_Runs link.
    2. click in the box in front of the mema-LI8C00201 row.
    3. in the Action drop down list choose Download selected sets as acjson file and click Go. This is the acjson file we uploaded before.
    4. in the Action drop down list choose Download selected sets as tsv_short layout file and click Go. This will generate a bunch of layout files, one for each major setname, with no content other than the basic sample or reagent names.
    5. in the Action drop down list choose Download selected sets as tsv_long layout file and click Go. This will generate a bunch of layout files with the reagent names and all content chosen in the Acjson_to_tsv_setting settings.
  3. Downloading dataframes. Dataframes are tsv files in a format that is easy to load into pandas or R.

    1. click on the cyan Set_of_Perturbation link
    2. click in the box in front of the es-1layout2v3 row. This is a simple 384 well plate layout.
    3. in the Action drop down list choose Download selected sets as tidy dataframes and click Go.
    4. in the Action drop down list choose Download selected sets as unstacked dataframe and click Go.

    For a simple 384 well screen, using the browser GUI can generate a dataframe in a reasonable amount of time. For larger screens, the fastest and easiest way to generate dataframes is to download the acjson file and generate the dataframe locally, as the following steps and the consolidated script after this list show.

    1. click on the dark blue Assay_Runs link.
    2. click in the box in front of the mema-LI8C00201 row.
    3. in the Action drop down list choose Download selected sets as acjson file and click Go. This will download an acjson file named annot_runset-mema-LI8C00201_ac.json.
    4. move the annot_runset-mema-LI8C00201_ac.json file into your working directory.
    5. in your working directory run python3 to start a python shell.

    In the python3 shell type:

    1. import json loads the json library.
    2. import acpipe_acjson.acjson as ac loads the acjson library.
    3. f_acjson = open("annot_runset-mema-LI8C00201_ac.json") opens the file handle to the acjson file.
    4. d_acjson = json.load(f_acjson) loads the acjson file into a somewhat complicated python dictionary termed an acjson object.
    5. ac.acjson2dataframetsv(d_acjson, s_mode="tidy") will generate three tidy stacked dataframe files named annot_runset-mema-LI8C00201dataframetsv_tidy_sample.tsv, annot_runset-mema-LI8C00201dataframetsv_tidy_perturbation.tsv, and annot_runset-mema-LI8C00201dataframetsv_tidy_endpoint.tsv
    6. ac.acjson2dataframetsv(d_acjson, s_mode="unstacked") will generate an unstacked dataframe file in the working directory named annot_runset-mema-LI8C00201dataframetsv_unstacked_spe.tsv

Assay and Superset generation Tracking


Let’s reload from the backup what was stored in the purple colored Mema assay tracking and Superset tracking tables.

  1. run docker exec -ti annot_webdev_1 /bin/bash to enter annot by command line.
  2. run python manage.py loaddata annotTutorial/experiment_json/track_memasuperset_20181024_oo.json
  3. run python manage.py loaddata annotTutorial/experiment_json/track_memaassay_20180331_oo.json

Assay and superset tracking must be tailored to the particular experimental protocol. Alternatively, if you don’t want to track superset and assay generation at all, you can easily erase these tables from your installation. Have a look at About date tracking! and HowTo disable the date tracking app? in the HowTo section of this manual.

Backup your Work


To back up your work done so far, follow the instructions at HowTo backup annot content? in the HowTo section of this manual.

And with this we come to the end of this tutorial.