# Tutorial

This tutorial guides you step by step through the process of

1. populating annot with a backed up controlled vocabulary and updating it with the latest ontology versions available online,
1. populating annot with backed up sample and reagent bricks,
1. populating annot with backed up study and investigation information,
1. populating annot with backed up experiment layouts and laying out one acjson file yourself,
1. populating annot with backed up tracking information,
1. backing up the work done.

## Preparation
---
1. Before you follow this tutorial, install the development version of annot as described in *HowTo install annot*.
1. Run `git clone https://gitlab.com/biotransistor/annotTutorial.git` to clone the tutorial material to your machine.
1. Run `cp -r annotTutorial annot/web/` to copy the cloned annotTutorial folder into the annot/web/ folder of your annot installation.
1. Run `rm -fr annot/web/annotTutorial/.git` to remove the .git folder from the copied annotTutorial folder.

## Controlled Vocabulary
---
1. Enter annot:
    1. `docker-machine ls` lists all installed docker machines.
    1. `docker-machine start an0` starts the an0 docker machine, if needed.
    1. `eval $(docker-machine env an0)` loads the an0 environment variables.
    1. `docker exec -ti annot_webdev_1 /bin/bash` enters the annot_webdev_1 docker container.
    1. `ls` should list, among others, the annotTutorial folder.
1. Load the backed up vocabulary:
    1. `cp annotTutorial/vocabulary_json/* ../media/vocabulary/backup/` copies the backed up vocabularies to the right place inside annot.
    1. `python manage.py vocabulary_loadbackup` populates each vocabulary app, first with the latest backup found at `/usr/src/media/vocabulary/backup/`; then it downloads the latest ontology version, if the online version is newer than the one already in the database, and updates the database content with it.

If you get a `urllib.error.HTTPError: HTTP Error 401: Unauthorized` error, then your `APIKEY_BIOONTOLOGY` credential inside `annot/web/prjannot/crowbar.py` is most probably wrong.
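In that case, check the credential file. As a rough sketch - the exact layout of the file is an assumption, and the key value below is a placeholder for your own API key (presumably an NCBO BioPortal key, given the bioontology name):

```
# annot/web/prjannot/crowbar.py
# sketch only: the real file may define additional credentials.
APIKEY_BIOONTOLOGY = 'place-your-api-key-here'  # hypothetical placeholder value
```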
Now, let's find out which ontologies and versions annot's vocabularies were populated with.

1. Point your browser to `http://192.168.99.100/admin/` and log in with your credentials.
1. Click the red colored `Sys admin ctrl vocabularies` link. A table should pop up which lists all vocabularies and the information we are interested in.

## Bricks
---
1. `python manage.py loaddata annotTutorial/brick_tsv/person_brick_20180331_235444_oo.json` loads the person bricks into the database. The person brick is a special kind of brick that does not annotate data; it is used to annotate the person responsible for each sample and reagent brick. This is the reason it is re-loaded a bit differently than the other bricks.
1. Let's have a look at the uploaded person bricks:
    1. Point your browser to `http://192.168.99.100/admin/` and log in with your credentials.
    1. Click the yellow colored `Staff` link. A table should pop up, displaying the uploaded bricks.
1. `python manage.py antibody1_tsv2db annotTutorial/brick_tsv/antibody1_brick_20181024_003732_human.txt` loads the primary antibody bricks.
1. Let's have a look at the uploaded primary antibody bricks. Click the orange colored `Endpoint Primary Antibody` link to retrieve the database table.
1. `python manage.py antibody2_tsv2db annotTutorial/brick_tsv/antibody2_brick_20180510_020008_human.txt` loads the secondary antibody bricks.
1. `python manage.py cstain_tsv2db annotTutorial/brick_tsv/cstain_brick_20180503_020012_human.txt` loads the compound stain bricks.
1. `python manage.py compound_tsv2db annotTutorial/brick_tsv/compound_brick_20180511_020009_human.txt` loads the compound bricks.
1. `python manage.py proteinset_tsv2db annotTutorial/brick_tsv/proteinset_brick_20180502_020026_human.txt` loads the protein complex bricks.
1. `python manage.py protein_tsv2db annotTutorial/brick_tsv/protein_brick_20180502_020024_human.txt` loads the protein bricks.
1. The sample bricks are a bit special because of the `sample_parent` field, which means sample bricks may relate to other sample bricks. Because of that, we first have to manually generate the `not_available-notavailable_notavailable_notavailable` and `not_yet_specified-notyetspecified_notyetspecified_notyetspecified` samples before we can upload the sample bricks with the usual command. For the same reason, we may have to call the human_tsv2db command several times until all sample bricks are loaded into the database.
    1. Click the orange colored `Sample Homo Sapiens` `Add Human Brick` link.
    1. Click `Save`. An error message “`Please correct the errors below.`” will appear.
    1. Scroll down. For every “`This field is required.`” error, choose `not_yet_specified`.
    1. When you reach the bottom, click `Save`. This should bring you back to the database table. The sample `not_yet_specified-notyetspecified_notyetspecified_notyetspecified` should now exist.
    1. Click `not_yet_specified-notyetspecified_notyetspecified_notyetspecified`.
    1. Change `Sample`, `Provider`, `Provider_catalog_id`, and `Provider_batch_id` to `not_available`.
    1. Click `Save`. The sample `not_available-notavailable_notavailable_notavailable` should now exist too.
    1. Now run `python manage.py human_tsv2db annotTutorial/brick_tsv/human_brick_20180615_020026_human.txt`, if needed several times, until all sample bricks are loaded.

Now, all backed up bricks should be loaded.

1. Let's have a look at how this brick information can be downloaded. All brick information can primarily be downloaded in [json](https://en.wikipedia.org/wiki/JSON) and [tsv](https://en.wikipedia.org/wiki/Tab-separated_values) format.
    1. Click the orange colored `Perturbation_Protein` link.
    1. In the `Action` drop down list, choose `Download protein page as json file` and click `Go`.
    1. In the `Action` drop down list, choose `Download protein page as tsv file` and click `Go`.

This downloads all protein bricks stored in the database, once in json and once in tsv format. The download procedure is the same for any brick type (primary antibody, secondary antibody, compound stain, compound, protein, proteinset, and homo sapiens sample). Additionally, you can download related bricks from the investigation, study, assay run, and acaxis layers. From there, the bricks are downloadable in annot's json and tsv standards and additionally in the [lincs data standard](http://www.lincsproject.org/LINCS/data/standards). Please note: the lincs data standard does not contain all stored brick information, only the fields compatible with the lincs standard. Furthermore, the bricks are re-grouped into an antibody, a small molecule, a protein, and a cell line file, also because of the lincs standard.
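Since the downloaded files - like the backed up brick tsv files shipped with the tutorial - are plain tab-separated text, they are easy to inspect outside annot. A minimal sketch, assuming python3 and [pandas](https://pandas.pydata.org/) are installed on your machine and the tsv files carry a header row:

```
# peek at a brick tsv file outside annot.
import pandas as pd

# any brick tsv works; this one ships with the tutorial material.
df_brick = pd.read_csv(
    'annotTutorial/brick_tsv/protein_brick_20180502_020024_human.txt',
    sep='\t'
)
print(df_brick.shape)    # number of brick records and annotation fields
print(df_brick.columns)  # the annotation fields stored per brick
```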
### Upload the Bricks to make them accessible in the Experiment Layout Layer!

Before any brick is accessible for experiment layout, it must be uploaded into the corresponding `Uploaded endpoint reagent bricks`, `Uploaded perturbation reagent bricks`, or `Uploaded sample bricks` table. The first time after you install annot, you have to do this via the command line, because the database tables which the GUI relies on have to be initialized. After that, you can populate the brick tables via command line or GUI.

On the command line:

1. `python manage.py brick_load` uploads all bricks.

On the GUI:

1. Scroll down to the red colored `Appsabrick` block.
1. Click the `Sys admin bricks` link. You will find a table with all the brick types and some information about their bricking. This is the place where, from now on, you can brick bricks by GUI.
1. Choose the brick types you would like to brick by clicking the box in front of them.
1. In the `Action` drop down list, choose `Upload brick` and click `Go`.
1. Go back to `Home › Appsabrick`.
1. Click `Uploaded endpoint reagent bricks`, `Uploaded perturbation reagent bricks`, or `Uploaded sample bricks`. These are the tables containing the uploaded bricks. Those are the bricks accessible for layout.

## Investigations and Studies
---
Let's load from the backup what was stored in the Investigation and Study tables under the black colored links.

1. Run `docker exec -ti annot_webdev_1 /bin/bash` to enter annot by command line.
1. Run `python manage.py loaddata annotTutorial/experiment_json/investigation_investigation_20181024_oo.json`.
1. Run `python manage.py loaddata annotTutorial/experiment_json/study_study_20181024_oo.json`.
1. Click the black colored `Investigation` link to see the reloaded content.
1. Click the black colored `Study` link to see the reloaded content.

## Experiment Layouts
---
Annot provides you with a highly flexible way to lay out any biological experiment. In a first step, the three major axes of each biological experiment - sample, perturbation, endpoint - are laid out on the `Acaxis` layer. Then these axes are pulled together on the `Assay run` layer.

For example, let's lay out a lapatinib perturbation:

1. Click the cyan `Set_of_Perturbation` `Add_perturbation_set` link.
1. Enter the Setname: `ds-lapatinib_750nM_v1`.
1. Click inside the button next to `Brick:`; this is a searchable drop down list. Type `lapatinib` into the button; the list filters immediately as you type. Click `compound-lapatinib_chebi49603-SelleckOwn_S1028_notavailable`. If you cannot find lapatinib because the list is still empty, remember that you first have to upload the bricks so that they become accessible. Do so; then the brick should appear in the drop down list. In this example, we only lay out a lapatinib perturbation. If you had to lay out a 384 well plate with dozens of different perturbations, you would keep selecting all the reagent bricks you need.
1. Click `Save`.

Now let's lay out the plate design. For this, we will generate and edit the acjson template script code.

1. Click the cyan `Set_of_Perturbation` link.
1. Click the box in front of the `ds-lapatinib750nmv1` row. The box will turn blue and get a tick.
1. In the `Action` drop down list, choose `Download selected set's as python3 acpipe template script` and click `Go`.
1. Open the downloaded `acpipeTemplateCode_ds-lapatinib_750nM_v1.py` file in a [plain text editor](https://en.wikipedia.org/wiki/Text_editor).
Let's have a look at the generated template file in detail.

The *first part* is the so called header:

```
###
# title: acpipeTemplateCode_ds-lapatinib_750nM_v1.py
#
# date: 2018-04-02
# license: GPL>=3
# author: bue
#
# description:
#   acpipe script to generate a perturbationset related acjson file.
#   template automatically generated by annot software.
#   check out: https://gitlab.com/biotransistor/annot
###
```

The header gives some basic information about the code in a comment section defined by hashtags (`#`).

The *second part* loads the libraries needed to interpret the program besides the python3 standard library. Those libraries are copy, json, and acpipe_acjson.

```
# python library
import copy
import json

# acpipe library
# check out: https://gitlab.com/biotransistor/acpipe_acjson
import acpipe_acjson.acjson as ac
```

The *third part* builds an acjson object. Acjson stands for *assay coordinate javascript object notation*. Acjson is the format we will use to lay out the entire experiment. The acjson format is based on and fully compatible with the [json](http://json.org/) syntax.

```
# build acjson
d_acjson = ac.acbuild(
    s_runid='ds-lapatinib_750nM_v1',
    s_runtype='annot_acaxis',
    s_welllayout='?x?',
    ti_spotlayout=(1,),
    s_log='from the bricks'
)
```

Notably, in the template code, annot was able to fill in `s_runid`, `s_runtype`, and `s_log`. The question marks (`?`) in the code reflect that the well plate layout has not yet been specified. You could, for example, specify a 384 well plate (`16x24`), a 96 well plate (`8x12`), a petri dish (`1x1`), or a tube (`1x1`). We set `s_welllayout` to `1x1`, because we will treat all our wells with the same lapatinib concentration. Annot can handle arrays with multiple spots per well. However, we leave `ti_spotlayout` at its default, `(1,)`, since lapatinib is not spotted onto an array.

The *fourth part* describes the experimental layout.

```
# reagent: compound-lapatinib_chebi49603-SelleckOwn_S1028_notavailable
s_gent = 'lapatinib_chebi49603'
d_record = {s_gent: copy.deepcopy(ac.d_RECORDLONG)}
d_record[s_gent].update(copy.deepcopy(ac.d_SET))
d_record[s_gent].update(copy.deepcopy(ac.d_EXTERNALID))
d_record[s_gent].update({'manufacture': 'Selleck_Own'})
d_record[s_gent].update({'catalogNu': 'S1028'})
d_record[s_gent].update({'batch': 'not_available'})
d_record[s_gent].update({'conc': ?})
d_record[s_gent].update({'concUnit': 'nmolar_uo0000065'})
d_record[s_gent].update({'cultType': '?'})
d_record[s_gent].update({'timeBegin': ?})
d_record[s_gent].update({'timeEnd': ?})
d_record[s_gent].update({'timeUnit': 'hour_uo0000032'})
d_record[s_gent].update({'recordSet': ['ds-lapatinib_750nM_v1']})
d_record[s_gent].update({'externalId': 'LSM-1051'})
d_acjson = ac.acfuseaxisrecord(
    d_acjson,
    s_coor='?',
    s_axis='perturbation',
    d_record=d_record
)
```

As you can see, annot pulled as much information as it could from the set name (recordSet) and the bricks (manufacture, catalogNu, batch, concUnit, timeUnit, externalId). However, we have not yet specified the actual concentration and time, or how lapatinib will be provided to the cells (e.g. [batch](https://en.wikipedia.org/wiki/Bacterial_growth), [fed](https://en.wikipedia.org/wiki/Fed-batch_culture), or [continuous](https://en.wikipedia.org/wiki/Chemostat) culture). Please set `conc` to `750` nmol, `timeBegin` to `0` hours, `timeEnd` to `72` hours, and `cultType` to `batch`. And since, as we said, this is a petri dish, the `coor` coordinate will be `'1'`.
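After filling in the placeholders with the values above (and setting `s_welllayout` to `'1x1'` in the third part), the changed lines of the edited template should look roughly like this - a sketch with exactly the values chosen above, the rest of the record staying as generated:

```
# filled in placeholder values: 750 nM lapatinib, batch culture,
# 0 to 72 hours, fused onto the single petri dish coordinate '1'.
d_record[s_gent].update({'conc': 750})
d_record[s_gent].update({'cultType': 'batch'})
d_record[s_gent].update({'timeBegin': 0})
d_record[s_gent].update({'timeEnd': 72})
d_acjson = ac.acfuseaxisrecord(
    d_acjson,
    s_coor='1',
    s_axis='perturbation',
    d_record=d_record
)
```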
Notice that the coordinate, although it is an integer, is given as a string. This is because `coor` is a json dictionary key, and json dictionary keys have to be strings to be compatible with the [json](http://json.org/) syntax. In the acjson format, the coordinate numbering always starts at '1' and increases by 1, beginning at the upper left corner and tracking every spot inside a well, well by well, from left to right and from top to bottom.

In the *last part*, the acjson object is written into a json file.

```
# write to json file
print('write file: {}'.format(d_acjson['acid']))
with open(d_acjson['acid'], 'w') as f_acjson:
    json.dump(d_acjson, f_acjson, sort_keys=True)
```

Now let's generate and upload the acjson file. This is [python3](https://www.python.org/) code, so you will have to install python3 on your computer, plus an additional python3 library called [acpipe_acjson](https://pypi.org/project/acpipe-acjson/), to run it.

1. How to install `python3` depends very much on the operating system you are running. Install it per your system.
1. After you have installed python3, install acpipe_acjson by running `pip3 install acpipe_acjson` from the command line.
1. Run `python3 acpipeTemplateCode_ds-lapatinib_750nM_v1.py` to execute the modified template code from the command line.

The resulting file's name, `annot_acaxis-ds-lapatinib_750nM_v1_ac.json`, is constructed out of the runtype `annot_acaxis` and the runid `ds-lapatinib_750nM_v1`. To study the json file, open it in a web browser or text editor. You can change the last line of the code from `json.dump(d_acjson, f_acjson, sort_keys=True)` to `json.dump(d_acjson, f_acjson, indent=4, sort_keys=True)` to get more human readable output. However, the resulting file will also take more disk space than a file without indentation. The resulting acjson file will look like this:

```
{
    "1": {
        "endpoint": null,
        "iSpot": 1,
        "iWell": 1,
        "iixii": "1_1x1_1",
        "ixi": "1x1",
        "perturbation": {
            "lapatinib_chebi49603": {
                "batch": "not_available",
                "catalogNu": "S1028",
                "conc": 750,
                "concUnit": "nmolar_uo0000065",
                "cultType": "batch",
                "externalId": "LSM-1051",
                "manufacture": "Selleck_Own",
                "recordSet": ["ds-lapatinib_750nM_v1"],
                "timeBegin": 0,
                "timeEnd": 72,
                "timeUnit": "hour_uo0000032"
            }
        },
        "sample": null,
        "sxi": "Ax1"
    },
    "acid": "annot_acaxis-ds-lapatinib_750nM_v1_ac.json",
    "log": "from the bricks",
    "runid": "ds-lapatinib_750nM_v1",
    "runtype": "annot_acaxis",
    "spotlayout": [1],
    "welllayout": "1x1"
}
```

1. To upload the generated acjson file:
    1. Click the cyan `Set_of_Perturbation` link.
    1. Then click the `ds-lapatinib750nmv1` link.
    1. Then click the `Browse…` button.
    1. Search for and choose the `annot_acaxis-ds-lapatinib_750nM_v1_ac.json` file and click `Open`.
    1. Then click `Save`.
    1. A link `upload/acaxis/annot_acaxis-ds-lapatinib_750nM_v1_ac.json` should now appear in the `Acjson file` column. Click this link.
    1. The uploaded json file should open in the browser. Use your browser's back arrow to go back to the perturbation set table.
    1. Optional: install a json viewer addon in your browser, as described in *HowTo json files and your web browser*. Then click the `upload/acaxis/annot_acaxis-ds-lapatinib_750nM_v1_ac.json` link in the `Acjson file` column again. The file should now appear nicely rendered.
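Since the acjson file is plain json, you can also sanity check it programmatically; a minimal sketch using only the python3 standard library:

```
# load the generated acjson file and inspect a few of its fields.
import json

with open('annot_acaxis-ds-lapatinib_750nM_v1_ac.json') as f_acjson:
    d_acjson = json.load(f_acjson)

print(d_acjson['runid'])       # ds-lapatinib_750nM_v1
print(d_acjson['welllayout'])  # 1x1
# the perturbation record fused onto coordinate '1':
print(d_acjson['1']['perturbation']['lapatinib_chebi49603']['conc'])  # 750
```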
Now let's check the uploaded acjson against the brick content it was generated from.

1. Click the box in front of the `ds-lapatinib750nmv1` row. The box will turn blue and get a tick.
1. In the `Action` drop down list, choose `Check selected set's acjson file against brick content` and click `Go`. You should receive the message "Ds-lapatinib750nmv1 # successfully checked.", or a "Warning" or "Error" when something about the uploaded acjson file is not entirely as annot expects.

When everything worked as expected, it is now time to store the modified python3 template code in your own source code repository, just in case you later have to modify the layout and regenerate the acjson. Follow the instructions described in *HowTo backup acpipeTemplateCode_*.py*.

As a reference, all modified template scripts used to generate the acjson files in this tutorial can be found in the same folder as the acjson files. Have, for example, a look at `annotTutorial/experiment_acjson/acaxis/acpipeTemplateCode_es-1layout2v3.py`. This script lays out a 384 well plate. Or have a look at `annotTutorial/experiment_acjson/runset/acpipeTemplateCode_mema-LI8C00201.py`. This is the script that generates the assay run acjson out of sample, perturbation, endpoint, and superset acjsons for mema assay LI8C00201.
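To give an idea of the multi-well pattern such scripts follow without opening the files, here is a minimal sketch - not the actual tutorial script; the set name and record content are hypothetical. On a `16x24` plate, a record is fused well by well onto the string coordinates `'1'` through `'384'`:

```
# sketch: lay out one perturbation across all wells of a 384 well plate.
import copy
import acpipe_acjson.acjson as ac

d_acjson = ac.acbuild(
    s_runid='es-sketch_384well_v1',  # hypothetical set name
    s_runtype='annot_acaxis',
    s_welllayout='16x24',            # 384 well plate: 16 rows x 24 columns
    ti_spotlayout=(1,),              # one spot per well
    s_log='from the bricks'
)

s_gent = 'lapatinib_chebi49603'
d_record = {s_gent: copy.deepcopy(ac.d_RECORDLONG)}
d_record[s_gent].update(copy.deepcopy(ac.d_SET))
d_record[s_gent].update({'conc': 750, 'concUnit': 'nmolar_uo0000065'})
d_record[s_gent].update({'recordSet': ['es-sketch_384well_v1']})

# coordinates are strings '1' ... '384', starting at the upper left
# corner, running from left to right and from top to bottom.
for i_coor in range(1, 16 * 24 + 1):
    d_acjson = ac.acfuseaxisrecord(
        d_acjson,
        s_coor=str(i_coor),
        s_axis='perturbation',
        d_record=copy.deepcopy(d_record)
    )
```

In a real layout you would of course vary the record - for example the concentration - from well to well, rather than fusing the same one everywhere.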
The rest of the acjson files used in this tutorial we will now restore from the backup.

1. Run `docker exec -ti annot_webdev_1 /bin/bash` to enter annot by command line.
1. Run `cp -r annotTutorial/experiment_acjson/* ../media/upload/` to copy the acjson files to the expected place.
1. Run `python manage.py loaddata annotTutorial/experiment_json/acaxis_sampleset_20181024_oo.json`.
1. Run `python manage.py loaddata annotTutorial/experiment_json/acaxis_perturbationset_20181024_oo.json`.
1. Run `python manage.py loaddata annotTutorial/experiment_json/acaxis_endpointset_20181024_oo.json`.
1. Run `python manage.py loaddata annotTutorial/experiment_json/superset_acpipe_20181024_oo.json`.
1. Run `python manage.py loaddata annotTutorial/experiment_json/superset_supersetfile_20181024_oo.json`.
1. Run `python manage.py loaddata annotTutorial/experiment_json/superset_superset_20181024_oo.json`.
1. Run `python manage.py loaddata annotTutorial/experiment_json/runset_runset_20181024_oo.json` to load back the table content.

Now let's have a look at what we can do with these acjson files.

1. The `Acjson_to_tsv_setting`. The acjson file stores information about each sample and reagent that was used in the screen. In practice, when we download layout files or dataframes - which are generated straight out of the acjson files - we might not always be interested in all the details stored in the acjson files. We can tweak such downloads in the `Acjson_to_tsv_setting`.
    1. Click the cyan `Acjson_to_tsv_setting` `Add_acjson_to_tsv` link.
    1. Click `Save`. An entry should now have been generated for the user you are currently logged in as.
    1. Click on your `username`.
    1. You will see a list of all informative fields stored in an acjson. By clicking the boxes, you can choose which of these fields are written out when you download a `tsv_long_layout` file, a `tsv_tidy_dataframes` file, or a `tsv_unstacked_dataframe` file.
1. Download `acjsons` and `layouts`. We described acjson files, and how to generate and upload them, in the section above. Layout files are similar to the Excel sheets that are commonly used for experimental layout description.
    1. Click the dark blue `Assay_Runs` link.
    1. Click the box in front of the `mema-LI8C00201` row.
    1. In the `Action` drop down list, choose `Download selected sets as acjson file`. This is the acjson file we uploaded before.
    1. In the `Action` drop down list, choose `Download selected sets as tsv_short layout file`. This generates a set of layout files, one for each major set name, with no other content than the basic sample or reagent names.
    1. In the `Action` drop down list, choose `Download selected sets as tsv_long layout file`. This generates a set of layout files with the reagent names and all content chosen in the `Acjson_to_tsv_setting` settings.
1. Download `dataframes`. Dataframes are tsv files in a format that is easy to load into [pandas](https://pandas.pydata.org/) or [R](https://cran.r-project.org/).
    1. Click the cyan `Set_of_Perturbation` link.
    1. Click the box in front of the `es-1layout2v3` row. This is a simple 384 well plate layout.
    1. In the `Action` drop down list, choose `Download selected sets as tidy dataframes` and click `Go`.
    1. In the `Action` drop down list, choose `Download selected sets as unstacked dataframe` and click `Go`.

For a simple 384 well screen, using the browser GUI can generate a dataframe in a reasonable amount of time. For larger screens, the fastest and easiest way to generate dataframes is to download the acjson file and generate the dataframe locally.

1. Click the dark blue `Assay_Runs` link.
1. Click the box in front of the `mema-LI8C00201` row.
1. In the `Action` drop down list, choose `Download selected sets as acjson file`. This downloads an acjson file named `annot_runset-mema-LI8C00201_ac.json`.
1. Move `annot_runset-mema-LI8C00201_ac.json` into your working directory.
1. In your working directory, run `python3` to start a python shell.

In the python3 shell, type:

1. `import json` to load the json library.
1. `import acpipe_acjson.acjson as ac` to load the acjson library.
1. `f_acjson = open("annot_runset-mema-LI8C00201_ac.json")` to open a file handle to the acjson file.
1. `d_acjson = json.load(f_acjson)` to load the acjson file into a somewhat complicated python dictionary termed the acjson object.
1. `ac.acjson2dataframetsv(d_acjson, s_mode="tidy")` to generate three tidy stacked dataframe files, named `annot_runset-mema-LI8C00201dataframetsv_tidy_sample.tsv`, `annot_runset-mema-LI8C00201dataframetsv_tidy_perturbation.tsv`, and `annot_runset-mema-LI8C00201dataframetsv_tidy_endpoint.tsv`.
1. `ac.acjson2dataframetsv(d_acjson, s_mode="unstacked")` to generate an unstacked dataframe file in the working directory, named `annot_runset-mema-LI8C00201dataframetsv_unstacked_spe.tsv`.

## Assay and Superset Generation Tracking
---
Let's reload from the backup what was stored in the purple colored `Mema assay tracking` and `Superset tracking` tables.

1. Run `docker exec -ti annot_webdev_1 /bin/bash` to enter annot by command line.
1. Run `python manage.py loaddata annotTutorial/experiment_json/track_memasuperset_20181024_oo.json`.
1. Run `python manage.py loaddata annotTutorial/experiment_json/track_memaassay_20180331_oo.json`.

Assay and superset tracking must be tailored to the particular experimental protocol. Alternatively, if you don't want to track superset and assay generation at all, you can easily erase these tables from your installation. Have a look at *About date tracking!* and *HowTo disable the date tracking app?* in the HowTo section of this manual.

## Backup your Work
---
To back up the work done so far, follow the instructions at *HowTo backup annot content?* in the HowTo section of this manual.

And with this, we come to the end of this tutorial.