Release 2.0 (merge dev to master)

Merged Loraine Gueguen requested to merge dev into master
1 file
# gga_load_data tools
The gga_load_data tools allow automated deployment of GMOD visualisation tools (Chado, Tripal, JBrowse, Galaxy) for a set of genomes and datasets.
They are based on the Galaxy Genome Annotation (GGA) project (https://galaxy-genome-annotation.github.io).
They are based on the [Galaxy Genome Annotation (GGA) project](https://galaxy-genome-annotation.github.io).
A stack of Docker services is deployed for each organism, from an input yaml file describing the data.
A stack of Docker services is deployed for each species, from an input yaml file describing the data.
See `examples/example.yml` for an example of what information can be described and the correct formatting of this input file.
Each GGA environment is deployed at [https://hostname/sp/genus_species/](https://hostname/sp/genus_species/).
A GGA environment is deployed for each different species at [https://hostname/sp/genus_species/](https://hostname/sp/genus_species/).
Multiple strains can have the same species and are deployed in the same GGA environment.
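As a rough illustration of the URL scheme described above, the following Python sketch derives the address of a GGA environment from a genus and a species name. The function name `gga_environment_url` and the normalisation rules (lowercase, spaces replaced by underscores) are assumptions made for this example and are not taken from the gga_load_data code.

```python
def gga_environment_url(hostname: str, genus: str, species: str) -> str:
    """Build the URL of the GGA environment shared by all strains of a species."""
    # Assumption: the path component is "<genus>_<species>" in lowercase,
    # with any spaces replaced by underscores.
    slug = f"{genus}_{species}".strip().lower().replace(" ", "_")
    return f"https://{hostname}/sp/{slug}/"


# Strains are not part of the URL, so two strains of the same species
# resolve to the same environment.
print(gga_environment_url("hostname", "Citrus", "sinensis"))
```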
## Requirements
@@ -22,7 +23,7 @@ Traefik is a reverse proxy which allows to direct HTTP traffic to various Docker
The Traefik dashboard is deployed at [https://hostname/traefik/](https://hostname/traefik/)
Authelia is an authentication agent, which can be connected to an LDAP server, and which Traefik can use to check permissions to access services.
The authentication layer is optional. If used, the config file needs the variables `https_port`, `auth_hostname`, `authelia_config_path`.
The authentication layer is optional. If used, the config file needs the variables `https_port`, `authentication_domain_name`, `authelia_config_path`.
Authelia is accessed automatically by Traefik to check permissions every time someone wants to access a page.
If the user is not logged in, they are redirected to the Authelia portal.
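Before enabling the authentication layer, it can help to check that the config file defines the three variables listed above. The sketch below assumes the config file has already been parsed into a Python dictionary; the helper name `missing_auth_variables` is hypothetical and not part of the gga_load_data scripts.

```python
# Variable names as given in the updated README text.
REQUIRED_AUTH_KEYS = (
    "https_port",
    "authentication_domain_name",
    "authelia_config_path",
)


def missing_auth_variables(config: dict) -> list:
    """Return the authentication variables that are absent or empty in config."""
    return [key for key in REQUIRED_AUTH_KEYS if config.get(key) in (None, "")]


# Hypothetical parsed config with only one of the three variables set.
print(missing_auth_variables({"https_port": 443}))
```

An empty list means all three variables are present; how the scripts themselves react to a missing variable is not covered here.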
@@ -33,18 +34,17 @@ Note that Authelia needs a secured connexion (no self-signed certificate) betwee
The "gga_load_data" tools are composed of 4 scripts:
- gga_init: Create directory tree for organisms and deploy stacks for the input organisms as well as Traefik and optionally Authelia stacks
- gga_get_data: Create `src_data` directory tree for organisms and copy datasets for the input organisms into the organisms directory tree
- gga_get_data: Create `src_data` directory tree for organisms and copy datasets for the input organisms into `src_data`
- gga_load_data: Load the datasets of the input organisms into their Galaxy library
- run_workflow_phaeoexplorer: Remotely run a custom workflow in Galaxy, proposed as an "example script" to take inspiration from as workflow parameters are specific to Phaeoexplorer data
- run_workflow_phaeoexplorer: Remotely run a custom workflow in Galaxy, proposed as an "example script" to take inspiration from as workflow parameters are specific to the [Phaeoexplorer](https://phaeoexplorer.sb-roscoff.fr) data
## Usage:
For all scripts one input file is required, that describes the species and their associated data.
(see `examples/example.yml`). Every dataset path in this file must be an absolute path.
(see `examples/citrus_sinensis.yml`). Every dataset path in this file must be an absolute path.
Another yaml file is required, the config file, with configuration variables (Galaxy and Tripal passwords, etc.) that
the scripts need to create the different services and to access the Galaxy container. By default, the config file
inside the repository root will be used if none is precised in the command line. An example of this config file is available
the scripts need to create the different services and to access the Galaxy container. An example of this config file is available
in the `examples` folder.
**The input file and config file have to be the same for all scripts!**
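Since every dataset path in the input file must be absolute, a quick check before running the scripts can catch mistakes early. The sketch below walks an already-parsed input file (nested dictionaries and lists) and reports relative paths. It assumes that dataset entries use keys ending in `_path` and that empty values denote optional datasets; both are assumptions for illustration, so adapt the key test to the layout shown in the example file.

```python
from pathlib import PurePosixPath


def relative_dataset_paths(node, key=""):
    """Recursively collect (key, value) pairs whose path value is not absolute."""
    found = []
    if isinstance(node, dict):
        for child_key, child in node.items():
            found.extend(relative_dataset_paths(child, str(child_key)))
    elif isinstance(node, list):
        for child in node:
            found.extend(relative_dataset_paths(child, key))
    elif isinstance(node, str) and key.endswith("_path") and node:
        # Assumption: keys ending in "_path" hold dataset locations.
        if not PurePosixPath(node).is_absolute():
            found.append((key, node))
    return found


# Hypothetical parsed input file; the key names are illustrative only.
parsed_input = [
    {
        "description": {"genus": "Citrus", "species": "sinensis"},
        "data": {"genome_path": "/data/genome.fasta", "gff_path": "annotation.gff"},
    }
]
print(relative_dataset_paths(parsed_input))
```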
@@ -52,7 +52,7 @@ in the `examples` folder.
- Deploy stacks part:
```bash
$ python3 /path/to/repo/gga_init.py input_file.yml -c/--config config_file [-v/--verbose] [OPTIONS]
$ python3 /path/to/repo/gga_init.py input_file.yml -c/--config config_file.yml [-v/--verbose] [OPTIONS]
--main-directory $PATH (Path where to create/update stacks; default=current directory)
--force-traefik (If specified, will overwrite traefik and authelia files; default=False)
```
@@ -67,28 +67,27 @@ $ python3 /path/to/repo/gga_get_data.py input_file.yml [-v/--verbose] [OPTIONS]
- Load data in Galaxy library and prepare Galaxy instance:
```bash
$ python3 /path/to/repo/gga_load_data.py input_file.yml -c/--config config_file [-v/--verbose]
$ python3 /path/to/repo/gga_load_data.py input_file.yml -c/--config config_file.yml [-v/--verbose]
--main-directory $PATH (Path where to access stacks; default=current directory)
```
- Run a workflow in galaxy:
```bash
$ python3 /path/to/repo/gga_load_data.py input_file.yml -c/--config config_file --workflow /path/to/workflow.ga [-v/--verbose] [OPTIONS]
--workflow $WORKFLOW (Path to the workflow to run in galaxy. A couple of preset workflows are available in the "workflows" folder of the repository)
$ python3 /path/to/repo/run_workflow_phaeoexplorer.py input_file.yml -c/--config config_file --workflow workflow_type [-v/--verbose] [OPTIONS]
--workflow (Valid options: "chado_load_fasta_gff_jbrowse", "blast", "interpro", preset workflows are available in the "workflows_phaeoexplorer" directory)
--main-directory $PATH (Path where to access stacks; default=current directory)
```
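Because the same input file and config file must be passed to every script, it can be convenient to generate the command lines from a single place. The following sketch only builds the argument lists for the first three steps using the flags shown above; the function name `pipeline_commands` is hypothetical, and the commands are not executed here since data loading should wait until Galaxy is ready.

```python
def pipeline_commands(repo, input_file, config_file, main_directory="."):
    """Return the argv lists for init, get_data and load_data, in order."""
    python = "python3"
    return [
        # Deploy the stacks.
        [python, f"{repo}/gga_init.py", input_file,
         "-c", config_file, "--main-directory", main_directory],
        # Copy the datasets; only the input file is passed here, matching
        # the usage line shown for gga_get_data.py.
        [python, f"{repo}/gga_get_data.py", input_file],
        # Load the data into the Galaxy library.
        [python, f"{repo}/gga_load_data.py", input_file,
         "-c", config_file, "--main-directory", main_directory],
    ]


for command in pipeline_commands("/path/to/repo", "input_file.yml", "config_file.yml"):
    print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` once the previous step has completed.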
## Limitations
The stacks deployment and the data loading into Galaxy should be run separately and only once the Galaxy service is ready.
The `gga_load_data.py` script check that the Galaxy service is ready before loading the data and exit with a notification if it is not.
The data loading into Galaxy with `gga_load_data.py` should be run only once the Galaxy service deployed with `gga_init.py` is ready.
The `gga_load_data.py` script checks that the Galaxy service is ready before loading the data and exits with a notification if it is not.
The status of the Galaxy service can be checked manually with `$ docker service logs -f genus_species_galaxy` or
`./serexec genus_species_galaxy supervisorctl status`.
**Note**:
When deploying the stack of services, the Galaxy service can take a long time to be ready, because of the data persistence.
In development mode only, this can be disabled by setting the variable `persist_galaxy_data` to `False` in the config file.
In development mode only, this can be disabled by setting the variable `galaxy_persist_data` to `False` in the config file.
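Since `gga_load_data.py` exits rather than waits when Galaxy is not ready, a wrapper script may want to poll before launching it. The sketch below is a generic retry loop, not part of the gga_load_data tools: `probe` is a hypothetical callable supplied by the caller (for example, one that wraps one of the manual status checks above) and returns `True` once the service is ready.

```python
import time


def wait_until_ready(probe, attempts=30, delay=10.0, sleep=time.sleep):
    """Call probe() until it returns True; return False if attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Demonstration with a fake probe that succeeds on the third call;
# the sleep function is replaced so the example runs instantly.
answers = iter([False, False, True])
print(wait_until_ready(lambda: next(answers), attempts=5, sleep=lambda _: None))
```

The `attempts` and `delay` defaults are arbitrary; with data persistence enabled, Galaxy may need a longer window.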
## Directory tree:
@@ -149,8 +148,9 @@ Directory tree structure:
[BSD 3-Clause](./LICENSE)
## Acknowledgments
[Anthony Bretaudeau](https://github.com/abretaud)
## Contributors
[Matéo Boudet](https://github.com/mboudet)
\ No newline at end of file
- [Matéo Boudet](https://github.com/mboudet)
- [Anthony Bretaudeau](https://github.com/abretaud)
- [Loraine Brillet-Guéguen](https://github.com/loraine-gueguen)
- [Arthur Le Bars](https://gitlab.com/Troubardours)
\ No newline at end of file