Release v2.1.0

Merged Loraine Gueguen requested to merge dev into master
@@ -6,9 +6,25 @@ They are based on the [Galaxy Genome Annotation (GGA) project](https://galaxy-ge
A stack of Docker services is deployed for each species, from an input yaml file describing the data.
See `examples/example.yml` for an example of what information can be described and the correct formatting of this input file.
The services currently deployed are:
- Chado database
- Tripal: database interface and hub to all applications
- Elasticsearch: searching service used in Tripal
- JBrowse: genome browser
- Nginx proxy: page to download the data files
- Blast (optional): BLAST interface to query the data
- Galaxy: data loading orchestrator for administrators
A GGA environment is deployed for each different species at [https://hostname/sp/genus_species/](https://hostname/sp/genus_species/).
Multiple strains of the same species are deployed in the same GGA environment.
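
As an illustration of the URL scheme above, here is a minimal sketch that builds the address of a species environment. The helper `species_url` is hypothetical and not part of the repository, and lower-casing the names is an assumption based on the `genus_species` placeholder.

```python
from urllib.parse import quote


def species_url(hostname: str, genus: str, species: str) -> str:
    """Build the URL of the GGA environment for one species.

    Assumption: the path segment is "<genus>_<species>" in lower case,
    mirroring the `genus_species` placeholder used in this README.
    """
    slug = f"{genus.strip()}_{species.strip()}".lower().replace(" ", "_")
    return f"https://{hostname}/sp/{quote(slug)}/"


# All strains of a species share this single address.
print(species_url("hostname", "Ectocarpus", "siliculosus"))
```
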
![gga_schema](images/gga_schema.png)
_**Figure** : Schematic representation of a set of Docker containers deployed with
GGA for typical genomes. Cuboids represent Docker containers. Hexagons represent different
sets of Docker containers. Blue arrows represent HTTP traffic. Gray arrows represent
data exchange performed using Galaxy tools. Black arrows represent data exchange
inherent in applications._
## Requirements
To run the gga_load_data tools, Python 3.6 and the packages listed in [requirements.txt](./requirements.txt) are required.
@@ -19,11 +35,11 @@ and a [swarm](https://docs.docker.com/engine/swarm/swarm-tutorial) (for cluster
## Reverse proxy and authentication
Traefik is a reverse proxy that directs HTTP traffic to the various Docker Swarm services.
[Traefik](https://doc.traefik.io/traefik/) is a reverse proxy that directs HTTP traffic to the various Docker Swarm services.
The Traefik dashboard is deployed at [https://hostname/traefik/](https://hostname/traefik/)
Authelia is an authentication agent that can be connected to an LDAP server and that Traefik can use to check permissions to access services.
The authentication layer is optional. If used, the config file needs the variables `https_port`, `authentication_domain_name`, `authelia_config_path`.
[Authelia](https://www.authelia.com/docs/) is an authentication agent that can be connected to an LDAP server and that Traefik can use to check permissions to access services.
The authentication layer is optional. If used, the config file needs the variables `https_port`, `authentication_domain_name`, `authelia_config_path`, `authelia_secrets_env_path`, `authelia_db_postgres_password`.
Traefik calls Authelia automatically to check permissions every time someone tries to access a page.
A user who is not logged in is redirected to the Authelia portal.
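
When the authentication layer is enabled, it can help to check the config before deploying. The sketch below assumes the config file has already been parsed into a dictionary. The variable names come from the list above, while `missing_auth_keys` and the sample values are hypothetical.

```python
# Variables the README lists as required when Authelia is used.
REQUIRED_AUTH_KEYS = (
    "https_port",
    "authentication_domain_name",
    "authelia_config_path",
    "authelia_secrets_env_path",
    "authelia_db_postgres_password",
)


def missing_auth_keys(config: dict) -> list:
    """Return the authentication variables that are absent or empty."""
    return [key for key in REQUIRED_AUTH_KEYS if not config.get(key)]


# Hypothetical values, for illustration only.
config = {
    "https_port": 443,
    "authentication_domain_name": "auth.example.org",
    "authelia_config_path": "/path/to/authelia_config.yml",
}
print(missing_auth_keys(config))
```

An empty list means every variable needed by the authentication layer is set.
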
@@ -33,10 +49,11 @@ Note that Authelia needs a secured connexion (no self-signed certificate) betwee
The "gga_load_data" tools are composed of 4 scripts:
- gga_init: Create directory tree for organisms and deploy stacks for the input organisms as well as Traefik and optionally Authelia stacks
- gga_get_data: Create `src_data` directory tree for organisms and copy datasets for the input organisms into `src_data`
- gga_load_data: Load the datasets of the input organisms into their Galaxy library
- run_workflow_phaeoexplorer: Remotely run a custom workflow in Galaxy, proposed as an "example script" to take inspiration from as workflow parameters are specific to the [Phaeoexplorer](https://phaeoexplorer.sb-roscoff.fr) data
- `gga_init.py`: Create directory tree for organisms and deploy stacks for the input organisms as well as Traefik and optionally Authelia stacks
- `gga_get_data.py`: Create `src_data` directory tree for organisms and copy datasets for the input organisms into `src_data`
- `gga_load_data.py`: Load the datasets of the input organisms into their Galaxy library
- `gga_run_workflow_phaeo_*.py`: Scripts that run custom workflows in Galaxy. They are provided as examples to adapt,
because the workflow parameters are specific to the [Phaeoexplorer](https://phaeoexplorer.sb-roscoff.fr) data
## Usage:
@@ -49,34 +66,40 @@ in the `examples` folder.
**The input file and config file have to be the same for all scripts!**
- Deploy stacks part:
- Deploy Docker stacks:
```bash
$ python3 /path/to/repo/gga_init.py input_file.yml -c/--config config_file.yml [-v/--verbose] [OPTIONS]
--main-directory $PATH (Path where to create/update stacks; default=current directory)
--force-traefik (If specified, will overwrite traefik and authelia files; default=False)
$ python3 /path/to/repo/gga_init.py input_file.yml \
-c/--config config_file.yml \
--main-directory $PATH (Path where to create/update stacks; default=current directory) \
--force-traefik (If specified, will overwrite traefik and authelia files; default=False) \
[-v/--verbose]
```
- Copy source data file:
- Copy source data files:
```bash
$ python3 /path/to/repo/gga_get_data.py input_file.yml [-v/--verbose] [OPTIONS]
--main-directory $PATH (Path where to access stacks; default=current directory)
$ python3 /path/to/repo/gga_get_data.py input_file.yml \
--main-directory $PATH (Path where to access stacks; default=current directory) \
[-v/--verbose]
```
- Load data in Galaxy library and prepare Galaxy instance:
```bash
$ python3 /path/to/repo/gga_load_data.py input_file.yml -c/--config config_file.yml [-v/--verbose]
--main-directory $PATH (Path where to access stacks; default=current directory)
$ python3 /path/to/repo/gga_load_data.py input_file.yml \
-c/--config config_file.yml \
--main-directory $PATH (Path where to access stacks; default=current directory) \
[-v/--verbose]
```
- Run a workflow in galaxy:
- Run a workflow in Galaxy (example script, specific to Phaeoexplorer data):
```bash
$ python3 /path/to/repo/run_workflow_phaeoexplorer.py input_file.yml -c/--config config_file --workflow workflow_type [-v/--verbose] [OPTIONS]
--workflow (Valid options: "chado_load_fasta_gff_jbrowse", "blast", "interpro", preset workflows are available in the "workflows_phaeoexplorer" directory)
--main-directory $PATH (Path where to access stacks; default=current directory)
$ python3 /path/to/repo/gga_run_workflow_phaeo_jbrowse.py input_file.yml \
-c/--config config_file.yml \
--main-directory $PATH (Path where to access stacks; default=current directory) \
[-v/--verbose]
```
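
Because every script must receive the same input file and config file, the three core commands can be generated from one place. The following is a minimal sketch, not part of the repository: `build_commands` is a hypothetical helper, and it only uses the options shown in the usage blocks above (`--config`, `--main-directory`, `--verbose`).

```python
import shlex


def build_commands(repo, input_file, config_file, main_directory=".", verbose=False):
    """Return the argument lists for the three scripts, in the order they are run."""

    def cmd(script, with_config):
        args = ["python3", f"{repo}/{script}", input_file]
        if with_config:
            # gga_get_data.py is shown without a config option in the usage above.
            args += ["--config", config_file]
        args += ["--main-directory", main_directory]
        if verbose:
            args.append("--verbose")
        return args

    return [
        cmd("gga_init.py", with_config=True),
        cmd("gga_get_data.py", with_config=False),
        cmd("gga_load_data.py", with_config=True),
    ]


# Print the commands instead of running them, so they can be reviewed first.
for command in build_commands("/path/to/repo", "input_file.yml", "config_file.yml"):
    print(" ".join(shlex.quote(arg) for arg in command))
```

Each list can then be passed to `subprocess.run(command, check=True)`, so a failing step stops the sequence.
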
Run `gga_load_data.py` only once the Galaxy service deployed by `gga_init.py` is ready.
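
One way to wait for Galaxy is to poll it until it answers. The sketch below is an assumption-laden illustration rather than a repository tool: `galaxy_is_up` and `wait_until_ready` are hypothetical names, the probe relies on Galaxy's `/api/version` endpoint, and the Galaxy base URL in the comment is a guess at where the service is exposed. If Authelia protects the site, the probe may see a redirect to the portal instead of Galaxy itself.

```python
import time
import urllib.error
import urllib.request


def galaxy_is_up(base_url, timeout=5):
    """Probe Galaxy's version endpoint; an HTTP 200 response counts as ready."""
    url = f"{base_url.rstrip('/')}/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: not ready yet.
        return False


def wait_until_ready(probe, attempts=60, delay=10.0, sleep=time.sleep):
    """Call `probe` until it returns True or `attempts` is exhausted."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call (assumed URL layout, adjust to the actual deployment):
# wait_until_ready(lambda: galaxy_is_up("https://hostname/sp/genus_species/galaxy"))
```

Passing the probe as a callable keeps the retry loop independent of the network check, so the readiness test can be swapped for a stricter one.
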