Arthur Le Bars authored deea9f69
gga_load_data (WIP)
Automated integration of new organisms into GGA environments in the form of a Docker stack of services.
Description:
Automatically generate functional GGA environments from a descriptive input file. See the example datasets (example.json, example.yml or example.xlsx) for the information that can be described and the expected formatting of these input files.
In its current version, "gga_load_data" is divided into three (automated) parts:
- Create the stacks of services for the input organisms (orchestrated using Docker Swarm, with Traefik used as a networking interface between the different stacks)
- Load the organisms' datasets into the Galaxy instance
- Remotely run a custom workflow in Galaxy
Metadata files (WIP):
A metadata file will be generated to summarize the actions that have already been taken inside a stack.
Directory tree:
For every input organism, the script creates a dedicated directory and all the required subdirectories.
If the user adds new data to an existing species (for example, the datasets of another strain or sex of the same species), the directory tree is updated.
Directory tree structure:
/main_directory
|
|---/genus1_species1
|   |
|   |---/blast
|   |   |---/links.yml
|   |   |---/banks.yml
|   |
|   |---/nginx
|   |   |---/conf
|   |       |---/default.conf
|   |
|   |---/docker_data # Data used internally by docker (do not delete!)
|   |
|   |---/src_data
|   |   |---/genome
|   |   |   |---/genus1_species1_strain_sex
|   |   |       |---/vX.X
|   |   |           |---/genus_species_vX.X.fasta
|   |   |
|   |   |---/annotation
|   |   |   |---/genus1_species1_strain_sex
|   |   |       |---/OGSX.X
|   |   |           |---/OGSX.X.gff
|   |   |           |---/OGSX.X_pep.fasta
|   |   |           |---/OGSX.X_transcripts.fasta
|   |   |
|   |   |---/tracks
|   |       |---/genus1_species1_strain_sex
|   |
|   |---/apollo
|   |   |---/annotation_groups.tsv
|   |
|   |---/docker-compose.yml
|   |
|   |---/metada_genus1_species1.yml (WIP)
|
|---/metadata.yml
|
|---/traefik
    |---/docker-compose.yml
    |---/authelia
        |---/users.yml
        |---/configuration.yml
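As an illustration of how such a tree can be created or updated without touching existing content, here is a minimal Python sketch. It is not the tool's actual code: the function name create_organism_tree and its parameters are hypothetical, and the versioned subdirectories (vX.X, OGSX.X) and the files themselves are left out.

```python
from pathlib import Path


def create_organism_tree(main_dir, genus, species, strain, sex):
    """Create the per-organism directories, leaving existing ones untouched.

    Hypothetical helper: names and arguments are assumptions made for
    this example, not part of gga_load_data.
    """
    species_name = f"{genus}_{species}"
    dataset_name = f"{species_name}_{strain}_{sex}"
    species_dir = Path(main_dir) / species_name

    # Directories taken from the tree shown above (files are not created here).
    subdirs = [
        Path("blast"),
        Path("nginx") / "conf",
        Path("docker_data"),
        Path("src_data") / "genome" / dataset_name,
        Path("src_data") / "annotation" / dataset_name,
        Path("src_data") / "tracks" / dataset_name,
        Path("apollo"),
    ]
    for subdir in subdirs:
        # exist_ok=True makes the call safe to repeat when a new strain/sex
        # is added to a species that already has a directory.
        (species_dir / subdir).mkdir(parents=True, exist_ok=True)
    return species_dir
```

Calling the function a second time with a different strain or sex only adds the missing dataset subdirectories under src_data, which matches the update behaviour described above.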
Steps:
For each input organism, the tool works in three parts (1 part = 1 separate script).
The first two parts are required to set up a functional GGA stack.
Part 1)
- Create the directory tree structure (if it already exists, only create the required subdirectories)
- Create the docker-compose file for the organism and deploy the stack of services.
Warning: the Galaxy service takes up to 2 hours to be set up and cannot be interacted with during that time. Wait at least 2 hours before calling the other scripts.
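The waiting period can also be scripted. The sketch below is an assumption-laden example, not part of gga_load_data: it polls the Galaxy API version endpoint (/api/version) until it answers or a maximum wait is reached. The function names, the base URL passed in, and the polling interval are all hypothetical. An answering API does not guarantee that every service in the stack is ready, so the 2-hour recommendation above still applies.

```python
import time
import urllib.error
import urllib.request


def galaxy_is_up(base_url, timeout=10):
    """Return True if the Galaxy API answers with HTTP 200 (assumed readiness check)."""
    url = f"{base_url.rstrip('/')}/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error: treat as "not ready yet".
        return False


def wait_for_galaxy(base_url, max_wait=2 * 60 * 60, interval=60,
                    probe=galaxy_is_up, sleep=time.sleep):
    """Poll until Galaxy responds or max_wait seconds have been spent waiting.

    probe and sleep are injectable so the loop can be tested without a
    running Galaxy instance.
    """
    waited = 0
    while True:
        if probe(base_url):
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

A caller would run wait_for_galaxy with the URL of its own Galaxy instance and only start the second script once it returns True.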
Part 2)