
gga_load_data (WIP)

Automated integration of new organisms into GGA environments, in the form of a Docker stack of services.

Description:

Automatically generate functional GGA environments from a descriptive input file. See the example datasets (example.json, example.yml or example.xlsx) for the information that can be described and the expected formatting of these input files.

"gga_load_data" in its current version is divided into three (automated) parts:

  • Create the stacks of services for the input organisms (orchestrated using Docker Swarm, with Traefik as the networking interface between the different stacks)
  • Load the organisms' datasets into the Galaxy instance
  • Remotely run a custom workflow in Galaxy

Metadata files (WIP):

A metadata file will be generated to summarize what actions have previously been taken inside a stack.

Directory tree:

For every input organism, a dedicated directory is created. The script will create this directory and all subdirectories required.

If the user is adding new data to a species (for example, datasets for another strain or sex of the same species), the existing directory tree is updated.

Directory tree structure:

/main_directory
|
|---/genus1_species1
|   |
|   |---/blast
|   |   |---/links.yml
|   |   |---/banks.yml
|   |
|   |---/nginx
|   |   |---/conf
|   |       |---/default.conf
|   |
|   |---/docker_data  # Data used internally by docker (do not delete!)
|   |
|   |---/src_data
|   |   |---/genome
|   |   |   |---/genus1_species1_strain_sex
|   |   |       |---/vX.X
|   |   |           |---/genus_species_vX.X.fasta
|   |   |
|   |   |---/annotation
|   |   |   |---/genus1_species1_strain_sex
|   |   |       |---/OGSX.X
|   |   |           |---/OGSX.X.gff
|   |   |           |---/OGSX.X_pep.fasta
|   |   |           |---/OGSX.X_transcripts.fasta
|   |   |
|   |   |---/tracks
|   |       |---/genus1_species1_strain_sex
|   |
|   |---/apollo
|   |   |---/annotation_groups.tsv
|   |
|   |---/docker-compose.yml
|   |
|   |---/metadata_genus1_species1.yml (WIP)
|
|---/metadata.yml
|
|---/traefik
    |---/docker-compose.yml
    |---/authelia
        |---/users.yml
        |---/configuration.yml
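
As an illustration, the per-organism part of this tree could be created idempotently as sketched below. This is only a sketch: the function name `create_organism_tree` and its parameters are hypothetical and not taken from the gga_load_data scripts. Only the directory names come from the tree above.

```python
from pathlib import Path

# Subdirectories shown in the tree above, relative to the organism directory.
ORGANISM_SUBDIRS = (
    "blast",
    "nginx/conf",
    "docker_data",
    "src_data/genome",
    "src_data/annotation",
    "src_data/tracks",
    "apollo",
)


def create_organism_tree(main_dir, genus, species):
    """Create the directory tree for one organism and return its path.

    Hypothetical helper: directories that already exist are left untouched,
    so the call can be repeated when new data is added to a species.
    """
    organism_dir = Path(main_dir) / f"{genus}_{species}"
    for subdir in ORGANISM_SUBDIRS:
        # parents=True creates intermediate directories,
        # exist_ok=True makes the call a no-op if the directory exists.
        (organism_dir / subdir).mkdir(parents=True, exist_ok=True)
    return organism_dir
```

Calling `create_organism_tree("/main_directory", "genus1", "species1")` twice leaves the same tree in place, which matches the "only create the required subdirectories" behaviour described for existing organisms.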

Steps:

For each input organism, the tool works in three parts (1 part = 1 separate script).

The first two parts are required to set up a functional GGA stack.

Part 1)

  1. Create the directory tree structure (if it already exists, only create the required subdirectories)

  2. Create the docker-compose file for the organism and deploy the stack of services.
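
On Docker Swarm, a stack is deployed from a compose file with `docker stack deploy`. A minimal sketch of how step 2 could invoke it from Python is given below. The helper names and the choice of stack name are hypothetical, and the actual script may proceed differently.

```python
import subprocess
from pathlib import Path


def build_deploy_command(organism_dir, stack_name):
    """Return the `docker stack deploy` command for one organism stack."""
    compose_file = Path(organism_dir) / "docker-compose.yml"
    return ["docker", "stack", "deploy", "-c", str(compose_file), stack_name]


def deploy_stack(organism_dir, stack_name):
    """Deploy the stack (hypothetical helper).

    Raises subprocess.CalledProcessError if the docker command fails.
    """
    subprocess.run(build_deploy_command(organism_dir, stack_name), check=True)
```

Keeping the command construction separate from its execution makes it possible to log or inspect the command without a running swarm.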

Warning: the Galaxy service takes up to 2 hours to be set up. During this time it cannot be interacted with, so wait at least 2 hours before calling the other scripts.
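
Rather than waiting blindly, the Galaxy instance can be polled until it answers. The sketch below assumes that the instance is reachable at some base URL and that a response from Galaxy's `/api/version` endpoint is an acceptable sign that the service is up. The function names, the default values and the base URL are hypothetical. A responding API does not guarantee that the setup is fully complete.

```python
import time
import urllib.error
import urllib.request


def galaxy_is_up(base_url, timeout=10):
    """Return True if the Galaxy API answers at <base_url>/api/version."""
    url = f"{base_url.rstrip('/')}/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error: treat as "not ready yet".
        return False


def wait_for_galaxy(base_url, max_wait=2 * 60 * 60, interval=60,
                    probe=galaxy_is_up, sleep=time.sleep):
    """Poll until Galaxy responds or max_wait seconds of sleeping have elapsed.

    Returns True as soon as the probe succeeds, False on timeout.
    `probe` and `sleep` are injectable so the loop can be tested offline.
    """
    waited = 0
    while True:
        if probe(base_url):
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

The default `max_wait` of 2 hours mirrors the warning above. The other scripts would only be called once `wait_for_galaxy` returns True.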

Part 2)