
Release 1.0

Merged Loraine Gueguen requested to merge dev into master
@@ -17,9 +17,51 @@ The "gga_load_data" tool is divided in 4 separate scripts:
- gga_load_data: Load the datasets of the input organisms into their Galaxy library
- run_workflow_phaeoexplorer: Remotely run a custom workflow in Galaxy. It is provided as an example script to adapt, because its workflow parameters are specific to Phaeoexplorer data
## Usage:
The scripts all take one mandatory input file that describes the species and their associated data
(see `examples/example.yml`). Every dataset path in this file must be an absolute path.
You must also fill in a config file containing sensitive variables (Galaxy and Tripal passwords, etc.) that
the scripts read to create the different services and to access the Galaxy container. By default, the config file
in the repository root is used if none is specified on the command line. An example of this config file is available
in the `examples` folder.
**The input file and config file have to be the same for all scripts!**
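Because every dataset path must be absolute, it can help to check the parsed input before running any script. The following is a minimal sketch, not part of the tool: it assumes the input file has already been loaded into nested dicts and lists, and it assumes (hypothetically) that dataset paths are stored under keys ending in `_path`. The key names in the example data are illustrative only; see `examples/example.yml` for the real layout.

```python
import os
from typing import Any, List


def find_relative_paths(node: Any, trail: str = "") -> List[str]:
    """Return the locations of path-like values that are not absolute.

    Assumption: a value is treated as a dataset path when its key ends
    with "_path". This naming rule is hypothetical, not the tool's schema.
    """
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = "{}.{}".format(trail, key) if trail else str(key)
            if isinstance(value, str) and str(key).endswith("_path"):
                # Empty values are skipped: the dataset is simply not provided.
                if value and not os.path.isabs(value):
                    problems.append(here)
            else:
                problems.extend(find_relative_paths(value, here))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(find_relative_paths(item, "{}[{}]".format(trail, index)))
    return problems


# Hypothetical parsed input, standing in for the loaded YAML file.
example = [
    {"description": {"genus": "genus", "species": "species"},
     "data": {"genome_path": "/abs/path/genome.fa", "gff_path": "relative/annot.gff"}},
]
print(find_relative_paths(example))
```

Any location reported by the function points at a path that should be rewritten as an absolute path before the scripts are run.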
- Deploy stacks part:
```bash
$ python3 /path/to/repo/gga_init.py your_input_file.yml -c/--config your_config_file [-v/--verbose] [OPTIONS]
--main-directory $PATH (Path where to create/update stacks; default=current directory)
--force-traefik (If specified, will overwrite traefik and authelia files; default=False)
```
- Copy source data file:
```bash
$ python3 /path/to/repo/gga_get_data.py your_input_file.yml [-v/--verbose] [OPTIONS]
--main-directory $PATH (Path where to access stacks; default=current directory)
```
- Load data in Galaxy library and prepare Galaxy instance:
```bash
$ python3 /path/to/repo/gga_load_data.py your_input_file.yml -c/--config your_config_file [-v/--verbose]
--main-directory $PATH (Path where to access stacks; default=current directory)
```
- Run a workflow in Galaxy:
```bash
$ python3 /path/to/repo/gga_load_data.py your_input_file.yml -c/--config your_config_file --workflow /path/to/workflow.ga [-v/--verbose] [OPTIONS]
--workflow $WORKFLOW (Path to the workflow to run in Galaxy. A couple of preset workflows are available in the "workflows" folder of the repository)
--main-directory $PATH (Path where to access stacks; default=current directory)
```
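Since the input file and config file must be identical across scripts, one way to avoid mismatches is to build all command lines from a single pair of values. The sketch below only assembles argument lists for the first three steps in the order they are listed; the function name `build_commands` is hypothetical, and only the flags shown in the usage above (`-c`, `--main-directory`, `-v`) are used.

```python
from typing import List, Optional


def build_commands(repo: str, input_file: str, config_file: str,
                   main_directory: Optional[str] = None,
                   verbose: bool = False) -> List[List[str]]:
    """Build the argument lists for the three steps, in the order listed."""
    # (script name, whether the usage text shows it taking -c/--config)
    steps = [
        ("gga_init.py", True),
        ("gga_get_data.py", False),
        ("gga_load_data.py", True),
    ]
    commands = []
    for script, takes_config in steps:
        cmd = ["python3", "{}/{}".format(repo.rstrip("/"), script), input_file]
        if takes_config:
            cmd += ["-c", config_file]
        if main_directory is not None:
            cmd += ["--main-directory", main_directory]
        if verbose:
            cmd.append("-v")
        commands.append(cmd)
    return commands


for cmd in build_commands("/path/to/repo", "your_input_file.yml", "your_config_file"):
    print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)`, keeping in mind that the load step should only start once the Galaxy service is ready.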
## Directory tree:
For every input organism, a dedicated directory is created. The script will create this directory and all subdirectories required.
For every input organism, a dedicated directory is created with `gga_get_data.py`. The script will create this directory and all subdirectories required.
If the user is adding new data to a species (for example adding another strain dataset to the same species), the directory tree will be updated
@@ -72,58 +114,18 @@ Directory tree structure:
```
## Current limitations
When deploying the stack of services, the Galaxy service can take a long time to be ready. This is due to the Galaxy container preparing a persistent location for the container data. This can be bypassed by setting the variable "persist_galaxy_data" to "False" in the config file.
The stacks deployment and the data loading into Galaxy should hence be run separately and only once the Galaxy service is ready.
The `gga_load_data.py` script will check that the Galaxy service is ready before loading the data and will exit with a notification if it is not.
You can check the status of the Galaxy service with `$ docker service logs -f genus_species_galaxy` or
`./serexec genus_species_galaxy supervisorctl status`.
When deploying the stack of services, the Galaxy service can take a long time to be ready. This is due to the Galaxy container preparing a persistent location for the container data.
In development mode only, this can be bypassed by setting the variable "persist_galaxy_data" to "False" in the config file.
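Because data loading should only start once the Galaxy service is ready, an automated pipeline may want to poll for readiness between the two steps. The following is a generic polling sketch, not the check that `gga_load_data.py` performs: `wait_until_ready` and its parameters are hypothetical, and the readiness test itself is left as a callable to be supplied by the user.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool], timeout: float = 1800.0,
                     interval: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_ready() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Stand-in check that succeeds on the third poll; a real check would
# query the Galaxy service instead (not shown here).
attempts = {"count": 0}


def fake_check() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until_ready(fake_check, timeout=5.0, interval=0.0))
```

Injecting `sleep` and `clock` keeps the loop easy to test without real waiting; the default timeout and interval are arbitrary placeholders.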
## Requirements
Requires Python 3.6