diff --git a/README.md b/README.md
index f0d4e12ad2f9aa0febf4e3784fe41a1cc8d29804..1909756a58cb0831510a0b0646e74d48f347f0ef 100755
--- a/README.md
+++ b/README.md
@@ -17,9 +17,51 @@ The "gga_load_data" tool is divided in 4 separate scripts:
 - gga_load_data: Load the datasets of the input organisms into their Galaxy library
 - run_workflow_phaeoexplorer: Remotely run a custom workflow in Galaxy, proposed as an "example script" to take inspiration from as workflow parameters are specific to Phaeoexplorer data
 
+## Usage:
+
+The scripts all take one mandatory input file that describes the species and their associated data
+(see `examples/example.yml`). Every dataset path in this file must be an absolute path.
+
+You must also fill in a config file containing sensitive variables (Galaxy and Tripal passwords, etc.) that
+the scripts will read to create the different services and to access the Galaxy container. By default, the config file
+inside the repository root will be used if none is specified on the command line. An example of this config file is available
+in the `examples` folder.
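+
+For orientation, a sketch of such a config file is shown below. Apart from `persist_galaxy_data` (discussed under "Current limitations"), the variable names here are hypothetical placeholders; the example file in the `examples` folder is the authoritative reference.
+
+```yaml
+# Hypothetical variable names, for illustration only -- see the examples folder
+# for the real template. Do not commit real passwords to version control.
+galaxy_admin_password: "changeme"   # sensitive: Galaxy admin account
+tripal_admin_password: "changeme"   # sensitive: Tripal admin account
+persist_galaxy_data: "True"         # "False" skips persistent data preparation (development only)
+```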
+
+**The input file and config file have to be the same for all scripts!**
+
+- Deploy stacks part:
+
+```bash
+$ python3 /path/to/repo/gga_init.py your_input_file.yml -c/--config your_config_file [-v/--verbose] [OPTIONS]
+  --main-directory $PATH (Path where to create/update stacks; default=current directory)
+  --force-traefik (If specified, will overwrite traefik and authelia files; default=False)
+```
+
+- Copy source data files:
+
+```bash
+$ python3 /path/to/repo/gga_get_data.py your_input_file.yml [-v/--verbose] [OPTIONS]
+  --main-directory $PATH (Path where to access stacks; default=current directory)
+```
+
+- Load data in Galaxy library and prepare Galaxy instance:
+
+```bash
+$ python3 /path/to/repo/gga_load_data.py your_input_file.yml -c/--config your_config_file [-v/--verbose]
+  --main-directory $PATH (Path where to access stacks; default=current directory)
+```
+
+- Run a workflow in Galaxy:
+
+```bash
+$ python3 /path/to/repo/run_workflow_phaeoexplorer.py your_input_file.yml -c/--config your_config_file --workflow /path/to/workflow.ga [-v/--verbose] [OPTIONS]
+  --workflow $WORKFLOW (Path to the workflow to run in Galaxy. A couple of preset workflows are available in the "workflows" folder of the repository)
+  --main-directory $PATH (Path where to access stacks; default=current directory)
+```
+
 ## Directory tree:
 
-For every input organism, a dedicated directory is created. The script will create this directory and all subdirectories required.
+For every input organism, a dedicated directory is created with `gga_get_data.py`. The script will create this directory and all subdirectories required.
 
 If the user is adding new data to a species (for example adding another strain dataset to the same species), the directory tree will be updated
@@ -72,58 +114,18 @@ Directory tree structure:
 ```
 
-## Usage:
-
-The scripts all take one mandatory input file that describes the species and their associated data
-(see `examples/example.yml`). Every dataset path in this file must be an absolute path.
-
-You must also fill in a config file containing sensitive variables (Galaxy and Tripal passwords, etc..) that
-the script will read to create the different services and to access the Galaxy container. By default, the config file
-inside the repository root will be used if none is precised in the command line. An example of this config file is available
-in the `examples` folder.
-
-- Deploy stacks part:
-
-```bash
-$ python3 /path/to/repo/gga_init.py your_input_file.yml -c/--config your_config_file [-v/--verbose] [OPTIONS]
-  --main-directory $PATH (Path where to create/update stacks; default=current directory)
-  --force-traefik (If specified, will overwrite traefik and authelia files; default=False)
-```
-
-- Copy source data file:
-
-```bash
-$ python3 /path/to/repo/gga_get_data.py your_input_file.yml [-v/--verbose] [OPTIONS]
-  --main-directory $PATH (Path where to access stacks; default=current directory)
-```
-
-- Load data in Galaxy library and prepare Galaxy instance:
-
-```bash
-$ python3 /path/to/repo/gga_load_data.py your_input_file.yml -c/--config your_config_file [-v/--verbose]
-  --main-directory $PATH (Path where to access stacks; default=current directory)
-```
-
-- Run a workflow in galaxy:
-
-```bash
-$ python3 /path/to/repo/gga_load_data.py your_input_file.yml -c/--config your_config_file --workflow /path/to/workflow.ga [-v/--verbose] [OPTIONS]
-  --workflow $WORKFLOW (Path to the workflow to run in galaxy. A couple of preset workflows are available in the "workflows" folder of the repository)
-  --main-directory $PATH (Path where to access stacks; default=current directory)
-```
-
-**The input file and config file have to be the same for all scripts!**
-
 ## Current limitations
 
-When deploying the stack of services, the Galaxy service can take a long time to be ready. This is due to the Galaxy container preparing a persistent location for the container data. This can be bypassed by setting the variable "persist_galaxy_data" to "False" in the config file.
-
 The stacks deployment and the data loading into Galaxy should hence be run separately and only once the Galaxy service is ready.
 The `gga_load_data.py` script will check that the Galaxy service is ready before loading the data and will exit with a notification if it is not.
 You can check the status of the Galaxy service with `$ docker service logs -f genus_species_galaxy` or `./serexec genus_species_galaxy supervisorctl status`.
+When deploying the stack of services, the Galaxy service can take a long time to be ready. This is due to the Galaxy container preparing a persistent location for the container data.
+In development mode only, this can be bypassed by setting the variable "persist_galaxy_data" to "False" in the config file.
+
 ## Requirements
 
 Requires Python 3.6
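+
+Putting the scripts together, a complete run follows the order below. All paths and file names are placeholders, and the workflow-runner script name is assumed from the tool overview at the top of this README; as noted under "Current limitations", wait until the Galaxy service is ready before loading data.
+
+```bash
+# Illustrative end-to-end sequence; the input and config files must be the same throughout.
+python3 /path/to/repo/gga_init.py input.yml --config config.yml        # deploy the stacks
+python3 /path/to/repo/gga_get_data.py input.yml                        # copy source data files
+# ... wait for the Galaxy service to be ready (check with "docker service logs") ...
+python3 /path/to/repo/gga_load_data.py input.yml --config config.yml   # load data into Galaxy
+# assumption: the workflow runner is run_workflow_phaeoexplorer.py (see tool overview)
+python3 /path/to/repo/run_workflow_phaeoexplorer.py input.yml --config config.yml --workflow /path/to/workflow.ga
+```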