diff --git a/README.md b/README.md index 677c4433cb261b570656ede6705663e5db07472b..b2647ccccc0494e3b724f6f60264365935adc473 100755 --- a/README.md +++ b/README.md @@ -1,12 +1,13 @@ # gga_load_data tools The gga_load_data tools allow automated deployment of GMOD visualisation tools (Chado, Tripal, JBrowse, Galaxy) for a collection of genomes and datasets. -They are based on the Galaxy Genome Annotation (GGA) project (https://galaxy-genome-annotation.github.io). +They are based on the [Galaxy Genome Annotation (GGA) project](https://galaxy-genome-annotation.github.io). -A stack of Docker services is deployed for each organism, from an input yaml file describing the data. +A stack of Docker services is deployed for each species, from an input yaml file describing the data. -See `examples/example.yml` for an example of what information can be described and the correct formatting of this input file. +See `examples/citrus_sinensis.yml` for an example of what information can be described and the correct formatting of this input file. -Each GGA environment is deployed at [https://hostname/sp/genus_species/](https://hostname/sp/genus_species/). +A GGA environment is deployed for each different species at [https://hostname/sp/genus_species/](https://hostname/sp/genus_species/). +Multiple strains can belong to the same species and are deployed in the same GGA environment. ## Requirements @@ -22,7 +23,7 @@ Traefik is a reverse proxy which allows to direct HTTP traffic to various Docker The Traefik dashboard is deployed at [https://hostname/traefik/](https://hostname/traefik/) Authelia is an authentication agent, which can be plugged to an LDAP server, and that Traefik can use to check permissions to access services. -The authentication layer is optional. If used, the config file needs the variables `https_port`, `auth_hostname`, `authelia_config_path`. +The authentication layer is optional. If used, the config file needs the variables `https_port`, `authentication_domain_name`, `authelia_config_path`. Authelia is accessed automatically by Traefik to check permissions every time someone wants to access a page. If the user is not logged in, they are redirected to the authelia portal. @@ -33,18 +34,17 @@ Note that Authelia needs a secured connexion (no self-signed certificate) betwee The "gga_load_data" tools are composed of 4 scripts: - gga_init: Create directory tree for organisms and deploy stacks for the input organisms as well as Traefik and optionally Authelia stacks -- gga_get_data: Create `src_data` directory tree for organisms and copy datasets for the input organisms into the organisms directory tree +- gga_get_data: Create `src_data` directory tree for organisms and copy datasets for the input organisms into `src_data` - gga_load_data: Load the datasets of the input organisms into their Galaxy library -- run_workflow_phaeoexplorer: Remotely run a custom workflow in Galaxy, proposed as an "example script" to take inspiration from as workflow parameters are specific to Phaeoexplorer data +- run_workflow_phaeoexplorer: Remotely run a custom workflow in Galaxy, proposed as an "example script" to take inspiration from, as the workflow parameters are specific to the [Phaeoexplorer](https://phaeoexplorer.sb-roscoff.fr) data ## Usage: For all scripts, one input file is required, which describes the species and their associated data. -(see `examples/example.yml`). Every dataset path in this file must be an absolute path. +(see `examples/citrus_sinensis.yml`). Every dataset path in this file must be an absolute path. Another yaml file is required, the config file, with configuration variables (Galaxy and Tripal passwords, etc.)
that -the scripts need to create the different services and to access the Galaxy container. By default, the config file -inside the repository root will be used if none is precised in the command line. An example of this config file is available +the scripts need to create the different services and to access the Galaxy container. An example of this config file is available in the `examples` folder. **The input file and config file have to be the same for all scripts!** @@ -52,7 +52,7 @@ in the `examples` folder. - Deploy stacks part: ```bash -$ python3 /path/to/repo/gga_init.py input_file.yml -c/--config config_file [-v/--verbose] [OPTIONS] +$ python3 /path/to/repo/gga_init.py input_file.yml -c/--config config_file.yml [-v/--verbose] [OPTIONS] --main-directory $PATH (Path where to create/update stacks; default=current directory) --force-traefik (If specified, will overwrite traefik and authelia files; default=False) ``` @@ -67,28 +67,27 @@ $ python3 /path/to/repo/gga_get_data.py input_file.yml [-v/--verbose] [OPTIONS] - Load data in Galaxy library and prepare Galaxy instance: ```bash -$ python3 /path/to/repo/gga_load_data.py input_file.yml -c/--config config_file [-v/--verbose] +$ python3 /path/to/repo/gga_load_data.py input_file.yml -c/--config config_file.yml [-v/--verbose] --main-directory $PATH (Path where to access stacks; default=current directory) ``` - Run a workflow in Galaxy: ```bash -$ python3 /path/to/repo/gga_load_data.py input_file.yml -c/--config config_file --workflow /path/to/workflow.ga [-v/--verbose] [OPTIONS] - --workflow $WORKFLOW (Path to the workflow to run in galaxy. A couple of preset workflows are available in the "workflows" folder of the repository) +$ python3 /path/to/repo/run_workflow_phaeoexplorer.py input_file.yml -c/--config config_file.yml --workflow workflow_type [-v/--verbose] [OPTIONS] + --workflow (Valid options: "chado_load_fasta_gff_jbrowse", "blast", "interpro"; preset workflows are available in the "workflows_phaeoexplorer" directory) --main-directory $PATH (Path where to access stacks; default=current directory) ``` -## Limitations - -The stacks deployment and the data loading into Galaxy should be run separately and only once the Galaxy service is ready. -The `gga_load_data.py` script check that the Galaxy service is ready before loading the data and exit with a notification if it is not. +The data loading into Galaxy with `gga_load_data.py` should be run only once the Galaxy service deployed with `gga_init.py` is ready. +The `gga_load_data.py` script checks that the Galaxy service is ready before loading the data and exits with a notification if it is not. The status of the Galaxy service can be checked manually with `$ docker service logs -f genus_species_galaxy` or `./serexec genus_species_galaxy supervisorctl status`. +**Note**: When deploying the stack of services, the Galaxy service can take a long time to be ready, because of the data persistence. -In development mode only, this can be disabled by setting the variable `persist_galaxy_data` to `False` in the config file. +In development mode only, this can be disabled by setting the variable `galaxy_persist_data` to `False` in the config file.
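+ +As an illustration, an end-to-end run on the bundled Citrus sinensis example could look like the following sketch (the repository and deployment directory paths are placeholders to adapt to your setup; the Galaxy service name follows the genus_species pattern described above): + +```bash +$ python3 /path/to/repo/gga_init.py examples/citrus_sinensis.yml -c examples/config.yml --main-directory /path/to/deploy_dir +# Wait until the Galaxy service is ready, e.g. by watching its logs +$ docker service logs -f citrus_sinensis_galaxy +$ python3 /path/to/repo/gga_get_data.py examples/citrus_sinensis.yml --main-directory /path/to/deploy_dir +$ python3 /path/to/repo/gga_load_data.py examples/citrus_sinensis.yml -c examples/config.yml --main-directory /path/to/deploy_dir +```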
## Directory tree: @@ -149,8 +148,9 @@ Directory tree structure: [BSD 3-Clause](./LICENSE) -## Acknowledgments - -[Anthony Bretaudeau](https://github.com/abretaud) +## Contributors -[Matéo Boudet](https://github.com/mboudet) \ No newline at end of file +- [Matéo Boudet](https://github.com/mboudet) +- [Anthony Bretaudeau](https://github.com/abretaud) +- [Loraine Brillet-Guéguen](https://github.com/loraine-gueguen) +- [Arthur Le Bars](https://gitlab.com/Troubardours) \ No newline at end of file diff --git a/constants.py b/constants.py new file mode 100644 index 0000000000000000000000000000000000000000..9fa387629fe35ec4352e2eedf8b27e912d18a91c --- /dev/null +++ b/constants.py @@ -0,0 +1,53 @@ +# Constants used in the input yaml +ORG_PARAM_NAME = "name" +ORG_PARAM_DESC = "description" +ORG_PARAM_DESC_GENUS = "genus" +ORG_PARAM_DESC_SPECIES = "species" +ORG_PARAM_DESC_SEX = "sex" +ORG_PARAM_DESC_STRAIN = "strain" +ORG_PARAM_DESC_COMMON_NAME = "common_name" +ORG_PARAM_DESC_ORIGIN = "origin" +ORG_PARAM_DESC_MAIN_SPECIES = "main_species" +ORG_PARAM_DATA = "data" +ORG_PARAM_DATA_GENOME_PATH = "genome_path" +ORG_PARAM_DATA_TRANSCRIPTS_PATH = "transcripts_path" +ORG_PARAM_DATA_PROTEINS_PATH = "proteins_path" +ORG_PARAM_DATA_GFF_PATH = "gff_path" +ORG_PARAM_DATA_INTERPRO_PATH = "interpro_path" +ORG_PARAM_DATA_ORTHOFINDER_PATH = "orthofinder_path" +ORG_PARAM_DATA_BLASTP_PATH = "blastp_path" +ORG_PARAM_DATA_BLASTX_PATH = "blastx_path" +ORG_PARAM_DATA_GENOME_VERSION = "genome_version" +ORG_PARAM_DATA_OGS_VERSION = "ogs_version" +ORG_PARAM_DATA_PERFORMED_BY = "performed_by" +ORG_PARAM_SERVICES = "services" +ORG_PARAM_SERVICES_BLAST = "blast" + +# Constants used in the config yaml file +CONF_ALL_HOSTNAME = "hostname" +CONF_ALL_HTTP_PORT = "http_port" +CONF_ALL_HTTPS_PORT = "https_port" +CONF_ALL_PROXY_IP = "proxy_ip" +CONF_ALL_AUTH_DOMAIN_NAME = "authentication_domain_name" +CONF_ALL_AUTHELIA_CONFIG_PATH = "authelia_config_path" +CONF_ALL_AUTHELIA_SECRETS_ENV_PATH = "authelia_secrets_env_path" +CONF_ALL_AUTHELIA_DB_POSTGRES_PASSWORD = "authelia_db_postgres_password" +CONF_GALAXY_DEFAULT_ADMIN_EMAIL = "galaxy_default_admin_email" +CONF_GALAXY_DEFAULT_ADMIN_USER = "galaxy_defaut_admin_user" +CONF_GALAXY_DEFAULT_ADMIN_PASSWORD = "galaxy_default_admin_password" +CONF_GALAXY_CONFIG_REMOTE_USER_MAILDOMAIN = "galaxy_config_remote_user_maildomain" +CONF_GALAXY_PERSIST_DATA = "galaxy_persist_data" +CONF_TRIPAL_PASSWORD = "tripal_password" +CONF_TRIPAL_BANNER_PATH = "tripal_banner_path" +CONF_TRIPAL_THEME_NAME = "tripal_theme_name" +CONF_TRIPAL_THEME_GIT_CLONE = "tripal_theme_git_clone" +CONF_JBROWSE_MENU_URL = "jbrowse_menu_url" + +# Default config file +DEFAULT_CONFIG = "examples/config.yml" + +GET_ORGANISMS_TOOL = "toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.4+galaxy0" +DELETE_ORGANISMS_TOOL = "toolshed.g2.bx.psu.edu/repos/gga/chado_organism_delete_organisms/organism_delete_organisms/2.3.4+galaxy0" + +HOST_DATA_DIR = "src_data" +CONTAINER_DATA_DIR_ROOT = "/project_data" diff --git a/examples/authelia_config_example.yml b/examples/authelia_config.yml similarity index 97% rename from examples/authelia_config_example.yml rename to examples/authelia_config.yml index 1050c78eb53f58547efdb3b4d5a891c4df708242..8ce2b710d4844c77ed1e72fe14bab9b9db840726 100644 --- a/examples/authelia_config_example.yml +++ b/examples/authelia_config.yml @@ -16,7 +16,7 @@ log_level: info # The secret used to generate JWT tokens when validating user identity by # email confirmation.
# This secret can also be set using the env variable AUTHELIA_JWT_SECRET -jwt_secret: XXXXXXXXXXXXXXXXX +#jwt_secret: XXXXXXXXXXXXXXXXX # Default redirection URL # @@ -82,7 +82,7 @@ authentication_backend: # skip_verify: false # The base dn for every entry -# base_dn: dc=genouest,dc=org +# base_dn: dc=domain,dc=org # The attribute holding the username of the user. This attribute is used to populate # the username in the session information. It was introduced due to #561 to handle case @@ -196,7 +196,7 @@ access_control: # Default policy can either be 'bypass', 'one_factor', 'two_factor' or 'deny'. # It is the policy applied to any resource if there is no policy to be applied # to the user. - default_policy: bypass + default_policy: deny rules: # The login portal is freely accessible (redirection loop otherwise) @@ -213,12 +213,9 @@ access_control: - domain: localhost resources: - "^/traefik/.*$" + - "^/api/.*$" policy: one_factor subject: "group:ldap_admin" - - domain: localhost - resources: - - "^/traefik/.*$" - policy: deny # All galaxies are restricted to a group from ldap - domain: localhost @@ -237,22 +234,17 @@ access_control: - "^/sp/genus_species/.*$" policy: one_factor subject: "group:gspecies" - - domain: localhost - resources: - - "^/sp/genus_species/.*$" - policy: deny - # Configuration of session cookies # # The session cookies identify the user once logged in. session: # The name of the session cookie. (default: authelia_session). - name: authelia_replaceme_session + name: authelia_session # The secret to encrypt the session data. This is only used with Redis. # This secret can also be set using the env variable AUTHELIA_SESSION_SECRET - secret: WXXXXXXXXXXXXXXXXXXXcXXXXXXXXXXXXXX +# secret: WXXXXXXXXXXXXXXXXXXXcXXXXXXXXXXXXXX # The time in seconds before the cookie expires and session is reset. expiration: 3600000 # 1000 hours @@ -271,7 +263,7 @@ session: # The domain to protect. # Note: the authenticator must also be in that domain. If empty, the cookie # is restricted to the subdomain of the issuer. - domain: replaceme.org + domain: domain.org # The redis connection details redis: @@ -342,7 +334,7 @@ notifier: host: smtp-server-hostname port: 25 disable_require_tls: true - sender: replace@me.fr + sender: replace@domain.org # Sending an email using a Gmail account is as simple as the next section.
# You need to create an app password by following: https://support.google.com/accounts/answer/185833?hl=en diff --git a/examples/authelia_secrets.env b/examples/authelia_secrets.env new file mode 100644 index 0000000000000000000000000000000000000000..25485beada267503795b0d1ac875adf45069a6fd --- /dev/null +++ b/examples/authelia_secrets.env @@ -0,0 +1,3 @@ +AUTHELIA_AUTHENTICATION_BACKEND_LDAP_PASSWORD=xxxxxxx +AUTHELIA_JWT_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +AUTHELIA_SESSION_SECRET=xxxxxxxxxxxxxxxxx diff --git a/examples/citrus_sinensis.yml b/examples/citrus_sinensis.yml new file mode 100644 index 0000000000000000000000000000000000000000..63483e4e70dde45425a68185cf51ef18587ef403 --- /dev/null +++ b/examples/citrus_sinensis.yml @@ -0,0 +1,38 @@ +# Input file for the automated creation of GGA docker stacks +# The file consists of a "list" of species for which the script will have to create these stacks/load data into galaxy/run workflows + +- name: citrus_sinensis + description: + # Species description, leave blank if unknown or you don't want it to be used + # These parameters are used to set up the various urls and addresses in different containers + # The script requires at least the genus to be specified + genus: Citrus # Mandatory! + species: sinensis # Mandatory! + sex: male + strain: + common_name: + origin: + # Useful when there are multiple strains for the same species: set to "yes" to define the strain used as home for JBrowse + # main_species: yes + data: + # Paths to the different datasets to copy and import into the galaxy container (as a shared library) + # Must be absolute paths to the datasets + genome_path: /path/to/repo/examples/src_data/genome/v1.0/Citrus_sinensis-scaffold00001.fasta # Mandatory! + transcripts_path: /path/to/repo/examples/src_data/annotation/v1.0/Citrus_sinensis-orange1.1g015632m.g.fasta # Mandatory! + proteins_path: # Mandatory! + gff_path: /path/to/repo/examples/src_data/annotation/v1.0/Citrus_sinensis-orange1.1g015632m.g.gff3 # Mandatory! + interpro_path: /path/to/repo/examples/src_data/annotation/v1.0/functional_annotation/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml + orthofinder_path: + blastp_path: + blastx_path: /path/to/repo/examples/src_data/annotation/v1.0/functional_annotation/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out + # If the user has several datasets of the same 'nature' (gff, genomes, ...) to upload to galaxy, the next scalar is used by the script to differentiate + # between these different versions and name directories according to it and not overwrite the existing data + # If left empty, the genome will be considered version "1.0" + genome_version: 1.0 + # Same as genome version, but for the OGS analysis + ogs_version: 1.0 + performed_by: + services: + # List the optional services to be deployed in the stack + # By default, only tripal, tripaldb, galaxy, jbrowse and elasticsearch services will be deployed + blast: 0 \ No newline at end of file diff --git a/examples/config.yml b/examples/config.yml new file mode 100644 index 0000000000000000000000000000000000000000..cb0873228743b5412908e149901882407f709000 --- /dev/null +++ b/examples/config.yml @@ -0,0 +1,27 @@ +# This is the configuration template file used by the gga_init.py, gga_load_data.py and run_workflow.py scripts + +# These variables are used by several services at once, or give the paths to sensitive files +hostname: localhost # Required. The hosting machine name +http_port: 8888 # Required.
The HTTP port docker will use on the hosting machine +https_port: 8889 # Required for Authelia. The HTTPS port docker will use on the hosting machine +proxy_ip: XXX.XXX.XXX.XXX # Required. IP of the upstream proxy (used by Traefik) +authentication_domain_name: XXXXXXXX # Required for Authelia. The authentication domain name. +authelia_config_path: /path/to/authelia_config.yml # Required for Authelia. Path to the Authelia configuration file +authelia_secrets_env_path: /path/to/authelia/secrets.env # Required for Authelia. Path to the env file containing passwords and secrets needed for Authelia +authelia_db_postgres_password: XXXXXXXX # Required for Authelia. + +# galaxy-specific variables +galaxy_default_admin_email: gga@galaxy.org # Required +galaxy_defaut_admin_user: gga # Required +galaxy_default_admin_password: password # Required +galaxy_config_remote_user_maildomain: mydomain.com # Required. The maildomain used by Galaxy authentication +galaxy_persist_data: "True" # Optional (default: True). If False, docker data will NOT be persisted on your host's file system and will be lost any time the galaxy container is recreated. Do not set this variable to "False" for production + +# tripal-specific variables +tripal_password: tripalpass # Required. Tripal database password (also used by galaxy as an environment variable) +tripal_banner_path: /path/to/banner.png # Optional. Use this to change the top banner in Tripal +tripal_theme_name: tripal_gga # Optional. Use this to set another theme +tripal_theme_git_clone: http://gitlab.sb-roscoff.fr/abims/e-infra/tripal_gga.git # Optional. Use this to install another theme. + +# jbrowse-specific variables +jbrowse_menu_url: "http://localhost:8888/" # Optional. Used with run_workflow_phaeoexplorer.py: if present, this variable is used to define JBrowse menu_url (to define the template url for the JBrowse feature's link to Tripal); if absent, the default "https://hostname" will be used \ No newline at end of file diff --git a/examples/config_example.yml b/examples/config_example.yml deleted file mode 100644 index 795cd662b0934475260f0dc88a6556399bd93929..0000000000000000000000000000000000000000 --- a/examples/config_example.yml +++ /dev/null @@ -1,29 +0,0 @@ -# This is the configuration template file used by the gga_init.py, gga_load_data.py and run_workflow.py scripts - -# "all" section contains variables used by several services at once or the paths to import sensitive files -all: - hostname: localhost # Required. The hosting machine name - dashboard_port: 8001 # Required. The desired port (on the hosting machine) for the traefik container dashboard - http_port: 8888 # Required. The HTTP port docker will use on the hosting machine - https_port: 8889 # Required for Authelia. The HTTPS port docker will use on the hosting machine - proxy_ip: XXXXXXXXXXXX # Required. IP of the upstream proxy (used by Traefik) - authentication_domain_name: XXXXXXXXXXXX # Required for Authelia. The authentication domain name. - authelia_config_path: /path/to/authelia_config.yml # Required for Authelia. Path to the Authelia configuration file -# galaxy-specific variables -galaxy: - galaxy_default_admin_email: gga@galaxy.org # Required - galaxy_defaut_admin_user: gga # Required - galaxy_default_admin_password: password # Required - webapollo_user: admin_apollo@galaxy.org # Required - webapollo_password: apollopass # Required - galaxy_config_remote_user_maildomain: mydomain.com # Required.
The maildomain used by Galaxy authentication - persist_galaxy_data: "True" # # Optional (default: True). If False, docker data will NOT be persisted on your host's file system and will be lost any time the galaxy container is recreated. Do not set this variable to "False" for production -# tripal-specific variables -tripal: - tripal_password: tripalpass # Required. Tripal database password (also used by galaxy as an environment variable) - banner_path: /my/path/banner.png # Optional. Custom banner path - tripal_theme_name: tripal_gga # Optional. Use this to use another theme - tripal_theme_git_clone: http://gitlab.sb-roscoff.fr/abims/e-infra/tripal_gga.git # Optional. Use this to install another theme. -# jbrowse-specific variables -jbrowse: - menu_url: "http://localhost:8888/" # Optional. Used with run_workflow_phaeoexplorer.py: if present, this variable is used to define JBrowse menu_url (to define the template url for the JBrowse feature's link to Tripal), if absent, will use default "https://hostname" \ No newline at end of file diff --git a/examples/example.yml b/examples/example.yml deleted file mode 100644 index 39f05f0896136d60ee020ae55627dc36b4e51bc4..0000000000000000000000000000000000000000 --- a/examples/example.yml +++ /dev/null @@ -1,42 +0,0 @@ -# Input file for the automated creation GGA docker stacks -# The file consists in a "list" of species for which the script will have to create these stacks/load data into galaxy/run workflows -# This file is internally turned into a list of dictionaries by the scripts - -citrus_sinensis: # Dummy value to designate the species (isn't used by the script) - description: - # Species description, leave blank if unknown or you don't want it to be used - # These parameters are used to set up the various urls and adresses in different containers - # The script requires at least the genus to be specified - genus: "Citrus" # Mandatory! - species: "sinensis" # Mandatory! - sex: "male" - strain: "" - common_name: "" - origin: "" - # the sex and strain, the script will look for files containing the genus, species, sex and strain of the species) - # If no file corresponding to the description is found, this path will be considered empty and the script will - # proceed to the next step (create the directory tree for the GGA docker stack) - data: - # Sequence of paths to the different datasets to copy and import into the galaxy container (as a shared library) - # Must be absolute paths to the dataset - genome_path: "/path/to/repo/examples/src_data/genome/v1.0/Citrus_sinensis-scaffold00001.fasta" # Mandatory! - transcripts_path: "/path/to/repo/examples/src_data/annotation/v1.0/Citrus_sinensis-orange1.1g015632m.g.fasta" # Mandatory! - proteins_path: "" # Mandatory! - gff_path: "/path/to/repo/examples/src_data/annotation/v1.0/Citrus_sinensis-orange1.1g015632m.g.gff3" # Mandatory! - interpro_path: "/path/to/repo/examples/src_data/annotation/v1.0/functional_annotation/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml" - orthofinder_path: "" - blastp_path: "" - blastx_path: "/path/to/repo/examples/src_data/annotation/v1.0/functional_annotation/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out" - # If the user has several datasets of the same 'nature' (gff, genomes, ...) 
to upload to galaxy, the next scalar is used by the script to differentiate - # between these different versions and name directories according to it and not overwrite the existing data - # If left empty, the genome will be considered version "1.0" - genome_version: "1.0" - # Same as genome version, but for the OGS analysis - ogs_version: "1.0" - performed_by: "" - services: - # Describe what optional services to deploy for the stack - # By default, only tripal, tripaldb and galaxy services will be deployed - blast: "False" - wiki: "False" - apollo: "False" \ No newline at end of file diff --git a/gga_get_data.py b/gga_get_data.py index 6ff524b29333ee5470ce146b54976e6e3d448c5a..992e5c6c46f59d0bfdd2cecf14f99d3be21f0a95 100755 --- a/gga_get_data.py +++ b/gga_get_data.py @@ -1,24 +1,16 @@ #!/usr/bin/env python3 # -*- coding: utf-8 -*- -import bioblend import argparse import os -import subprocess import logging import sys -import fnmatch import time -import json -import re -import stat import shutil -from bioblend.galaxy.objects import GalaxyInstance -from bioblend import galaxy - import utilities import speciesData +import constants """ gga_get_data.py @@ -36,24 +28,6 @@ class GetData(speciesData.SpeciesData): """ - def goto_species_dir(self): - """ - Go to the species directory (starting from the main dir) - - :return: - """ - - os.chdir(self.main_dir) - species_dir = os.path.join(self.main_dir, self.genus_species) + "/" - try: - os.chdir(species_dir) - except OSError: - logging.critical("Cannot access %s" % species_dir) - sys.exit(0) - return 1 - - - def make_directory_tree(self): """ Generate the directory tree for an organism @@ -91,6 +65,12 @@ class GetData(speciesData.SpeciesData): logging.info("src_data directory tree generated for %s" % self.full_name) + def get_last_modified_time_string(self, filePath): + # give the last modification date for the file, with format '20190130' + lastModifiedTimestamp = os.path.getmtime(filePath) + lastModifiedTimeStructure = time.localtime(lastModifiedTimestamp) + lastModifiedDate = time.strftime("%Y%m%d", lastModifiedTimeStructure) + return lastModifiedDate def get_source_data_files_from_path(self): """ @@ -108,55 +88,47 @@ class GetData(speciesData.SpeciesData): organism_annotation_dir = os.path.abspath("./src_data/annotation/{0}/OGS{1}".format(self.species_folder_name, self.ogs_version)) organism_genome_dir = os.path.abspath("./src_data/genome/{0}/v{1}".format(self.species_folder_name, self.genome_version)) - datasets_to_get = {"genome_path": self.genome_path, - "gff_path": self.gff_path, - "transcripts_path": self.transcripts_path, - "proteins_path": self.proteins_path, - "interpro_path": self.interpro_path, - "orthofinder_path": self.orthofinder_path, - "blastp_path": self.blastp_path, - "blastx_path": self.blastx_path} - - genome_datasets = ["genome_path"] - annotation_datasets = ["gff_path", "transcripts_path", "proteins_path", "orthofinder_path", "interpro_path", "blastp_path", "blastx_path"] - # Where to store blast results? 
- - # search_excluded_datasets = ["interpro_path", "orthofinder_path", "blastp_path", "blastx_path"] - # # These datasets will not be searched if missing in the input file + genome_datasets = {constants.ORG_PARAM_DATA_GENOME_PATH: self.genome_path} + annotation_datasets = {constants.ORG_PARAM_DATA_GFF_PATH: self.gff_path, + constants.ORG_PARAM_DATA_TRANSCRIPTS_PATH: self.transcripts_path, + constants.ORG_PARAM_DATA_PROTEINS_PATH: self.proteins_path, + constants.ORG_PARAM_DATA_INTERPRO_PATH: self.interpro_path, + constants.ORG_PARAM_DATA_ORTHOFINDER_PATH: self.orthofinder_path, + constants.ORG_PARAM_DATA_BLASTP_PATH: self.blastp_path, + constants.ORG_PARAM_DATA_BLASTX_PATH: self.blastx_path} # Copy each dataset into the correct folder of the organism's src_data directory tree - for k, v in datasets_to_get.items(): + for k, v in genome_datasets.items(): if v: # If dataset is not present in input file, skip copy - if k in genome_datasets: - logging.info("Copying {0} ({1}) into {2}".format(k, v, organism_genome_dir)) - genome_fname = "v%s.fasta" % self.genome_version - try: - shutil.copyfile(os.path.abspath(v), os.path.join(organism_genome_dir, genome_fname)) - except Exception as exc: - logging.warning("Could not copy {0} ({1}) - Exit Code: {2})".format(k, v, exc)) - elif k in annotation_datasets: - dataset_fname = "" - if k == "gff_path": - dataset_fname = "OGS%s.gff" % self.ogs_version - elif k == "transcripts_path": - dataset_fname = "OGS%s_transcripts.fasta" % self.ogs_version - elif k == "proteins_path": - dataset_fname = "OGS%s_proteins.fasta" % self.ogs_version - elif k == "orthofinder_path": - dataset_fname = "OGS%s_orthofinder.tsv" % self.ogs_version - elif k == "interpro_path": - dataset_fname = "OGS%s_interproscan.xml" % self.ogs_version - elif k == "blastp_path": - dataset_fname = "OGS%s_blastp.xml" % self.ogs_version - elif k == "blastx_path": - dataset_fname = "OGS%s_blastx.xml" % self.ogs_version - logging.info("Copying {0} ({1}) into {2}".format(k, v, organism_annotation_dir)) - try: - shutil.copyfile(os.path.abspath(v), os.path.join(organism_annotation_dir, dataset_fname)) - except Exception as exc: - logging.warning("Could not copy {0} ({1}) - Exit Code: {2}".format(k, v, exc)) - else: - pass + logging.info("Copying {0} ({1}) into {2}".format(k, v, organism_genome_dir)) + genome_fname = "{0}_v{1}.fasta".format(self.dataset_prefix, self.genome_version) + try: + shutil.copyfile(os.path.abspath(v), os.path.join(organism_genome_dir, genome_fname)) + except Exception as exc: + logging.warning("Could not copy {0} ({1}) - Error: {2}".format(k, v, exc)) + + for k, v in annotation_datasets.items(): + if v: # If dataset is not present in input file, skip copy + dataset_fname = "" + if k == constants.ORG_PARAM_DATA_GFF_PATH: + dataset_fname = "{0}_OGS{1}_{2}.gff".format(self.dataset_prefix, self.ogs_version, self.get_last_modified_time_string(os.path.abspath(v))) + elif k == constants.ORG_PARAM_DATA_TRANSCRIPTS_PATH: + dataset_fname = "{0}_OGS{1}_transcripts.fasta".format(self.dataset_prefix, self.ogs_version) + elif k == constants.ORG_PARAM_DATA_PROTEINS_PATH: + dataset_fname = "{0}_OGS{1}_proteins.fasta".format(self.dataset_prefix, self.ogs_version) + elif k == constants.ORG_PARAM_DATA_ORTHOFINDER_PATH: + dataset_fname = "{0}_OGS{1}_orthofinder.tsv".format(self.dataset_prefix, self.ogs_version) + elif k == constants.ORG_PARAM_DATA_INTERPRO_PATH: + dataset_fname = "{0}_OGS{1}_interproscan.xml".format(self.dataset_prefix, self.ogs_version) + elif k == constants.ORG_PARAM_DATA_BLASTP_PATH: +
dataset_fname = "{0}_OGS{1}_blastp.xml".format(self.dataset_prefix, self.ogs_version) + elif k == constants.ORG_PARAM_DATA_BLASTX_PATH: + dataset_fname = "{0}_OGS{1}_blastx.xml".format(self.dataset_prefix, self.ogs_version) + logging.info("Copying {0} ({1}) into {2}".format(k, v, organism_annotation_dir)) + try: + shutil.copyfile(os.path.abspath(v), os.path.join(organism_annotation_dir, dataset_fname)) + except Exception as exc: + logging.warning("Could not copy {0} ({1}) - Exit Code: {2}".format(k, v, exc)) os.chdir(self.main_dir) @@ -183,10 +155,7 @@ def make_dirs(dir_paths_li): return created_dir_paths_li if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Automatic data loading in containers and interaction " - "with galaxy instances for GGA" - ", following the protocol @ " - "http://gitlab.sb-roscoff.fr/abims/e-infra/gga") + parser = argparse.ArgumentParser(description="Create 'src_data' tree and add data files") parser.add_argument("input", type=str, @@ -194,7 +163,7 @@ if __name__ == "__main__": parser.add_argument("-v", "--verbose", help="Increase output verbosity", - action="store_false") + action="store_true") parser.add_argument("--main-directory", type=str, @@ -206,7 +175,6 @@ if __name__ == "__main__": logging.basicConfig(level=logging.DEBUG) else: logging.basicConfig(level=logging.INFO) - logging.getLogger("urllib3").setLevel(logging.WARNING) if not args.main_directory: args.main_directory = os.getcwd() @@ -234,4 +202,3 @@ if __name__ == "__main__": logging.info("Finding and copying datasets for %s" % get_data_for_current_species.full_name) get_data_for_current_species.get_source_data_files_from_path() logging.info("Sucessfully copied datasets for %s" % get_data_for_current_species.full_name) - diff --git a/gga_init.py b/gga_init.py index be5f3c3967accd1c5df1931145fcc4af4b14aaaf..879fda83449178123873d13dcaefbb4e7ebcf0e0 100755 --- a/gga_init.py +++ b/gga_init.py @@ -9,12 +9,13 @@ import logging import sys import yaml import shutil - from pathlib import Path -from jinja2 import Template, Environment, FileSystemLoader +from jinja2 import Environment, FileSystemLoader import utilities import speciesData +import constants + """ gga_init.py @@ -60,34 +61,31 @@ class DeploySpeciesStack(speciesData.SpeciesData): # Copy the custom banner to the species dir (banner used in tripal pages) # If the path specified is invalid (because it's empty or is still the default demo one), # use the default banner instead - if "banner_path" in self.config.keys(): - if self.config["banner_path"] != "/path/to/banner" or self.config["banner_path"] != "": - try: - logging.debug("Custom banner path: %s" % self.config["banner_path"]) - if os.path.isfile(os.path.abspath(self.config["banner_path"])): - shutil.copy(os.path.abspath(self.config["banner_path"]), "%s/banner.png" % self.species_dir) - except FileNotFoundError: - logging.warning("Specified banner not found (%s), using default banner instead" % self.config["banner_path"]) - self.config.pop("banner_path", None) + if constants.CONF_TRIPAL_BANNER_PATH in self.config.keys(): + if not config[constants.CONF_TRIPAL_BANNER_PATH] == "" and os.path.isfile(os.path.abspath(config[constants.CONF_TRIPAL_BANNER_PATH])): + banner_dest_path = os.path.join(self.species_dir, os.path.abspath("banner.png")) + if not os.path.isfile(banner_dest_path) and not os.path.islink(banner_dest_path) and not os.path.samefile(os.path.abspath(config[constants.CONF_TRIPAL_BANNER_PATH]),banner_dest_path): + 
os.symlink(os.path.abspath(self.config[constants.CONF_TRIPAL_BANNER_PATH]), banner_dest_path) + logging.info("Custom banner added: symlink from %s" % self.config[constants.CONF_TRIPAL_BANNER_PATH]) else: - logging.debug("Using default banner for Tripal pages") - self.config.pop("banner_path", None) + logging.debug("Using default banner for Tripal pages because %s is not valid in 'config' file" % constants.CONF_TRIPAL_BANNER_PATH) + self.config.pop(constants.CONF_TRIPAL_BANNER_PATH, None) else: logging.debug("Using default banner for Tripal pages") - self.config.pop("banner_path", None) + self.config.pop(constants.CONF_TRIPAL_BANNER_PATH, None) # Create nginx dirs and write/re-write nginx conf make_dirs(dir_paths_li=["./nginx", "./nginx/conf"]) try: shutil.copy(os.path.join(self.script_dir, "files/nginx_download.conf"), os.path.abspath("./nginx/conf/default.conf")) except Exception as exc: - logging.critical("Could not copy nginx configuration file for %s" % self.full_name) + logging.critical("Could not copy nginx configuration file for %s %s", self.genus, self.species) logging.critical(exc) # Return to main directory os.chdir(self.main_dir) - logging.info("Directory tree generated for %s" % self.full_name) + logging.info("Directory tree generated for %s %s", self.genus, self.species) def make_compose_files(self): @@ -114,25 +112,30 @@ class DeploySpeciesStack(speciesData.SpeciesData): input_vars = {"genus": self.genus_lowercase, "Genus": self.genus_uppercase, "species": self.species, "genus_species": self.genus_species, "genus_species_strain_sex": self.species_folder_name, "genus_species_sex": "{0}_{1}_{2}".format(self.genus_lowercase, self.species.lower(), self.sex), - "strain": self.strain, "sex": self.sex, "Genus_species": self.genus_species[0].upper() + self.genus_species[1:]} + "strain": self.strain, "sex": self.sex, "Genus_species": self.genus_species[0].upper() + self.genus_species[1:], + "blast": self.blast} + if (len(self.config.keys()) == 0): + logging.error("Empty config dictionary") # Merge the two dicts render_vars = {**self.config, **input_vars} # Render the gspecies docker-compose file and write it - gspecies_compose_template = env.get_template("gspecies_compose_template.yml.j2") + gspecies_compose_template = env.get_template("gspecies_compose.yml.j2") gspecies_compose_output = gspecies_compose_template.render(render_vars) with open(os.path.join(self.species_dir, "docker-compose.yml"), "w") as gspecies_compose_file: logging.info("Writing %s docker-compose.yml" % self.genus_species) gspecies_compose_file.truncate(0) gspecies_compose_file.write(gspecies_compose_output) - - galaxy_nginx_conf_template = env.get_template("galaxy_nginx.conf.j2") - galaxy_nginx_conf_output = galaxy_nginx_conf_template.render(render_vars) - with open(os.path.join(self.main_dir, "galaxy_nginx.conf"), "w") as galaxy_nginx_conf_file: - logging.debug("Writing the galaxy_nginx.conf file for %s" % self.genus_species) - galaxy_nginx_conf_file.truncate(0) - galaxy_nginx_conf_file.write(galaxy_nginx_conf_output) + if not os.path.isfile(os.path.join(self.main_dir, "galaxy_nginx.conf")): + galaxy_nginx_conf_template = env.get_template("galaxy_nginx.conf.j2") + galaxy_nginx_conf_output = galaxy_nginx_conf_template.render(render_vars) + with open(os.path.join(self.main_dir, "galaxy_nginx.conf"), "w") as galaxy_nginx_conf_file: + logging.debug("Writing the galaxy_nginx.conf file for %s" % self.genus_species) + galaxy_nginx_conf_file.truncate(0) + galaxy_nginx_conf_file.write(galaxy_nginx_conf_output) + else: + 
logging.debug("galaxy_nginx.conf already exists") # Create the volumes (directory) of the species docker-compose file create_mounts(working_dir=".", main_dir=self.main_dir) @@ -199,42 +202,36 @@ def make_traefik_compose_files(config, main_dir): # Jinja2 templating, handled using the python "jinja2" module file_loader = FileSystemLoader(script_dir + "/templates") - env = Environment(loader=file_loader) + env = Environment(loader=file_loader, trim_blocks=True, lstrip_blocks=True) if not os.path.isfile("./traefik/docker-compose.yml"): - traefik_compose_template = env.get_template("traefik_compose_template.yml.j2") + traefik_compose_template = env.get_template("traefik_compose.yml.j2") traefik_compose_output = traefik_compose_template.render(render_vars) with open(os.path.join(main_dir, "traefik/docker-compose.yml"), 'w') as traefik_compose_file: logging.info("Writing traefik docker-compose.yml") traefik_compose_file.truncate(0) traefik_compose_file.write(traefik_compose_output) - if "authelia_config_path" in config.keys(): - if not config["authelia_config_path"] == "" or not config["authelia_config_path"] == "/path/to/authelia/config": - if os.path.isfile(os.path.abspath(config["authelia_config_path"])): + if constants.CONF_ALL_HTTPS_PORT in config.keys(): + logging.info("HTTPS mode (with Authelia)") + if constants.CONF_ALL_AUTHELIA_CONFIG_PATH in config.keys(): + if not config[constants.CONF_ALL_AUTHELIA_CONFIG_PATH] == "" and os.path.isfile(os.path.abspath(config[constants.CONF_ALL_AUTHELIA_CONFIG_PATH])): try: - shutil.copy(os.path.abspath(config["authelia_config_path"]), "./traefik/authelia/configuration.yml") + shutil.copy(os.path.abspath(config[constants.CONF_ALL_AUTHELIA_CONFIG_PATH]), "./traefik/authelia/configuration.yml") except Exception as exc: logging.critical("Could not copy authelia configuration file") sys.exit(exc) - # authelia_config_template = env.get_template(os.path.basename(config["authelia_config_path"])) - # authelia_config_output = authelia_config_template.render(render_vars) - # with open(os.path.join(main_dir, "traefik/authelia/configuration.yml"), 'w') as authelia_config_file: - # logging.info("Writing authelia configuration.yml") - # authelia_config_file.truncate(0) - # authelia_config_file.write(authelia_config_output) else: - logging.critical("Cannot find authelia configuration path (%s)" % config["authelia_config_path"]) + logging.critical("Invalid authelia configuration path (%s)" % config[constants.CONF_ALL_AUTHELIA_CONFIG_PATH]) sys.exit() - else: - logging.critical("Invalid authelia configuration path (%s)" % config["authelia_config_path"]) - sys.exit() - # Path to the authelia users in the repo - authelia_users_path = script_dir + "/files/authelia_users.yml" - # Copy authelia "users" file - if not os.path.isfile("./traefik/authelia/users.yml"): - shutil.copy(authelia_users_path, "./traefik/authelia/users.yml") + # Path to the authelia users in the repo + authelia_users_path = script_dir + "/files/authelia_users.yml" + # Copy authelia "users" file + if not os.path.isfile("./traefik/authelia/users.yml"): + shutil.copy(authelia_users_path, "./traefik/authelia/users.yml") + else: + logging.info("HTTP mode (without Authelia)") # Create the mounts for the traefik and authelia services traefik_dir = os.path.abspath(os.path.join(main_dir, "traefik")) @@ -244,7 +241,6 @@ def make_traefik_compose_files(config, main_dir): # Return to main directory os.chdir(main_dir) - def create_mounts(working_dir, main_dir): """ Create the folders (volumes) required by a container 
(to see required volumes, check their compose file) @@ -295,6 +291,12 @@ def create_mounts(working_dir, main_dir): logging.critical("Cannot access %s, exiting" % main_dir) sys.exit(exc) +def run_command(command, working_dir): + subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=working_dir) + +def run_docker_stack_deploy(service, working_dir): + run_command(["docker", "stack", "deploy", "-c", "./docker-compose.yml", service], working_dir) + def deploy_stacks(input_list, main_dir, deploy_traefik): """ This function first deploys/redeploys the traefik stack, then deploys/redeploys the organism stack, then redeploys the traefik stack @@ -304,7 +306,7 @@ def deploy_stacks(input_list, main_dir, deploy_traefik): """ main_dir = os.path.abspath(main_dir) - os.chdir(main_dir) + traefik_dir = os.path.join(main_dir, "traefik") # Get species for which to deploy the stacks # Uses the get_unique_species_list method from utilities to deploy a stack only for the "species" level (i.e genus_species) @@ -313,38 +315,26 @@ def deploy_stacks(input_list, main_dir, deploy_traefik): if deploy_traefik: # Create the swarm cluster if needed logging.info("Initializing docker swarm (adding node)") - subprocess.call(["docker", "swarm", "init"], - stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=main_dir) + run_command(["docker", "swarm", "init"], main_dir) # Deploy traefik stack logging.info("Deploying traefik stack") - os.chdir("./traefik") - subprocess.call(["docker", "stack", "deploy", "-c", "./docker-compose.yml", "traefik"], - stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=".") - os.chdir(main_dir) + run_docker_stack_deploy("traefik", traefik_dir) # Deploy individual species stacks for sp in to_deploy_species_li: - os.chdir(sp) + sp_dir = os.path.join(main_dir, sp) logging.info("Deploying %s stack" % sp) - subprocess.call(["docker", "stack", "deploy", "-c", "./docker-compose.yml", "{0}_{1}".format(sp.split("_")[0], sp.split("_")[1])], - stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=".") + run_docker_stack_deploy("{0}_{1}".format(sp.split("_")[0], sp.split("_")[1]), sp_dir) logging.info("Deployed %s stack" % sp) - os.chdir(main_dir) # Update traefik stack logging.info("Updating traefik stack") - os.chdir("./traefik") - subprocess.call(["docker", "stack", "deploy", "-c", "./docker-compose.yml", "traefik"], - stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=".") - os.chdir(main_dir) + run_docker_stack_deploy("traefik", traefik_dir) if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Automatic data loading in containers and interaction " - "with galaxy instances for GGA" - ", following the protocol @ " - "http://gitlab.sb-roscoff.fr/abims/e-infra/gga") + parser = argparse.ArgumentParser(description="Deploy GGA containers") parser.add_argument("input", type=str, @@ -356,7 +346,7 @@ if __name__ == "__main__": parser.add_argument("--config", type=str, - help="Config path, default to the 'config' file inside the script repository") + help="Config path, default to 'examples/config.yml'") parser.add_argument("--main-directory", type=str, @@ -374,10 +364,13 @@ if __name__ == "__main__": logging.basicConfig(level=logging.INFO) # Parsing the config file if provided, using the default config otherwise - if not args.config: - args.config = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])), "config") + if args.config: + config_file = os.path.abspath(args.config) else: - args.config = os.path.abspath(args.config) + config_file = 
os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])), constants.DEFAULT_CONFIG) + config = utilities.parse_config(config_file) + if (len(config.keys()) == 0): + logging.error("Empty config dictionary") main_dir = None if not args.main_directory: @@ -389,7 +382,6 @@ if __name__ == "__main__": # Create traefik directory and compose files if needed or specified if args.force_traefik or not os.path.isdir(os.path.join(os.path.abspath(main_dir), "traefik")): - config = utilities.parse_config(args.config) make_traefik_compose_files(config=config, main_dir=main_dir) unique_sp_dict_list = utilities.get_unique_species_dict_list(sp_dict_list=sp_dict_list) @@ -407,31 +399,18 @@ if __name__ == "__main__": "/") # Parse the config yaml file - deploy_stack_for_current_organism.config = utilities.parse_config(args.config) - - # Set the instance url attribute - for env_variable, value in deploy_stack_for_current_organism.config.items(): - if env_variable == "hostname": - deploy_stack_for_current_organism.instance_url = value + \ - deploy_stack_for_current_organism.genus_lowercase + \ - "_" + deploy_stack_for_current_organism.species + \ - "/galaxy/" - break - else: - deploy_stack_for_current_organism.instance_url = "http://localhost:8888/sp/{0}_{1}/galaxy/".format( - deploy_stack_for_current_organism.genus_lowercase, - deploy_stack_for_current_organism.species) - + deploy_stack_for_current_organism.config = config + # Starting - logging.info("gga_init.py called for %s" % deploy_stack_for_current_organism.full_name) + logging.info("gga_init.py called for %s %s", deploy_stack_for_current_organism.genus, deploy_stack_for_current_organism.species) # Make/update directory tree deploy_stack_for_current_organism.make_directory_tree() - logging.info("Successfully generated the directory tree for %s" % deploy_stack_for_current_organism.full_name) + logging.info("Successfully generated the directory tree for %s %s", deploy_stack_for_current_organism.genus, deploy_stack_for_current_organism.species) # Make compose files deploy_stack_for_current_organism.make_compose_files() - logging.info("Successfully generated the docker-compose files for %s" % deploy_stack_for_current_organism.full_name) + logging.info("Successfully generated the docker-compose files for %s %s", deploy_stack_for_current_organism.genus, deploy_stack_for_current_organism.species) logging.info("Deploying stacks") if args.force_traefik: diff --git a/gga_load_data.py b/gga_load_data.py index 856d0435c2319bd7575b81089da7586159702f73..74634d6d5f9d158a39ff01a30df82d8b45336bc5 100755 --- a/gga_load_data.py +++ b/gga_load_data.py @@ -1,34 +1,31 @@ #!/usr/bin/env python3 # -*- coding: utf-8 -*- +import re import bioblend import argparse import os -import subprocess import logging import sys -import fnmatch import time import json -import re -import stat -import shutil - -from bioblend.galaxy.objects import GalaxyInstance +import yaml +import subprocess from bioblend import galaxy +from bioblend.galaxy.objects import GalaxyInstance import utilities import speciesData +import constants """ gga_load_data.py -Usage: $ python3 gga_init.py -i input_example.yml --config config.yml [OPTIONS] +Usage: $ python3 gga_load_data.py -i input_example.yml --config config.yml [OPTIONS] Do not call this script before the galaxy container is ready """ - class LoadData(speciesData.SpeciesData): """ Child of SpeciesData @@ -38,41 +35,29 @@ class LoadData(speciesData.SpeciesData): Optional data file formatting """ + def __init__(self, parameters_dictionary): + 
self.existing_folders_cache = {} + self.bam_metadata_cache = {} + super().__init__(parameters_dictionary) - def goto_species_dir(self): - """ - Go to the species directory (starting from the main dir) - - :return: - """ - - os.chdir(self.main_dir) - species_dir = os.path.join(self.main_dir, self.genus_species) + "/" - try: - os.chdir(species_dir) - except OSError: - logging.critical("Cannot access %s" % species_dir) - sys.exit(0) - return 1 - - def set_get_history(self): + def get_history(self): """ Create or set the working history to the current species one - TODO - move to utilities? - :return: """ try: - histories = self.instance.histories.get_histories(name=str(self.full_name)) - self.history_id = histories[0]["id"] - logging.info("History for {0}: {1}".format(self.full_name, self.history_id)) + histories = self.instance.histories.get_histories(name=str(self.genus_species)) + if len(histories) == 1: + self.history_id = histories[0]["id"] + logging.debug("History ID set for {0} {1}: {2}".format(self.genus, self.species, self.history_id)) + else: + logging.critical("Multiple histories exist for {0} {1}".format(self.genus, self.species)) except IndexError: - logging.info("Creating history for %s" % self.full_name) - self.instance.histories.create_history(name=str(self.full_name)) - histories = self.instance.histories.get_histories(name=str(self.full_name)) - self.history_id = histories[0]["id"] - logging.info("History for {0}: {1}".format(self.full_name, self.history_id)) + logging.info("Creating history for {0} {1}".format(self.genus, self.species)) + hist_dict = self.instance.histories.create_history(name=str(self.genus_species)) + self.history_id = hist_dict["id"] + logging.debug("History ID set for {0} {1}: {2}".format(self.genus, self.species, self.history_id)) return self.history_id @@ -82,26 +67,26 @@ Will do nothing if H.
sapiens isn't in the database """ - logging.debug("Getting 'Homo sapiens' ID in instance's chado database") - get_sapiens_id_job = self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.4+galaxy0", + + logging.debug("Getting 'Homo sapiens' ID in chado database") + get_sapiens_id_job_output_dataset_id = utilities.run_tool_and_get_single_output_dataset_id( + self.instance, + tool_id=constants.GET_ORGANISMS_TOOL, # If this version if not found, Galaxy will use the one that is found history_id=self.history_id, tool_inputs={"genus": "Homo", "species": "sapiens"}) - get_sapiens_id_job_output = get_sapiens_id_job["outputs"][0]["id"] - get_sapiens_id_json_output = self.instance.datasets.download_dataset(dataset_id=get_sapiens_id_job_output) + get_sapiens_id_json_output = self.instance.datasets.download_dataset(dataset_id=get_sapiens_id_job_output_dataset_id) + + logging.info("Deleting Homo 'sapiens' in the instance's chado database") try: - logging.debug("Deleting Homo 'sapiens' in the instance's chado database") get_sapiens_id_final_output = json.loads(get_sapiens_id_json_output)[0] - sapiens_id = str( - get_sapiens_id_final_output["organism_id"]) # needs to be str to be recognized by the chado tool - self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_delete_organisms/organism_delete_organisms/2.3.4+galaxy0", + sapiens_id = str(get_sapiens_id_final_output["organism_id"]) # needs to be str to be recognized by the chado tool + utilities.run_tool( + self.instance, + tool_id=constants.DELETE_ORGANISMS_TOOL, history_id=self.history_id, - tool_inputs={"organism": str(sapiens_id)}) - except bioblend.ConnectionError: - logging.debug("Homo sapiens isn't in the instance's chado database (bioblend.ConnectionError)") + tool_inputs={"organism": sapiens_id}) except IndexError: - logging.debug("Homo sapiens isn't in the instance's chado database (IndexError)") + logging.error("Homo sapiens isn't in the instance's chado database (IndexError)") pass def purge_histories(self): @@ -114,7 +99,6 @@ class LoadData(speciesData.SpeciesData): """ histories = self.instance.histories.get_histories() - self.instance.histories.get_histories(deleted=False) for h in histories: self.instance.histories.delete_history(history_id=h["id"]) @@ -128,38 +112,40 @@ class LoadData(speciesData.SpeciesData): :return: """ - self.goto_species_dir() + data_dir_root=os.path.join(self.get_species_dir(), constants.HOST_DATA_DIR) - # Delete pre-existing lib (probably created by a previous call) - gio = GalaxyInstance(url=self.instance_url, - email=self.config["galaxy_default_admin_email"], - password=self.config["galaxy_default_admin_password"]) + instance = GalaxyInstance(url=self.instance_url, + email=self.config[constants.CONF_GALAXY_DEFAULT_ADMIN_EMAIL], + password=self.config[constants.CONF_GALAXY_DEFAULT_ADMIN_PASSWORD] + ) + logging.info("Looking for project data in %s" % data_dir_root) folders = dict() post_renaming = {} - for root, dirs, files in os.walk("./src_data", followlinks=True): + for root, dirs, files in os.walk(data_dir_root, followlinks=True): file_list = [os.path.join(root, filename) for filename in files] folders[root] = file_list if folders: # Delete pre-existing lib (probably created by a previous call) - existing = gio.libraries.get_previews(name='Project Data') + existing = instance.libraries.get_previews(name='Project Data') for lib in existing: if not lib.deleted: logging.info('Pre-existing "Project Data" 
library %s found, removing it' % lib.id) - gio.libraries.delete(lib.id) + instance.libraries.delete(lib.id) logging.info("Creating new 'Project Data' library") - prj_lib = gio.libraries.create('Project Data', 'Data for current genome annotation project') + prj_lib = instance.libraries.create('Project Data', 'Data for current genome annotation project') self.library_id = prj_lib.id # project data folder/library logging.info("Library for {0}: {1}".format(self.full_name, self.library_id)) for fname, files in folders.items(): if fname and files: - folder_name = fname[len("./src_data") + 1:] + folder_name = re.sub(data_dir_root + "/", "", fname) logging.info("Creating folder: %s" % folder_name) folder = self.create_deep_folder(prj_lib, folder_name) + for single_file in files: ftype = 'auto' @@ -198,11 +184,16 @@ class LoadData(speciesData.SpeciesData): logging.info("Skipping useless file '%s'" % single_file) continue - logging.info("Adding file '%s' with type '%s' and name '%s'" % (single_file, ftype, clean_name)) - datasets = prj_lib.upload_from_local( - path=single_file, + single_file_relative_path = re.sub(data_dir_root, constants.CONTAINER_DATA_DIR_ROOT, single_file) + single_file_path_in_container=os.path.join(constants.CONTAINER_DATA_DIR_ROOT, single_file_relative_path) + + logging.info("Adding file '%s' with type '%s' and name '%s'" % (single_file_path_in_container, ftype, clean_name)) + datasets = prj_lib.upload_from_galaxy_fs( + single_file_path_in_container, folder=folder, - file_type=ftype + link_data_only='link_to_files', + file_type=ftype, + tag_using_filenames=False ) # Rename dataset @@ -214,10 +205,10 @@ class LoadData(speciesData.SpeciesData): time.sleep(1) - # Wait for uploads to complete - logging.info("Waiting for import jobs to finish... please wait") - - # Checking job state (only necessary if ran using SLURM) + # # Wait for uploads to complete + # logging.info("Waiting for import jobs to finish... 
please wait") + # + # # Checking job state (only necessary if ran using SLURM) # while True: # try: # # "C" state means the job is completed, no need to wait for it @@ -231,8 +222,8 @@ class LoadData(speciesData.SpeciesData): # break # else: # raise - - time.sleep(10) + # + # time.sleep(10) # Batch renaming --> Throws a critical error at the moment # logging.info("Import finished, now renaming datasets with pretty names") @@ -267,7 +258,29 @@ class LoadData(speciesData.SpeciesData): return new_folder - def connect_to_instance(self): + def get_bam_label(self, dirname, bam_file): + + bam_id = bam_file + if bam_id.endswith('.bam'): + bam_id = bam_id[:-4] + + if dirname in self.bam_metadata_cache: + if bam_id in self.bam_metadata_cache[dirname] and 'label' in self.bam_metadata_cache[dirname][bam_id] and self.bam_metadata_cache[dirname][bam_id]['label']: + return self.bam_metadata_cache[dirname][bam_id]['label'] + else: + return None + else: + meta_file = os.path.join(dirname, 'metadata.yml') + if os.path.exists(meta_file): + with open(meta_file) as f: + self.bam_metadata_cache[dirname] = yaml.safe_load(f) + logging.info("Found metadata in %s " % meta_file) + else: + self.bam_metadata_cache[dirname] = {} + logging.info("Did not find metadata in %s " % meta_file) + return self.get_bam_label(dirname, bam_file) + + def create_galaxy_instance(self): """ Test the connection to the galaxy instance for the current organism Exit if we cannot connect to the instance @@ -276,10 +289,9 @@ class LoadData(speciesData.SpeciesData): logging.info("Connecting to the galaxy instance (%s)" % self.instance_url) self.instance = galaxy.GalaxyInstance(url=self.instance_url, - email=self.config["galaxy_default_admin_email"], - password=self.config["galaxy_default_admin_password"] + email=self.config[constants.CONF_GALAXY_DEFAULT_ADMIN_EMAIL], + password=self.config[constants.CONF_GALAXY_DEFAULT_ADMIN_PASSWORD] ) - self.instance.histories.get_histories() try: self.instance.histories.get_histories() @@ -289,26 +301,11 @@ class LoadData(speciesData.SpeciesData): else: logging.info("Successfully connected to galaxy instance (%s) " % self.instance_url) - - - -def get_species_to_load(sp_dict_list): - """ - """ - - - - utilities.get_unique_species_list(sp_dict_list) - - - return 1 + return self.instance if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Automatic data loading in containers and interaction " - "with galaxy instances for GGA" - ", following the protocol @ " - "http://gitlab.sb-roscoff.fr/abims/e-infra/gga") + parser = argparse.ArgumentParser(description="Load data into Galaxy library") parser.add_argument("input", type=str, @@ -316,11 +313,11 @@ if __name__ == "__main__": parser.add_argument("-v", "--verbose", help="Increase output verbosity", - action="store_false") + action="store_true") parser.add_argument("--config", type=str, - help="Config path, default to the 'config' file inside the script repository") + help="Config path, default to 'examples/config.yml'") parser.add_argument("--main-directory", type=str, @@ -332,66 +329,61 @@ if __name__ == "__main__": logging.basicConfig(level=logging.DEBUG) else: logging.basicConfig(level=logging.INFO) - logging.getLogger("urllib3").setLevel(logging.WARNING) # Parsing the config file if provided, using the default config otherwise - if not args.config: - args.config = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])), "config") + if args.config: + config_file = os.path.abspath(args.config) else: - args.config = 
os.path.abspath(args.config) + config_file = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])), constants.DEFAULT_CONFIG) + config = utilities.parse_config(config_file) + main_dir = None if not args.main_directory: - args.main_directory = os.getcwd() + main_dir = os.getcwd() else: - args.main_directory = os.path.abspath(args.main_directory) + main_dir = os.path.abspath(args.main_directory) sp_dict_list = utilities.parse_input(args.input) unique_sp_dict_list = utilities.get_unique_species_dict_list(sp_dict_list=sp_dict_list) - for sp_dict in unique_sp_dict_list: # Creating an instance of load_data_for_current_species object load_data_for_current_species = LoadData(parameters_dictionary=sp_dict) # Starting - logging.info("gga_load_data.py called for %s" % load_data_for_current_species.full_name) + logging.info("gga_load_data.py called for {0} {1}".format(load_data_for_current_species.genus, load_data_for_current_species.species)) # Setting some of the instance attributes - load_data_for_current_species.main_dir = args.main_directory + load_data_for_current_species.main_dir = main_dir load_data_for_current_species.species_dir = os.path.join(load_data_for_current_species.main_dir, load_data_for_current_species.genus_species + "/") # Parse the config yaml file - load_data_for_current_species.config = utilities.parse_config(args.config) + load_data_for_current_species.config = config # Set the instance url attribute -- Does not work with localhost on scratch (ALB) load_data_for_current_species.instance_url = "http://localhost:{0}/sp/{1}_{2}/galaxy/".format( - load_data_for_current_species.config["http_port"], + load_data_for_current_species.config[constants.CONF_ALL_HTTP_PORT], load_data_for_current_species.genus_lowercase, load_data_for_current_species.species) - - # Check the galaxy container state and proceed if the galaxy services are up and running if utilities.check_galaxy_state(genus_lowercase=load_data_for_current_species.genus_lowercase, species=load_data_for_current_species.species, script_dir=load_data_for_current_species.script_dir): - # Load config file - load_data_for_current_species.config = utilities.parse_config(args.config) - - # Testing connection to the instance - load_data_for_current_species.connect_to_instance() + # Create the Galaxy instance + load_data_for_current_species.instance = load_data_for_current_species.create_galaxy_instance() # Load the datasets into a galaxy library - logging.info("Setting up library for %s" % load_data_for_current_species.full_name) + logging.info("Setting up library for {0} {1}".format(load_data_for_current_species.genus, load_data_for_current_species.species)) load_data_for_current_species.setup_library() - logging.info("Successfully set up library in galaxy for %s" % load_data_for_current_species.full_name) + logging.debug("Successfully set up library in galaxy for {0} {1}".format(load_data_for_current_species.genus, load_data_for_current_species.species)) # Set or get the history for the current organism - load_data_for_current_species.set_get_history() + load_data_for_current_species.get_history() # Remove H. sapiens from database if here # TODO: set a dedicated history for removing H. 
sapiens (instead of doing it into a species history)
@@ -399,11 +391,10 @@ if __name__ == "__main__":
             # logging.info("Importing datasets into history for %s" % load_data_for_current_species.full_name)
             # load_data_for_current_species.import_datasets_into_history()  # Option "--load-history"
-            # load_data_for_current_species.purge_histories()  # Testing purposes
-            logging.info("Data successfully loaded and imported for %s" % load_data_for_current_species.full_name)
+            logging.info("Data successfully loaded and imported for {0} {1}".format(load_data_for_current_species.genus, load_data_for_current_species.species))
         else:
-            logging.critical("The galaxy container for %s is not ready yet!" % load_data_for_current_species.full_name)
+            logging.critical("The galaxy container for {0} {1} is not ready yet".format(load_data_for_current_species.genus, load_data_for_current_species.species))
             sys.exit()
diff --git a/run_workflow_phaeoexplorer.py b/run_workflow_phaeoexplorer.py
index 49f3a2a9ba6ccab47bfd5e7118db987fee52d706..bff96313c37cd1f177594630f386f03fc02cce54 100755
--- a/run_workflow_phaeoexplorer.py
+++ b/run_workflow_phaeoexplorer.py
@@ -38,17 +38,18 @@ class RunWorkflow(speciesData.SpeciesData):
         """
         Create or set the working history to the current species one
-        :return:
         """
         try:
-            histories = self.instance.histories.get_histories(name=str(self.full_name))
+            histories = self.instance.histories.get_histories(name=str(self.genus_species))
             self.history_id = histories[0]["id"]
+            logging.debug("History ID set for {0}: {1}".format(self.full_name, self.history_id))
         except IndexError:
             logging.info("Creating history for %s" % self.full_name)
-            self.instance.histories.create_history(name=str(self.full_name))
-            histories = self.instance.histories.get_histories(name=str(self.full_name))
+            self.instance.histories.create_history(name=str(self.genus_species))
+            histories = self.instance.histories.get_histories(name=str(self.genus_species))
             self.history_id = histories[0]["id"]
+            logging.debug("History ID set for {0}: {1}".format(self.full_name, self.history_id))
         return self.history_id
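+
+    # Note: working histories are looked up and created by their "genus_species" name,
+    # so all strains/sexes of one species share a single Galaxy history.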
@@ -70,41 +71,41 @@ class RunWorkflow(speciesData.SpeciesData):
         logging.debug("Library ID: %s" % self.library_id)
         instance_source_data_folders = self.instance.libraries.get_folders(library_id=library_id)
-        # Access folders via their absolute path
-        genome_folder = self.instance.libraries.get_folders(library_id=library_id, name="/genome/" + str(self.species_folder_name) + "/v" + str(self.genome_version))
-        annotation_folder = self.instance.libraries.get_folders(library_id=library_id, name="/annotation/" + str(self.species_folder_name) + "/OGS" + str(self.ogs_version))
+        # # Access folders via their absolute path
+        # genome_folder = self.instance.libraries.get_folders(library_id=library_id, name="/genome/" + str(self.species_folder_name) + "/v" + str(self.genome_version))
+        # annotation_folder = self.instance.libraries.get_folders(library_id=library_id, name="/annotation/" + str(self.species_folder_name) + "/OGS" + str(self.ogs_version))
-        # Get their IDs
-        genome_folder_id = genome_folder[0]["id"]
-        annotation_folder_id = annotation_folder[0]["id"]
+        # # Get their IDs
+        # genome_folder_id = genome_folder[0]["id"]
+        # annotation_folder_id = annotation_folder[0]["id"]
-        # Get the content of the folders
-        genome_folder_content = self.instance.folders.show_folder(folder_id=genome_folder_id, contents=True)
-        annotation_folder_content = self.instance.folders.show_folder(folder_id=annotation_folder_id, contents=True)
+        # # Get the content of the folders
+        # genome_folder_content = self.instance.folders.show_folder(folder_id=genome_folder_id, contents=True)
+        # annotation_folder_content = self.instance.folders.show_folder(folder_id=annotation_folder_id, contents=True)
-        # Find genome folder datasets
-        genome_fasta_ldda_id = genome_folder_content["folder_contents"][0]["ldda_id"]
+        # # Find genome folder datasets
+        # genome_fasta_ldda_id = genome_folder_content["folder_contents"][0]["ldda_id"]
-        annotation_gff_ldda_id, annotation_proteins_ldda_id, annotation_transcripts_ldda_id = None, None, None
+        # annotation_gff_ldda_id, annotation_proteins_ldda_id, annotation_transcripts_ldda_id = None, None, None
-        # Several dicts in the annotation folder content (one dict = one file)
-        for k, v in annotation_folder_content.items():
-            if k == "folder_contents":
-                for d in v:
-                    if "proteins" in d["name"]:
-                        annotation_proteins_ldda_id = d["ldda_id"]
-                    if "transcripts" in d["name"]:
-                        annotation_transcripts_ldda_id = d["ldda_id"]
-                    if ".gff" in d["name"]:
-                        annotation_gff_ldda_id = d["ldda_id"]
+        # # Several dicts in the annotation folder content (one dict = one file)
+        # for k, v in annotation_folder_content.items():
+        #     if k == "folder_contents":
+        #         for d in v:
+        #             if "proteins" in d["name"]:
+        #                 annotation_proteins_ldda_id = d["ldda_id"]
+        #             if "transcripts" in d["name"]:
+        #                 annotation_transcripts_ldda_id = d["ldda_id"]
+        #             if ".gff" in d["name"]:
+        #                 annotation_gff_ldda_id = d["ldda_id"]
-        # Minimum datasets to populate tripal views --> will not work if these files are not assigned in the input file
-        self.datasets["genome_file"] = genome_fasta_ldda_id
-        self.datasets["gff_file"] = annotation_gff_ldda_id
-        self.datasets["proteins_file"] = annotation_proteins_ldda_id
-        self.datasets["transcripts_file"] = annotation_transcripts_ldda_id
+        # # Minimum datasets to populate tripal views --> will not work if these files are not assigned in the input file
+        # self.datasets["genome_file"] = genome_fasta_ldda_id
+        # self.datasets["gff_file"] = annotation_gff_ldda_id
+        # self.datasets["proteins_file"] = annotation_proteins_ldda_id
+        # self.datasets["transcripts_file"] = annotation_transcripts_ldda_id
-        return {"history_id": self.history_id, "library_id": library_id, "datasets": self.datasets}
+        return {"history_id": self.history_id, "library_id": library_id}
     def connect_to_instance(self):
@@ -114,7 +115,7 @@ class RunWorkflow(speciesData.SpeciesData):
         """
-        logging.info("Connecting to the galaxy instance (%s)" % self.instance_url)
+        # logging.debug("Connecting to the galaxy instance (%s)" % self.instance_url)
         self.instance = galaxy.GalaxyInstance(url=self.instance_url,
                                               email=self.config["galaxy_default_admin_email"],
                                               password=self.config["galaxy_default_admin_password"]
                                               )
@@ -124,56 +125,20 @@ class RunWorkflow(speciesData.SpeciesData):
         try:
             self.instance.histories.get_histories()
         except bioblend.ConnectionError:
-            logging.critical("Cannot connect to galaxy instance (%s) " % self.instance_url)
+            logging.critical("Cannot connect to galaxy instance (%s)" % self.instance_url)
             sys.exit()
         else:
-            logging.info("Successfully connected to galaxy instance (%s) " % self.instance_url)
+            # logging.debug("Successfully connected to galaxy instance (%s) " % self.instance_url)
             return 1
-    def install_changesets_revisions_from_workflow(self, workflow_path):
-        """
-        Read a .ga file to extract the information about the different tools called.
-        Check if every tool is installed via a "show_tool".
- If a tool is not installed (versions don't match), send a warning to the logger and install the required changeset (matching the tool version) - Doesn't do anything if versions match - - :return: - """ - - logging.info("Validating that installed tools versions and changesets match workflow versions") - - # Load the workflow file (.ga) in a buffer - with open(workflow_path, 'r') as ga_in_file: - - # Then store the decoded json dictionary - workflow_dict = json.load(ga_in_file) - - # Look up every "step_id" looking for tools - for k, v in workflow_dict["steps"].items(): - if v["tool_id"]: - # Get the descriptive dictionary of the installed tool (using the tool id in the workflow) - show_tool = self.instance.tools.show_tool(v["tool_id"]) + def return_instance(self): - # Check if an installed version matches the workflow tool version - # (If it's not installed, the show_tool version returned will be a default version with the suffix "XXXX+0") - if show_tool["version"] != v["tool_version"]: - # If it doesn't match, proceed to install of the correct changeset revision - print(show_tool) - # logging.warning("Tool versions don't match for {0} (changeset installed: {1} | changeset required: {2}). Installing changeset revision {3}...".format(v["tool_shed_repository"]["name"], show_tool["changeset_revision"], v["tool_shed_repository"]["changeset_revision"], v["tool_shed_repository"]["changeset_revision"])) - toolshed = "https://" + v["tool_shed_repository"]["tool_shed"] - name = v["tool_shed_repository"]["name"] - owner = v["tool_shed_repository"]["owner"] - changeset_revision = v["tool_shed_repository"]["changeset_revision"] - self.instance.toolshed.install_repository_revision(tool_shed_url=toolshed, name=name, owner=owner, - changeset_revision=changeset_revision, - install_tool_dependencies=True, - install_repository_dependencies=False, - install_resolver_dependencies=True) + return self.instance + - logging.info("Tools versions and changesets from workflow validated") def install_changesets_revisions_for_individual_tools(self): """ @@ -189,21 +154,22 @@ class RunWorkflow(speciesData.SpeciesData): logging.info("Validating installed individual tools versions and changesets") # Verify that the add_organism and add_analysis versions are correct in the toolshed - add_organism_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/2.3.3") - add_analysis_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/2.3.3") - get_organism_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.3") - get_analysis_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_get_analyses/analysis_get_analyses/2.3.3") + add_organism_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/2.3.4+galaxy0") + add_analysis_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/2.3.4+galaxy0") + get_organism_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.4+galaxy0") + get_analysis_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_get_analyses/analysis_get_analyses/2.3.4+galaxy0") - # changeset for 2.3.3 has to be manually found because there is no way to 
get the wanted changeset of a non installed tool via bioblend
        # except for workflows (.ga) that already contain the changeset revisions inside the steps ids
-        if get_organism_tool["version"] != "2.3.3":
-            logging.warning("Changeset for %s is not installed" % toolshed_dict["name"])
-            changeset_revision = "b07279b5f3bf"
+        if get_organism_tool["version"] != "2.3.4+galaxy0":
             toolshed_dict = get_organism_tool["tool_shed_repository"]
+            logging.warning("Changeset for %s is not installed" % toolshed_dict["name"])
+            changeset_revision = "831229e6cda2"
             name = toolshed_dict["name"]
             owner = toolshed_dict["owner"]
             toolshed = "https://" + toolshed_dict["tool_shed"]
+            logging.warning("Installing changeset revision {0} for {1}".format(changeset_revision, name))
             self.instance.toolshed.install_repository_revision(tool_shed_url=toolshed, name=name, owner=owner,
                                                                changeset_revision=changeset_revision,
@@ -211,13 +177,14 @@ class RunWorkflow(speciesData.SpeciesData):
                                                                install_repository_dependencies=False,
                                                                install_resolver_dependencies=True)
-        if get_analysis_tool["version"] != "2.3.3":
-            logging.warning("Changeset for %s is not installed" % toolshed_dict["name"])
-            changeset_revision = "c7be2feafd73"
+        if get_analysis_tool["version"] != "2.3.4+galaxy0":
-            toolshed_dict = changeset_revision["tool_shed_repository"]
+            toolshed_dict = get_analysis_tool["tool_shed_repository"]
+            logging.warning("Changeset for %s is not installed" % toolshed_dict["name"])
+            changeset_revision = "a867923f555e"
             name = toolshed_dict["name"]
             owner = toolshed_dict["owner"]
             toolshed = "https://" + toolshed_dict["tool_shed"]
+            logging.warning("Installing changeset revision {0} for {1}".format(changeset_revision, name))
             self.instance.toolshed.install_repository_revision(tool_shed_url=toolshed, name=name, owner=owner,
                                                                changeset_revision=changeset_revision,
@@ -225,13 +192,14 @@ class RunWorkflow(speciesData.SpeciesData):
                                                                install_repository_dependencies=False,
                                                                install_resolver_dependencies=True)
-        if add_organism_tool["version"] != "2.3.3":
-            logging.warning("Changeset for %s is not installed" % toolshed_dict["name"])
-            changeset_revision = "680a1fe3c266"
+        if add_organism_tool["version"] != "2.3.4+galaxy0":
             toolshed_dict = add_organism_tool["tool_shed_repository"]
+            logging.warning("Changeset for %s is not installed" % toolshed_dict["name"])
+            changeset_revision = "1f12b9650028"
             name = toolshed_dict["name"]
             owner = toolshed_dict["owner"]
             toolshed = "https://" + toolshed_dict["tool_shed"]
+            logging.warning("Installing changeset revision {0} for {1}".format(changeset_revision, name))
             self.instance.toolshed.install_repository_revision(tool_shed_url=toolshed, name=name, owner=owner,
                                                                changeset_revision=changeset_revision,
@@ -239,14 +207,15 @@ class RunWorkflow(speciesData.SpeciesData):
                                                                install_repository_dependencies=False,
                                                                install_resolver_dependencies=True)
-        if add_analysis_tool["version"] != "2.3.3":
-            logging.warning("Changeset for %s is not installed" % toolshed_dict["name"])
-            changeset_revision = "43c36801669f"
+        if add_analysis_tool["version"] != "2.3.4+galaxy0":
             toolshed_dict = add_analysis_tool["tool_shed_repository"]
+            logging.warning("Changeset for %s is not installed" % toolshed_dict["name"])
+            changeset_revision = "10b2b1c70e69"
             name = toolshed_dict["name"]
             owner = toolshed_dict["owner"]
             toolshed = "https://" + toolshed_dict["tool_shed"]
-            logging.warning("Installing changeset revision %s for add_analysis" % changeset_revision)
+            
logging.warning("Installing changeset revision {0} for {1}".format(changeset_revision, name)) + self.instance.toolshed.install_repository_revision(tool_shed_url=toolshed, name=name, owner=owner, changeset_revision=changeset_revision, install_tool_dependencies=True, @@ -281,121 +250,213 @@ class RunWorkflow(speciesData.SpeciesData): self.connect_to_instance() self.set_get_history() - # We want the tools version default to be 2.3.3 at the moment - tool_version = "2.3.3" - # Add organism (species) to chado - logging.info("Adding organism to the instance's chado database") - if self.common == "" or self.common is None: - self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/%s" % tool_version, - history_id=self.history_id, - tool_inputs={"abbr": self.abbreviation, - "genus": self.genus_uppercase, - "species": self.chado_species_name, - "common": self.abbreviation}) - else: - self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/%s" % tool_version, - history_id=self.history_id, - tool_inputs={"abbr": self.abbreviation, - "genus": self.genus_uppercase, - "species": self.chado_species_name, - "common": self.common}) + tool_version = "2.3.4+galaxy0" - # Add OGS analysis to chado - logging.info("Adding OGS analysis to the instance's chado database") - self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/%s" % tool_version, + get_organism_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.4+galaxy0") + + get_organisms = self.instance.tools.run_tool( + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/%s" % tool_version, history_id=self.history_id, - tool_inputs={"name": self.full_name_lowercase + " OGS" + self.ogs_version, - "program": "Performed by Genoscope", - "programversion": str(self.sex + " OGS" + self.ogs_version), - "sourcename": "Genoscope", - "date_executed": self.date}) + tool_inputs={}) - # Add genome analysis to chado - logging.info("Adding genome analysis to the instance's chado database") - self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/%s" % tool_version, + time.sleep(10) # Ensure the tool has had time to complete + org_outputs = get_organisms["outputs"] # Outputs from the get_organism tool + org_job_out_id = org_outputs[0]["id"] # ID of the get_organism output dataset (list of dicts) + org_json_output = self.instance.datasets.download_dataset(dataset_id=org_job_out_id) # Download the dataset + org_output = json.loads(org_json_output) # Turn the dataset into a list for parsing + + org_id = None + + # Look up list of outputs (dictionaries) + for organism_output_dict in org_output: + if organism_output_dict["genus"] == self.genus and organism_output_dict["species"] == "{0} {1}".format(self.species, self.sex): + correct_organism_id = str(organism_output_dict["organism_id"]) # id needs to be a str to be recognized by chado tools + org_id = str(correct_organism_id) + + + if org_id is None: + if self.common == "" or self.common is None: + add_org_job = self.instance.tools.run_tool( + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/%s" % tool_version, + history_id=self.history_id, + tool_inputs={"abbr": self.abbreviation, + "genus": 
self.genus_uppercase, + "species": self.chado_species_name, + "common": self.abbreviation}) + org_job_out_id = add_org_job["outputs"][0]["id"] + org_json_output = self.instance.datasets.download_dataset(dataset_id=org_job_out_id) + org_output = json.loads(org_json_output) + org_id = str(org_output["organism_id"]) # id needs to be a str to be recognized by chado tools + else: + add_org_job = self.instance.tools.run_tool( + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/%s" % tool_version, + history_id=self.history_id, + tool_inputs={"abbr": self.abbreviation, + "genus": self.genus_uppercase, + "species": self.chado_species_name, + "common": self.common}) + org_job_out_id = add_org_job["outputs"][0]["id"] + org_json_output = self.instance.datasets.download_dataset(dataset_id=org_job_out_id) + org_output = json.loads(org_json_output) + org_id = str(org_output["organism_id"]) # id needs to be a str to be recognized by chado tools + + + get_analyses = self.instance.tools.run_tool( + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_get_analyses/analysis_get_analyses/%s" % tool_version, history_id=self.history_id, - tool_inputs={"name": self.full_name_lowercase + " genome v" + self.genome_version, - "program": "Performed by Genoscope", - "programversion": str(self.sex + "genome v" + self.genome_version), - "sourcename": "Genoscope", - "date_executed": self.date}) + tool_inputs={}) + + time.sleep(10) + analysis_outputs = get_analyses["outputs"] + analysis_job_out_id = analysis_outputs[0]["id"] + analysis_json_output = self.instance.datasets.download_dataset(dataset_id=analysis_job_out_id) + analysis_output = json.loads(analysis_json_output) - # # TODO: check output of get_organism --> if empty or wrong --> rerun --> else: go next - # # Get organism and analyses IDs (runtime inputs for workflow) - # time.sleep(3) - # # Get the ID for the current organism in chado - # org = self.instance.tools.run_tool( - # tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/%s" % tool_version, - # history_id=self.history_id, - # tool_inputs={"abbr": self.abbreviation, - # "genus": self.genus_uppercase, - # "species": self.chado_species_name, - # "common": self.common}) - - # time.sleep(3) - # # Run tool again (sometimes the tool doesn't return anything despite the organism already being in the db) - # org = self.instance.tools.run_tool( - # tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/%s" % tool_version, - # history_id=self.history_id, - # tool_inputs={"abbr": self.abbreviation, - # "genus": self.genus_uppercase, - # "species": self.chado_species_name, - # "common": self.common}) - - # org_job_out = org["outputs"][0]["id"] - # org_json_output = self.instance.datasets.download_dataset(dataset_id=org_job_out) - # try: - # org_output = json.loads(org_json_output)[0] - # self.org_id = str(org_output["organism_id"]) # id needs to be a str to be recognized by chado tools - # except IndexError: - # logging.critical("No organism matching " + self.full_name + " exists in the instance's chado database") - # sys.exit() - - - def get_genome_analysis_id(self): + ogs_analysis_id = None + genome_analysis_id = None + + # Look up list of outputs (dictionaries) + for analysis_output_dict in analysis_output: + if analysis_output_dict["name"] == self.full_name_lowercase + " OGS" + self.ogs_version: + ogs_analysis_id = str(analysis_output_dict["analysis_id"]) + if analysis_output_dict["name"] 
== self.full_name_lowercase + " genome v" + self.genome_version:
+                genome_analysis_id = str(analysis_output_dict["analysis_id"])
+
+
+        if ogs_analysis_id is None:
+            add_ogs_analysis_job = self.instance.tools.run_tool(
+                tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/%s" % tool_version,
+                history_id=self.history_id,
+                tool_inputs={"name": self.full_name_lowercase + " OGS" + self.ogs_version,
+                             "program": "Performed by Genoscope",
+                             "programversion": str(self.sex + " OGS" + self.ogs_version),
+                             "sourcename": "Genoscope",
+                             "date_executed": self.date})
+            analysis_outputs = add_ogs_analysis_job["outputs"]
+            analysis_job_out_id = analysis_outputs[0]["id"]
+            analysis_json_output = self.instance.datasets.download_dataset(dataset_id=analysis_job_out_id)
+            analysis_output = json.loads(analysis_json_output)
+            ogs_analysis_id = str(analysis_output["analysis_id"])
+
+        if genome_analysis_id is None:
+            add_genome_analysis_job = self.instance.tools.run_tool(
+                tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/%s" % tool_version,
+                history_id=self.history_id,
+                tool_inputs={"name": self.full_name_lowercase + " genome v" + self.genome_version,
+                             "program": "Performed by Genoscope",
+                             "programversion": str(self.sex + " genome v" + self.genome_version),
+                             "sourcename": "Genoscope",
+                             "date_executed": self.date})
+            analysis_outputs = add_genome_analysis_job["outputs"]
+            analysis_job_out_id = analysis_outputs[0]["id"]
+            analysis_json_output = self.instance.datasets.download_dataset(dataset_id=analysis_job_out_id)
+            analysis_output = json.loads(analysis_json_output)
+            genome_analysis_id = str(analysis_output["analysis_id"])
+
+        # print({"org_id": org_id, "genome_analysis_id": genome_analysis_id, "ogs_analysis_id": ogs_analysis_id})
+        return({"org_id": org_id, "genome_analysis_id": genome_analysis_id, "ogs_analysis_id": ogs_analysis_id})
+
+
+    def add_organism_blastp_analysis(self):
         """
+        Add the organism and a Diamond blastp analysis to the Chado database if they are missing
+        Required before running the blast workflow
+        Called outside workflow for practical reasons (Chado add doesn't have an input link for analysis or organism)
+
+        :return:
+        """
-        # Get the ID for the genome analysis in chado
-        genome_analysis = self.instance.tools.run_tool(
-            tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_get_analyses/analysis_get_analyses/2.3.3",
+        self.connect_to_instance()
+        self.set_get_history()
+
+        tool_version = "2.3.4+galaxy0"
+
+        get_organism_tool = self.instance.tools.show_tool("toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.4+galaxy0")
+
+        get_organisms = self.instance.tools.run_tool(
+            tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/%s" % tool_version,
             history_id=self.history_id,
-            tool_inputs={"name": self.full_name_lowercase + " genome v" + self.genome_version})
-        genome_analysis_job_out = genome_analysis["outputs"][0]["id"]
-        genome_analysis_json_output = self.instance.datasets.download_dataset(dataset_id=genome_analysis_job_out)
-        try:
-            genome_analysis_output = json.loads(genome_analysis_json_output)[0]
-            self.genome_analysis_id = str(genome_analysis_output["analysis_id"])
-        except IndexError as exc:
-            logging.critical("no matching genome analysis exists in the instance's chado database")
-            sys.exit(exc)
+            tool_inputs={})
-        return self.genome_analysis_id
+        time.sleep(10)  # Ensure the tool has had time to complete
+        
org_outputs = get_organisms["outputs"] # Outputs from the get_organism tool + org_job_out_id = org_outputs[0]["id"] # ID of the get_organism output dataset (list of dicts) + org_json_output = self.instance.datasets.download_dataset(dataset_id=org_job_out_id) # Download the dataset + org_output = json.loads(org_json_output) # Turn the dataset into a list for parsing - def get_ogs_analysis_id(self): - """ - """ + org_id = None - # Get the ID for the OGS analysis in chado - ogs_analysis = self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_get_analyses/analysis_get_analyses/2.3.3", + # Look up list of outputs (dictionaries) + for organism_output_dict in org_output: + if organism_output_dict["genus"] == self.genus and organism_output_dict["species"] == "{0} {1}".format(self.species, self.sex): + correct_organism_id = str(organism_output_dict["organism_id"]) # id needs to be a str to be recognized by chado tools + org_id = str(correct_organism_id) + + + if org_id is None: + if self.common == "" or self.common is None: + add_org_job = self.instance.tools.run_tool( + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/%s" % tool_version, + history_id=self.history_id, + tool_inputs={"abbr": self.abbreviation, + "genus": self.genus_uppercase, + "species": self.chado_species_name, + "common": self.abbreviation}) + org_job_out_id = add_org_job["outputs"][0]["id"] + org_json_output = self.instance.datasets.download_dataset(dataset_id=org_job_out_id) + org_output = json.loads(org_json_output) + org_id = str(org_output["organism_id"]) # id needs to be a str to be recognized by chado tools + else: + add_org_job = self.instance.tools.run_tool( + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/%s" % tool_version, + history_id=self.history_id, + tool_inputs={"abbr": self.abbreviation, + "genus": self.genus_uppercase, + "species": self.chado_species_name, + "common": self.common}) + org_job_out_id = add_org_job["outputs"][0]["id"] + org_json_output = self.instance.datasets.download_dataset(dataset_id=org_job_out_id) + org_output = json.loads(org_json_output) + org_id = str(org_output["organism_id"]) # id needs to be a str to be recognized by chado tools + + + get_analyses = self.instance.tools.run_tool( + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_get_analyses/analysis_get_analyses/%s" % tool_version, history_id=self.history_id, - tool_inputs={"name": self.full_name_lowercase + " OGS" + self.ogs_version}) - ogs_analysis_job_out = ogs_analysis["outputs"][0]["id"] - ogs_analysis_json_output = self.instance.datasets.download_dataset(dataset_id=ogs_analysis_job_out) - try: - ogs_analysis_output = json.loads(ogs_analysis_json_output)[0] - self.ogs_analysis_id = str(ogs_analysis_output["analysis_id"]) - except IndexError as exc: - logging.critical("No matching OGS analysis exists in the instance's chado database") - sys.exit(exc) + tool_inputs={}) - return self.ogs_analysis_id + time.sleep(10) + analysis_outputs = get_analyses["outputs"] + analysis_job_out_id = analysis_outputs[0]["id"] + analysis_json_output = self.instance.datasets.download_dataset(dataset_id=analysis_job_out_id) + analysis_output = json.loads(analysis_json_output) + blastp_analysis_id = None + + # Look up list of outputs (dictionaries) + for analysis_output_dict in analysis_output: + if analysis_output_dict["name"] == "Diamond on " + self.full_name_lowercase + " OGS" + self.ogs_version: + blastp_analysis_id = 
str(analysis_output_dict["analysis_id"]) + + + if blastp_analysis_id is None: + add_blast_analysis_job = self.instance.tools.run_tool( + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/%s" % tool_version, + history_id=self.history_id, + tool_inputs={"name": "Diamond on " + self.full_name_lowercase + " OGS" + self.ogs_version, + "program": "Performed by Genoscope", + "programversion": str(self.sex + " OGS" + self.ogs_version), + "sourcename": "Genoscope", + "date_executed": self.date}) + analysis_outputs = add_blast_analysis_job["outputs"] + analysis_job_out_id = analysis_outputs[0]["id"] + analysis_json_output = self.instance.datasets.download_dataset(dataset_id=analysis_job_out_id) + analysis_output = json.loads(analysis_json_output) + blastp_analysis_id = str(analysis_output["analysis_id"]) + + # print({"org_id": org_id, "genome_analysis_id": genome_analysis_id, "ogs_analysis_id": ogs_analysis_id}) + return({"org_id": org_id, "blastp_analysis_id": blastp_analysis_id}) def add_interproscan_analysis(self): """ @@ -404,7 +465,7 @@ class RunWorkflow(speciesData.SpeciesData): # Add Interpro analysis to chado logging.info("Adding Interproscan analysis to the instance's chado database") self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/2.3.3", + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/2.3.4+galaxy0", history_id=self.history_id, tool_inputs={"name": "InterproScan on OGS%s" % self.ogs_version, "program": "InterproScan", @@ -419,7 +480,7 @@ class RunWorkflow(speciesData.SpeciesData): # Get interpro ID interpro_analysis = self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_get_analyses/analysis_get_analyses/2.3.3", + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_get_analyses/analysis_get_analyses/2.3.4+galaxy0", history_id=self.history_id, tool_inputs={"name": "InterproScan on OGS%s" % self.ogs_version}) interpro_analysis_job_out = interpro_analysis["outputs"][0]["id"] @@ -433,43 +494,6 @@ class RunWorkflow(speciesData.SpeciesData): return self.interpro_analysis_id - def add_blastp_diamond_analysis(self): - """ - - """ - # Add Blastp (diamond) analysis to chado - logging.info("Adding Blastp Diamond analysis to the instance's chado database") - self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_add_analysis/analysis_add_analysis/2.3.3", - history_id=self.history_id, - tool_inputs={"name": "Diamond on OGS%s" % self.ogs_version, - "program": "Diamond", - "programversion": "OGS%s" % self.ogs_version, - "sourcename": "Genoscope", - "date_executed": self.date}) - - - def get_blastp_diamond_analysis_id(self): - """ - """ - - # Get blasp ID - blast_diamond_analysis = self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_analysis_get_analyses/analysis_get_analyses/2.3.3", - history_id=self.history_id, - tool_inputs={"name": "Diamond on OGS%s" % self.ogs_version}) - blast_diamond_analysis_job_out = blast_diamond_analysis["outputs"][0]["id"] - blast_diamond_analysis_json_output = self.instance.datasets.download_dataset(dataset_id=blast_diamond_analysis_job_out) - try: - blast_diamond_analysis_output = json.loads(blast_diamond_analysis_json_output)[0] - self.blast_diamond_analysis_id = str(blast_diamond_analysis_output["analysis_id"]) - except IndexError as exc: - logging.critical("No matching InterproScan analysis exists in the 
instance's chado database") - sys.exit(exc) - - return self.blast_diamond_analysis_id - - def run_workflow(self, workflow_path, workflow_parameters, workflow_name, datamap): """ Run a workflow in galaxy @@ -492,10 +516,10 @@ class RunWorkflow(speciesData.SpeciesData): # In case of the Jbrowse workflow, we unfortunately have to manually edit the parameters instead of setting them # as runtime values, using runtime parameters makes the tool throw an internal critical error ("replace not found" error) # Scratchgmod test: need "http" (or "https"), the hostname (+ port) - if "menu_url" not in self.config.keys(): + if "jbrowse_menu_url" not in self.config.keys(): jbrowse_menu_url = "https://{hostname}/sp/{genus_sp}/feature/{Genus}/{species}/mRNA/{id}".format(hostname=self.config["hostname"], genus_sp=self.genus_species, Genus=self.genus_uppercase, species=self.species, id="{id}") else: - jbrowse_menu_url = self.config["menu_url"] + jbrowse_menu_url = self.config["jbrowse_menu_url"] if workflow_name == "Jbrowse": workflow_dict["steps"]["2"]["tool_state"] = workflow_dict["steps"]["2"]["tool_state"].replace("__MENU_URL__", jbrowse_menu_url) # The UNIQUE_ID is specific to a combination genus_species_strain_sex so every combination should have its unique workflow @@ -546,16 +570,6 @@ class RunWorkflow(speciesData.SpeciesData): return invocation_report - - - def get_datasets_ldda_ids(self): - """ - Get and return the ldda_ids (and names) for the datasets in the library - """ - - return 0 - - def import_datasets_into_history(self): """ Find datasets in a library, get their ID and import them into the current history if they are not already @@ -580,19 +594,16 @@ class RunWorkflow(speciesData.SpeciesData): for i in instance_source_data_folders: folders_ids[i["name"]] = i["id"] - # Iterating over the folders to find datasets and map datasets to their IDs - logging.debug("Datasets IDs: ") for k, v in folders_ids.items(): if k == "/genome/{0}/v{1}".format(self.species_folder_name, self.genome_version): sub_folder_content = self.instance.folders.show_folder(folder_id=v, contents=True) for k2, v2 in sub_folder_content.items(): for e in v2: if type(e) == dict: - if e["name"].endswith(".fa"): + if e["name"].endswith(".fasta"): self.datasets["genome_file"] = e["ldda_id"] self.datasets_name["genome_file"] = e["name"] - logging.debug("\tGenome file:\t" + e["name"] + ": " + e["ldda_id"]) if k == "/annotation/{0}/OGS{1}".format(self.species_folder_name, self.ogs_version): sub_folder_content = self.instance.folders.show_folder(folder_id=v, contents=True) @@ -602,54 +613,82 @@ class RunWorkflow(speciesData.SpeciesData): if "transcripts" in e["name"]: self.datasets["transcripts_file"] = e["ldda_id"] self.datasets_name["transcripts_file"] = e["name"] - logging.debug("\tTranscripts file:\t" + e["name"] + ": " + e["ldda_id"]) elif "proteins" in e["name"]: self.datasets["proteins_file"] = e["ldda_id"] self.datasets_name["proteins_file"] = e["name"] - logging.debug("\tProteins file:\t" + e["name"] + ": " + e["ldda_id"]) elif "gff" in e["name"]: self.datasets["gff_file"] = e["ldda_id"] self.datasets_name["gff_file"] = e["name"] - logging.debug("\tGFF file:\t" + e["name"] + ": " + e["ldda_id"]) elif "interpro" in e["name"]: self.datasets["interproscan_file"] = e["ldda_id"] self.datasets_name["interproscan_file"] = e["name"] - logging.debug("\tInterproscan file:\t" + e["name"] + ": " + e["ldda_id"]) elif "blastp" in e["name"]: - self.datasets["blast_diamond_file"] = e["ldda_id"] - self.datasets_name["blast_diamond_file"] = 
e["name"] - logging.debug("\tBlastp diamond file:\t" + e["name"] + ": " + e["ldda_id"]) + self.datasets["blastp_file"] = e["ldda_id"] + self.datasets_name["blastp_file"] = e["name"] - logging.debug("Uploading datasets into history %s" % self.history_id) + + history_datasets_li = self.instance.datasets.get_datasets() + genome_hda_id, gff_hda_id, transcripts_hda_id, proteins_hda_id, blastp_hda_id, interproscan_hda_id = None, None, None, None, None, None + + # Finding datasets in history (matching datasets names) + for dataset in history_datasets_li: + dataset_name = dataset["name"] + dataset_id = dataset["id"] + if dataset_name == "{0}_v{1}.fasta".format(self.dataset_prefix, self.genome_version): + genome_hda_id = dataset_id + if dataset_name == "{0}_OGS{1}_{2}.gff".format(self.dataset_prefix, self.ogs_version, self.date): + gff_hda_id = dataset_id + if dataset_name == "{0}_OGS{1}_transcripts.fasta".format(self.dataset_prefix, self.ogs_version): + transcripts_hda_id = dataset_id + if dataset_name == "{0}_OGS{1}_proteins.fasta".format(self.dataset_prefix, self.ogs_version): + proteins_hda_id = dataset_id + if dataset_name == "{0}_OGS{1}_blastp.xml".format(self.dataset_prefix, self.ogs_version): + blastp_hda_id = dataset_id + + # Import each dataset into history if it is not imported + logging.debug("Uploading datasets into history %s" % self.history_id) - first_hda_ids = self.get_datasets_hda_ids() - - if first_hda_ids["genome_hda_id"] is None: - self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["genome_file"]) - if first_hda_ids["gff_hda_id"] is None: - self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["gff_file"]) - if first_hda_ids["transcripts_hda_id"] is None: - self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["transcripts_file"]) - if first_hda_ids["proteins_hda_id"] is None: - self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["proteins_file"]) - if first_hda_ids["interproscan_hda_id"] is None: + if genome_hda_id is None: + genome_dataset_upload = self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["genome_file"]) + genome_hda_id = genome_dataset_upload["id"] + if gff_hda_id is None: + gff_dataset_upload = self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["gff_file"]) + gff_hda_id = gff_dataset_upload["id"] + if transcripts_hda_id is None: + transcripts_dataset_upload = self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["transcripts_file"]) + transcripts_hda_id = transcripts_dataset_upload["id"] + if proteins_hda_id is None: + proteins_dataset_upload = self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["proteins_file"]) + proteins_hda_id = proteins_dataset_upload["id"] + if interproscan_hda_id is None: try: - self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["interproscan_file"]) + interproscan_dataset_upload = self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["interproscan_file"]) + interproscan_hda_id = interproscan_dataset_upload["id"] except Exception as exc: logging.debug("Interproscan file not found in 
library (history: {0})".format(self.history_id)) - if first_hda_ids["blast_diamond_hda_id"] is None: + if blastp_hda_id is None: try: - self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["blast_diamond_file"]) + blastp_dataset_upload = self.instance.histories.upload_dataset_from_library(history_id=self.history_id, lib_dataset_id=self.datasets["blastp_file"]) + blastp_hda_id = blastp_dataset_upload["id"] except Exception as exc: - logging.debug("Blastp file not found in library (history: {0})".format(self.history_id)) + logging.debug("blastp file not found in library (history: {0})".format(self.history_id)) - # _datasets = self.instance.datasets.get_datasets() - # with open(os.path.join(self.main_dir, "datasets_ids.json"), "w") as datasets_ids_outfile: - # datasets_ids_outfile.write(str(_datasets)) + # logging.debug("History dataset IDs (hda_id) for %s:" % self.full_name) + # logging.debug({"genome_hda_id": genome_hda_id, + # "gff_hda_id": gff_hda_id, + # "transcripts_hda_id": transcripts_hda_id, + # "proteins_hda_id": proteins_hda_id, + # "blastp_hda_id": blastp_hda_id, + # "interproscan_hda_id": interproscan_hda_id}) # Return a dict made of the hda ids - return self.get_datasets_hda_ids() + return {"genome_hda_id": genome_hda_id, + "gff_hda_id": gff_hda_id, + "transcripts_hda_id": transcripts_hda_id, + "proteins_hda_id": proteins_hda_id, + "blastp_hda_id": blastp_hda_id, + "interproscan_hda_id": interproscan_hda_id} def get_datasets_hda_ids(self): @@ -675,114 +714,275 @@ class RunWorkflow(speciesData.SpeciesData): # Match files imported in history names vs library datasets names to assign their respective hda_id for dataset_dict in history_datasets_li: if dataset_dict["history_id"] == self.history_id: - if dataset_dict["name"] == self.datasets_name["genome_file"]: + if dataset_dict["name"] == self.datasets_name["genome_file"] and dataset_dict["id"] not in imported_datasets_ids: genome_dataset_hda_id = dataset_dict["id"] - logging.debug("Genome dataset hda id: %s" % genome_dataset_hda_id) - elif dataset_dict["name"] == self.datasets_name["proteins_file"]: + elif dataset_dict["name"] == self.datasets_name["proteins_file"] and dataset_dict["id"] not in imported_datasets_ids: proteins_datasets_hda_id = dataset_dict["id"] - logging.debug("Proteins dataset hda ID: %s" % proteins_datasets_hda_id) - elif dataset_dict["name"] == self.datasets_name["transcripts_file"]: + elif dataset_dict["name"] == self.datasets_name["transcripts_file"] and dataset_dict["id"] not in imported_datasets_ids: transcripts_dataset_hda_id = dataset_dict["id"] - logging.debug("Transcripts dataset hda ID: %s" % transcripts_dataset_hda_id) - elif dataset_dict["name"] == self.datasets_name["gff_file"]: + elif dataset_dict["name"] == self.datasets_name["gff_file"] and dataset_dict["id"] not in imported_datasets_ids: gff_dataset_hda_id = dataset_dict["id"] - logging.debug("GFF dataset hda ID: %s" % gff_dataset_hda_id) - if "interproscan_file" in self.datasets_name.keys(): - if dataset_dict["name"] == self.datasets_name["interproscan_file"]: + if dataset_dict["name"] == self.datasets_name["interproscan_file"] and dataset_dict["id"] not in imported_datasets_ids: interproscan_dataset_hda_id = dataset_dict["id"] - logging.debug("InterproScan dataset hda ID: %s" % gff_dataset_hda_id) if "blast_diamond_file" in self.datasets_name.keys(): - if dataset_dict["name"] == self.datasets_name["blast_diamond_file"]: - blast_diamond_dataset_hda_id = dataset_dict["id"] - 
logging.debug("Blast Diamond dataset hda ID: %s" % gff_dataset_hda_id) + if dataset_dict["name"] == self.datasets_name["blastp_file"] and dataset_dict["id"] not in imported_datasets_ids: + blastp_dataset_hda_id = dataset_dict["id"] + + logging.debug("Genome dataset hda id: %s" % genome_dataset_hda_id) + logging.debug("Proteins dataset hda ID: %s" % proteins_datasets_hda_id) + logging.debug("Transcripts dataset hda ID: %s" % transcripts_dataset_hda_id) + logging.debug("GFF dataset hda ID: %s" % gff_dataset_hda_id) + logging.debug("InterproScan dataset hda ID: %s" % gff_dataset_hda_id) + logging.debug("Blastp Diamond dataset hda ID: %s" % blastp_dataset_hda_id) + + # Add datasets IDs to already imported IDs (so we don't assign all the wrong IDs to the next organism if there is one) + imported_datasets_ids.append(genome_dataset_hda_id) + imported_datasets_ids.append(transcripts_dataset_hda_id) + imported_datasets_ids.append(proteins_datasets_hda_id) + imported_datasets_ids.append(gff_dataset_hda_id) + imported_datasets_ids.append(interproscan_dataset_hda_id) + imported_datasets_ids.append(blastp_dataset_hda_id) # Return a dict made of the hda ids return {"genome_hda_id": genome_dataset_hda_id, "transcripts_hda_id": transcripts_dataset_hda_id, "proteins_hda_id": proteins_datasets_hda_id, "gff_hda_id": gff_dataset_hda_id, "interproscan_hda_id": interproscan_dataset_hda_id, - "blast_diamond_hda_id": blast_diamond_dataset_hda_id} + "blastp_hda_id": blastp_dataset_hda_id, + "imported_datasets_ids": imported_datasets_ids} - def get_organism_id(self): - """ - Retrieve current organism ID - Will try to add it to Chado if the organism ID can't be found - :return: - """ +def run_workflow(workflow_path, workflow_parameters, datamap, config, input_species_number): + """ + Run a workflow in galaxy + Requires the .ga file to be loaded as a dictionary (optionally could be uploaded as a raw file) - tool_version = "2.3.3" - time.sleep(3) + :param workflow_name: + :param workflow_parameters: + :param datamap: + :return: + """ - # # Get the ID for the current organism in chado - # org = self.instance.tools.run_tool( - # tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.3", - # history_id=self.history_id, - # tool_inputs={"abbr": self.abbreviation, - # "genus": self.genus_uppercase, - # "species": self.chado_species_name, - # "common": self.common}) + logging.info("Importing workflow %s" % str(workflow_path)) - # time.sleep(3) + # Load the workflow file (.ga) in a buffer + with open(workflow_path, 'r') as ga_in_file: - # Run tool again (sometimes the tool doesn't return anything despite the organism already being in the db) - org = self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.3", - history_id=self.history_id, - tool_inputs={"abbr": self.abbreviation, - "genus": self.genus_uppercase, - "species": self.chado_species_name, - "common": self.common}) + # Then store the decoded json dictionary + workflow_dict = json.load(ga_in_file) - org_job_out = org["outputs"][0]["id"] - org_json_output = self.instance.datasets.download_dataset(dataset_id=org_job_out) + # In case of the Jbrowse workflow, we unfortunately have to manually edit the parameters instead of setting them + # as runtime values, using runtime parameters makes the tool throw an internal critical error ("replace not found" error) + # Scratchgmod test: need "http" (or "https"), the hostname (+ port) + if "jbrowse_menu_url" not in 
config.keys(): + jbrowse_menu_url = "https://{hostname}/sp/{genus_sp}/feature/{Genus}/{species}/mRNA/{id}".format(hostname=self.config["hostname"], genus_sp=self.genus_species, Genus=self.genus_uppercase, species=self.species, id="{id}") + else: + jbrowse_menu_url = config["jbrowse_menu_url"] + if workflow_name == "Jbrowse": + workflow_dict["steps"]["2"]["tool_state"] = workflow_dict["steps"]["2"]["tool_state"].replace("__MENU_URL__", jbrowse_menu_url) + # The UNIQUE_ID is specific to a combination genus_species_strain_sex so every combination should have its unique workflow + # in galaxy --> define a naming method for these workflows + workflow_dict["steps"]["3"]["tool_state"] = workflow_dict["steps"]["3"]["tool_state"].replace("__FULL_NAME__", self.full_name).replace("__UNIQUE_ID__", self.species_folder_name) + + # Import the workflow in galaxy as a dict + self.instance.workflows.import_workflow_dict(workflow_dict=workflow_dict) + + # Get its attributes + workflow_attributes = self.instance.workflows.get_workflows(name=workflow_name) + # Then get its ID (required to invoke the workflow) + workflow_id = workflow_attributes[0]["id"] # Index 0 is the most recently imported workflow (the one we want) + show_workflow = self.instance.workflows.show_workflow(workflow_id=workflow_id) + # Check if the workflow is found try: - org_output = json.loads(org_json_output)[0] - self.org_id = str(org_output["organism_id"]) # id needs to be a str to be recognized by chado tools - except IndexError: - logging.warning("No organism matching " + self.full_name + " exists in the instance's chado database, adding it") - if self.common == "" or self.common is None: - self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/%s" % tool_version, - history_id=self.history_id, - tool_inputs={"abbr": self.abbreviation, - "genus": self.genus_uppercase, - "species": self.chado_species_name, - "common": self.abbreviation}) - else: - self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/%s" % tool_version, - history_id=self.history_id, - tool_inputs={"abbr": self.abbreviation, - "genus": self.genus_uppercase, - "species": self.chado_species_name, - "common": self.common}) - # Run tool again (sometimes the tool doesn't return anything despite the organism already being in the db) - org = self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.3", - history_id=self.history_id, - tool_inputs={"abbr": self.abbreviation, - "genus": self.genus_uppercase, - "species": self.chado_species_name, - "common": self.common}) + logging.debug("Workflow ID: %s" % workflow_id) + except bioblend.ConnectionError: + logging.warning("Error retrieving workflow attributes for workflow %s" % workflow_name) - org_job_out = org["outputs"][0]["id"] - org_json_output = self.instance.datasets.download_dataset(dataset_id=org_job_out) - try: - org_output = json.loads(org_json_output)[0] - self.org_id = str(org_output["organism_id"]) # id needs to be a str to be recognized by chado tools - except IndexError: - logging.critical("Cannot add {0} as an organism in Chado, please check the galaxy instance {1}".format(self.full_name, self.instance_url)) - sys.exit() + # Finally, invoke the workflow alogn with its datamap, parameters and the history in which to invoke it + self.instance.workflows.invoke_workflow(workflow_id=workflow_id, + 
history_id=self.history_id, + params=workflow_parameters, + inputs=datamap, + allow_tool_state_corrections=True) + + logging.info("Successfully imported and invoked workflow {0}, check the galaxy instance ({1}) for the jobs state".format(workflow_name, self.instance_url)) - return self.org_id + + + +def create_sp_workflow_dict(sp_dict, main_dir, config, workflow_type): + """ + """ + + sp_workflow_dict = {} + run_workflow_for_current_organism = RunWorkflow(parameters_dictionary=sp_dict) + + # Verifying the galaxy container is running + if utilities.check_galaxy_state(genus_lowercase=run_workflow_for_current_organism.genus_lowercase, + species=run_workflow_for_current_organism.species, + script_dir=run_workflow_for_current_organism.script_dir): + + # Starting + logging.info("run_workflow.py called for %s" % run_workflow_for_current_organism.full_name) + + # Setting some of the instance attributes + run_workflow_for_current_organism.main_dir = main_dir + run_workflow_for_current_organism.species_dir = os.path.join(run_workflow_for_current_organism.main_dir, + run_workflow_for_current_organism.genus_species + + "/") + + # Parse the config yaml file + run_workflow_for_current_organism.config = config + # Set the instance url attribute --> TODO: the localhost rule in the docker-compose still doesn't work on scratchgmodv1 + run_workflow_for_current_organism.instance_url = "http://localhost:{0}/sp/{1}_{2}/galaxy/".format( + run_workflow_for_current_organism.config["http_port"], + run_workflow_for_current_organism.genus_lowercase, + run_workflow_for_current_organism.species) + + + if workflow_type == "load_fasta_gff_jbrowse": + run_workflow_for_current_organism.connect_to_instance() + + history_id = run_workflow_for_current_organism.set_get_history() + + run_workflow_for_current_organism.install_changesets_revisions_for_individual_tools() + ids = run_workflow_for_current_organism.add_organism_ogs_genome_analyses() + + org_id = None + genome_analysis_id = None + ogs_analysis_id = None + org_id = ids["org_id"] + genome_analysis_id = ids["genome_analysis_id"] + ogs_analysis_id = ids["ogs_analysis_id"] + instance_attributes = run_workflow_for_current_organism.get_instance_attributes() + hda_ids = run_workflow_for_current_organism.import_datasets_into_history() + + strain_sex = "{0}_{1}".format(run_workflow_for_current_organism.strain, run_workflow_for_current_organism.sex) + genus_species = run_workflow_for_current_organism.genus_species + + # Create the dictionary holding all attributes needed to connect to the galaxy instance + attributes = {"genus": run_workflow_for_current_organism.genus, + "species": run_workflow_for_current_organism.species, + "genus_species": run_workflow_for_current_organism.genus_species, + "full_name": run_workflow_for_current_organism.full_name, + "species_folder_name": run_workflow_for_current_organism.species_folder_name, + "sex": run_workflow_for_current_organism.sex, + "strain": run_workflow_for_current_organism.strain, + "org_id": org_id, + "genome_analysis_id": genome_analysis_id, + "ogs_analysis_id": ogs_analysis_id, + "instance_attributes": instance_attributes, + "hda_ids": hda_ids, + "history_id": history_id, + "instance": run_workflow_for_current_organism.instance, + "instance_url": run_workflow_for_current_organism.instance_url, + "email": config["galaxy_default_admin_email"], + "password": config["galaxy_default_admin_password"]} + + sp_workflow_dict[genus_species] = {strain_sex: attributes} + + else: + logging.critical("The galaxy container for %s is not ready 
yet!" % run_workflow_for_current_organism.full_name) + sys.exit() + + return sp_workflow_dict + + if workflow_type == "blast": + run_workflow_for_current_organism.connect_to_instance() + + history_id = run_workflow_for_current_organism.set_get_history() + + run_workflow_for_current_organism.install_changesets_revisions_for_individual_tools() + ids = run_workflow_for_current_organism.add_organism_blastp_analysis() + + org_id = None + org_id = ids["org_id"] + blastp_analysis_id = None + blastp_analysis_id = ids["blastp_analysis_id"] + instance_attributes = run_workflow_for_current_organism.get_instance_attributes() + hda_ids = run_workflow_for_current_organism.import_datasets_into_history() + + strain_sex = "{0}_{1}".format(run_workflow_for_current_organism.strain, run_workflow_for_current_organism.sex) + genus_species = run_workflow_for_current_organism.genus_species + + # Create the dictionary holding all attributes needed to connect to the galaxy instance + attributes = {"genus": run_workflow_for_current_organism.genus, + "species": run_workflow_for_current_organism.species, + "genus_species": run_workflow_for_current_organism.genus_species, + "full_name": run_workflow_for_current_organism.full_name, + "species_folder_name": run_workflow_for_current_organism.species_folder_name, + "sex": run_workflow_for_current_organism.sex, + "strain": run_workflow_for_current_organism.strain, + "org_id": org_id, + "blastp_analysis_id": blastp_analysis_id, + "instance_attributes": instance_attributes, + "hda_ids": hda_ids, + "history_id": history_id, + "instance": run_workflow_for_current_organism.instance, + "instance_url": run_workflow_for_current_organism.instance_url, + "email": config["galaxy_default_admin_email"], + "password": config["galaxy_default_admin_password"]} + + sp_workflow_dict[genus_species] = {strain_sex: attributes} + + else: + logging.critical("The galaxy container for %s is not ready yet!" % run_workflow_for_current_organism.full_name) + sys.exit() + + + +def install_changesets_revisions_from_workflow(instance, workflow_path): + """ + Read a .ga file to extract the information about the different tools called. + Check if every tool is installed via a "show_tool". 
+
+
+
+def install_changesets_revisions_from_workflow(instance, workflow_path):
+    """
+    Read a .ga file to extract the information about the different tools called.
+    Check if every tool is installed via a "show_tool".
+    If a tool is not installed (versions don't match), send a warning to the logger and install the required changeset (matching the tool version)
+    Doesn't do anything if versions match
+
+    :return:
+    """
+
+    logging.info("Validating that installed tools versions and changesets match workflow versions")
+
+    # Load the workflow file (.ga) in a buffer
+    with open(workflow_path, 'r') as ga_in_file:
+
+        # Then store the decoded json dictionary
+        workflow_dict = json.load(ga_in_file)
+
+        # Look up every "step_id" looking for tools
+        for k, v in workflow_dict["steps"].items():
+            if v["tool_id"]:
+                # Get the descriptive dictionary of the installed tool (using the tool id in the workflow)
+                show_tool = instance.tools.show_tool(v["tool_id"])
+                # Check if an installed version matches the workflow tool version
+                # (If it's not installed, the show_tool version returned will be a default version with the suffix "XXXX+0")
+                if show_tool["version"] != v["tool_version"]:
+                    # If it doesn't match, proceed to install of the correct changeset revision
+                    toolshed = "https://" + v["tool_shed_repository"]["tool_shed"]
+                    name = v["tool_shed_repository"]["name"]
+                    owner = v["tool_shed_repository"]["owner"]
+                    changeset_revision = v["tool_shed_repository"]["changeset_revision"]
+
+                    logging.warning("Installed tool versions for tool {0} do not match the version required by the specified workflow, installing changeset {1}".format(name, changeset_revision))
+
+                    # Install changeset
+                    instance.toolshed.install_repository_revision(tool_shed_url=toolshed, name=name, owner=owner,
+                                                                  changeset_revision=changeset_revision,
+                                                                  install_tool_dependencies=True,
+                                                                  install_repository_dependencies=False,
+                                                                  install_resolver_dependencies=True)
+                else:
+                    toolshed = "https://" + v["tool_shed_repository"]["tool_shed"]
+                    name = v["tool_shed_repository"]["name"]
+                    owner = v["tool_shed_repository"]["owner"]
+                    changeset_revision = v["tool_shed_repository"]["changeset_revision"]
+                    logging.debug("Installed tool versions for tool {0} match the version in the specified workflow (changeset {1})".format(name, changeset_revision))
+
+    logging.info("Tools versions and changesets from workflow validated")
 
 if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Automatic data loading in containers and interaction "
-                                                 "with galaxy instances for GGA"
-                                                 ", following the protocol @ "
-                                                 "http://gitlab.sb-roscoff.fr/abims/e-infra/gga")
+    parser = argparse.ArgumentParser(description="Run Galaxy workflows, specific to Phaeoexplorer data")
 
     parser.add_argument("input",
                         type=str,
@@ -802,7 +1002,7 @@ if __name__ == "__main__":
     parser.add_argument("--workflow",
                         "-w",
                         type=str,
-                        help="Worfklow to run")
+                        help="Workflow to run. Available options: load_fasta_gff_jbrowse, blast, interpro")
 
     args = parser.parse_args()
 
@@ -826,242 +1026,699 @@ if __name__ == "__main__":
 
     sp_dict_list = utilities.parse_input(args.input)
 
-    for sp_dict in sp_dict_list:
-
-        # Creating an instance of the RunWorkflow object for the current organism
-        run_workflow_for_current_organism = RunWorkflow(parameters_dictionary=sp_dict)
-
-        # Checking if user specified a workflow to run
-        if not args.workflow:
-            logging.critical("No workflow specified, exiting")
-            sys.exit()
-        else:
-            workflow = os.path.abspath(args.workflow)
-
-        # Verifying the galaxy container is running
-        if utilities.check_galaxy_state(genus_lowercase=run_workflow_for_current_organism.genus_lowercase,
-                                        species=run_workflow_for_current_organism.species,
-                                        script_dir=run_workflow_for_current_organism.script_dir):
-
-            # Starting
-            logging.info("run_workflow.py called for %s" % run_workflow_for_current_organism.full_name)
-
-            # Setting some of the instance attributes
-            run_workflow_for_current_organism.main_dir = args.main_directory
-            run_workflow_for_current_organism.species_dir = os.path.join(run_workflow_for_current_organism.main_dir,
-                                                                         run_workflow_for_current_organism.genus_species +
-                                                                         "/")
+    workflow_valid_types = ["load_fasta_gff_jbrowse", "blast", "interpro"]
 
-            # Parse the config yaml file
-            run_workflow_for_current_organism.config = utilities.parse_config(args.config)
-            # Set the instance url attribute --> TODO: the localhost rule in the docker-compose still doesn't work on scratchgmodv1
-            run_workflow_for_current_organism.instance_url = "http://localhost:{0}/sp/{1}_{2}/galaxy/".format(
-                run_workflow_for_current_organism.config["http_port"],
-                run_workflow_for_current_organism.genus_lowercase,
-                run_workflow_for_current_organism.species)
+    workflow_type = None
+    # Checking if user specified a workflow to run
+    if not args.workflow:
+        logging.critical("No workflow type specified, exiting")
+        sys.exit()
+    elif args.workflow in workflow_valid_types:
+        workflow_type = args.workflow
+        logging.info("Workflow type set to %s" % workflow_type)
+    else:
+        logging.critical("Unknown workflow type %s (valid options: %s), exiting" % (args.workflow, ", ".join(workflow_valid_types)))
+        sys.exit()
+    script_dir = os.path.dirname(os.path.realpath(sys.argv[0]))
+    config = utilities.parse_config(args.config)
+    all_sp_workflow_dict = {}
 
-            # If input workflow is Chado_load_Tripal_synchronize.ga
-            if "Chado_load_Tripal_synchronize" in str(workflow):
-                logging.info("Executing workflow 'Chado_load_Tripal_synchronize'")
+    if workflow_type == "load_fasta_gff_jbrowse":
+        for sp_dict in sp_dict_list:
 
-                run_workflow_for_current_organism.connect_to_instance()
-                run_workflow_for_current_organism.set_get_history()
-                # run_workflow_for_current_organism.get_species_history_id()
+            # Add and retrieve all analyses/organisms for the current input species and add their IDs to the input dictionary
+            current_sp_workflow_dict = create_sp_workflow_dict(sp_dict, main_dir=args.main_directory, config=config, workflow_type="load_fasta_gff_jbrowse")
 
-                run_workflow_for_current_organism.install_changesets_revisions_for_individual_tools()
-                run_workflow_for_current_organism.install_changesets_revisions_from_workflow(workflow_path=workflow)
-                run_workflow_for_current_organism.add_organism_ogs_genome_analyses()
-                run_workflow_for_current_organism.get_organism_id()
-                run_workflow_for_current_organism.get_genome_analysis_id()
-                run_workflow_for_current_organism.get_ogs_analysis_id()
-
-                # run_workflow_for_current_organism.tripal_synchronize_organism_analyses()
-
-                # Get the attributes of the instance and project data files
-                run_workflow_for_current_organism.get_instance_attributes()
-
-                # Import datasets into history and retrieve their hda IDs
-                # TODO: can be simplified with direct access to the folder contents via the full path (no loop required)
-                hda_ids = run_workflow_for_current_organism.import_datasets_into_history()
-
-                # DEBUG
-                # run_workflow_for_current_organism.get_invocation_report(workflow_name="Chado load Tripal synchronize")
-
-                # Explicit workflow parameter names
-                GENOME_FASTA_FILE = "0"
-                GFF_FILE = "1"
-                PROTEINS_FASTA_FILE = "2"
-                TRANSCRIPTS_FASTA_FILE = "3"
-
-                LOAD_FASTA_IN_CHADO = "4"
-                LOAD_GFF_IN_CHADO = "5"
-                SYNC_ORGANISM_INTO_TRIPAL = "6"
-                SYNC_GENOME_ANALYSIS_INTO_TRIPAL = "7"
-                SYNC_OGS_ANALYSIS_INTO_TRIPAL = "8"
-                SYNC_FEATURES_INTO_TRIPAL = "9"
-
-                workflow_parameters = {}
-
-                workflow_parameters[GENOME_FASTA_FILE] = {}
-                workflow_parameters[GFF_FILE] = {}
-                workflow_parameters[PROTEINS_FASTA_FILE] = {}
-                workflow_parameters[TRANSCRIPTS_FASTA_FILE] = {}
-
-                workflow_parameters[LOAD_FASTA_IN_CHADO] = {"organism": run_workflow_for_current_organism.org_id,
-                                                            "analysis_id": run_workflow_for_current_organism.genome_analysis_id,
-                                                            "do_update": "true"}
-                # Change "do_update": "true" to "do_update": "false" in above parameters to prevent appending/updates to the fasta file in chado
-                # WARNING: It is safer to never update it and just change the genome/ogs versions in the config
-                workflow_parameters[LOAD_GFF_IN_CHADO] = {"organism": run_workflow_for_current_organism.org_id,
-                                                          "analysis_id": run_workflow_for_current_organism.ogs_analysis_id}
-                workflow_parameters[SYNC_ORGANISM_INTO_TRIPAL] = {"organism_id": run_workflow_for_current_organism.org_id}
-                workflow_parameters[SYNC_GENOME_ANALYSIS_INTO_TRIPAL] = {"analysis_id": run_workflow_for_current_organism.ogs_analysis_id}
-                workflow_parameters[SYNC_OGS_ANALYSIS_INTO_TRIPAL] = {"analysis_id": run_workflow_for_current_organism.genome_analysis_id}
-                workflow_parameters[SYNC_FEATURES_INTO_TRIPAL] = {"organism_id": run_workflow_for_current_organism.org_id}
-
-                # Datamap for input datasets - dataset source (type): ldda (LibraryDatasetDatasetAssociation)
-                run_workflow_for_current_organism.datamap = {}
-                run_workflow_for_current_organism.datamap[GENOME_FASTA_FILE] = {"src": "hda", "id": hda_ids["genome_hda_id"]}
-                run_workflow_for_current_organism.datamap[GFF_FILE] = {"src": "hda", "id": hda_ids["gff_hda_id"]}
-                run_workflow_for_current_organism.datamap[PROTEINS_FASTA_FILE] = {"src": "hda", "id": hda_ids["proteins_hda_id"]}
-                run_workflow_for_current_organism.datamap[TRANSCRIPTS_FASTA_FILE] = {"src": "hda", "id": hda_ids["transcripts_hda_id"]}
-
-                # run_workflow_for_current_organism.datamap = {}
-                # run_workflow_for_current_organism.datamap[GENOME_FASTA_FILE] = {"src": "hda", "id":
-                # run_workflow_for_current_organism.datasets["genome_file"]}
-                # run_workflow_for_current_organism.datamap[GFF_FILE] = {"src": "hda",
-                #                                                        "id": hda_ids["gff_hda_id"]}
-
-                # Ensures galaxy has had time to retrieve datasets
-                time.sleep(60)
-                # Run the Chado load Tripal sync workflow with the parameters set above
-                run_workflow_for_current_organism.run_workflow(workflow_path=workflow,
-                                                               workflow_parameters=workflow_parameters,
-                                                               datamap=run_workflow_for_current_organism.datamap,
-                                                               workflow_name="Chado load Tripal synchronize")
-
-            # Jbrowse creation workflow
-            elif "Jbrowse" in str(workflow):
-
-                logging.info("Executing workflow 'Jbrowse'")
-
-                run_workflow_for_current_organism.connect_to_instance()
-                run_workflow_for_current_organism.set_get_history()
-                run_workflow_for_current_organism.install_changesets_revisions_from_workflow(workflow_path=workflow)
-                run_workflow_for_current_organism.get_organism_id()
-                # Import datasets into history and get their hda IDs
-                run_workflow_for_current_organism.import_datasets_into_history()
-                hda_ids = run_workflow_for_current_organism.get_datasets_hda_ids()  # Note: only call this function AFTER calling "import_datasets_into_history()"
-
-                # Debugging
-                # run_workflow_for_current_organism.get_invocation_report(workflow_name="Jbrowse")
-
-                GENOME_FASTA_FILE = "0"
-                GFF_FILE = "1"
-                ADD_JBROWSE = "2"
-                ADD_ORGANISM_TO_JBROWSE = "3"
+            current_sp_key = list(current_sp_workflow_dict.keys())[0]
+            current_sp_value = list(current_sp_workflow_dict.values())[0]
+            current_sp_strain_sex_key = list(current_sp_value.keys())[0]
+            current_sp_strain_sex_value = list(current_sp_value.values())[0]
+
+            # Add the species dictionary to the complete dictionary
+            # This dictionary contains every organism present in the input file
+            # Its structure is the following:
+            # {genus species: {strain1_sex1: {variables_key: variables_values}, strain1_sex2: {variables_key: variables_values}}}
+            if current_sp_key not in all_sp_workflow_dict.keys():
+                all_sp_workflow_dict[current_sp_key] = current_sp_value
+            else:
+                all_sp_workflow_dict[current_sp_key][current_sp_strain_sex_key] = current_sp_strain_sex_value
+
+        for k, v in all_sp_workflow_dict.items():
+            if len(list(v.keys())) == 1:
+                logging.info("Input organism %s: 1 species detected in input dictionary" % k)
+
+                # Set workflow path (1 organism)
+                workflow_path = os.path.join(os.path.abspath(script_dir), "workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v4.ga")
+
+                # Instance object required variables
+                instance_url, email, password = None, None, None
+
+                # Set the galaxy instance variables
+                for k2, v2 in v.items():
+                    instance_url = v2["instance_url"]
+                    email = v2["email"]
+                    password = v2["password"]
+
+                instance = galaxy.GalaxyInstance(url=instance_url, email=email, password=password)
+
+                # Check if the versions of tools specified in the workflow are installed in galaxy
+                install_changesets_revisions_from_workflow(workflow_path=workflow_path, instance=instance)
+
+                organisms_key_names = list(v.keys())
+                org_dict = v[organisms_key_names[0]]
+
+                history_id = org_dict["history_id"]
+
+                # Organism 1 attributes
+                org_genus = org_dict["genus"]
+                org_species = org_dict["species"]
+                org_genus_species = org_dict["genus_species"]
+                org_species_folder_name = org_dict["species_folder_name"]
+                org_full_name = org_dict["full_name"]
+                org_strain = org_dict["strain"]
+                org_sex = org_dict["sex"]
+                org_org_id = org_dict["org_id"]
+                org_genome_analysis_id = org_dict["genome_analysis_id"]
+                org_ogs_analysis_id = org_dict["ogs_analysis_id"]
+                org_genome_hda_id = org_dict["hda_ids"]["genome_hda_id"]
+                org_transcripts_hda_id = org_dict["hda_ids"]["transcripts_hda_id"]
+                org_proteins_hda_id = org_dict["hda_ids"]["proteins_hda_id"]
+                org_gff_hda_id = org_dict["hda_ids"]["gff_hda_id"]
+
+                # Store these values into a dict for parameters logging/validation
+                org_parameters_dict = {
+                    "org_genus": org_genus,
+                    "org_species": org_species,
+                    "org_genus_species": org_genus_species,
+                    "org_species_folder_name": org_species_folder_name,
+                    "org_full_name": org_full_name,
+                    "org_strain": org_strain,
+                    "org_sex": org_sex,
+                    "org_org_id": org_org_id,
+                    "org_genome_analysis_id": org_genome_analysis_id,
+                    "org_ogs_analysis_id": org_ogs_analysis_id,
+                    "org_genome_hda_id": org_genome_hda_id,
+                    "org_transcripts_hda_id": org_transcripts_hda_id,
+                    "org_proteins_hda_id": org_proteins_hda_id,
+                    "org_gff_hda_id": org_gff_hda_id,
+                }
+
+                # Look for empty parameters values, throw a critical error if a parameter value is invalid
+                for param_name, param_value in org_parameters_dict.items():
+                    if param_value is None or param_value == "":
+                        logging.critical("Empty parameter value found for organism {0} (parameter: {1}, parameter value: {2})".format(org_full_name, param_name, param_value))
+                        sys.exit()
+
+                # Set the workflow parameters (individual tools runtime parameters in the workflow)
+                workflow_parameters = {}
-                workflow_parameters[GENOME_FASTA_FILE] = {}
-                workflow_parameters[GFF_FILE] = {}
-                workflow_parameters[ADD_JBROWSE] = {}
-                workflow_parameters[ADD_ORGANISM_TO_JBROWSE] = {}
-
-                run_workflow_for_current_organism.datamap = {}
-                run_workflow_for_current_organism.datamap[GENOME_FASTA_FILE] = {"src": "hda", "id": hda_ids["genome_hda_id"]}
-                run_workflow_for_current_organism.datamap[GFF_FILE] = {"src": "hda", "id": hda_ids["gff_hda_id"]}
-
-                # Run the jbrowse creation workflow
-                run_workflow_for_current_organism.run_workflow(workflow_path=workflow,
-                                                               workflow_parameters=workflow_parameters,
-                                                               datamap=run_workflow_for_current_organism.datamap,
-                                                               workflow_name="Jbrowse")
-
-            elif "Interpro" in str(workflow):
-
-                logging.info("Executing workflow 'Interproscan")
-
-                run_workflow_for_current_organism.connect_to_instance()
-                run_workflow_for_current_organism.set_get_history()
-                run_workflow_for_current_organism.install_changesets_revisions_from_workflow(workflow_path=workflow)
-                # run_workflow_for_current_organism.get_species_history_id()
-
-                # Get the attributes of the instance and project data files
-                run_workflow_for_current_organism.get_instance_attributes()
-                run_workflow.add_interproscan_analysis()
-                run_workflow_for_current_organism.get_interpro_analysis_id()
-
-                # Import datasets into history and retrieve their hda IDs
-                run_workflow_for_current_organism.import_datasets_into_history()
-                hda_ids = run_workflow_for_current_organism.get_datasets_hda_ids()
-
-                INTERPRO_FILE = "0"
-                LOAD_INTERPRO_IN_CHADO = "1"
-                SYNC_INTERPRO_ANALYSIS_INTO_TRIPAL = "2"
-                SYNC_FEATURES_INTO_TRIPAL = "3"
-                POPULATE_MAT_VIEWS = "4"
-                INDEX_TRIPAL_DATA = "5"
+
+                GENOME_FASTA_FILE_ORG = "0"
+                GFF_FILE_ORG = "1"
+                PROTEINS_FASTA_FILE_ORG = "2"
+                LOAD_FASTA_ORG = "3"
+                JBROWSE_ORG = "4"
+                LOAD_GFF_ORG = "5"
+                JBROWSE_CONTAINER = "6"
+                SYNC_FEATURES_ORG = "7"
+                POPULATE_MAT_VIEWS = "8"
+                INDEX_TRIPAL_DATA = "9"
+
+                # Input files have no parameters (they are set via assigning the hda IDs in the datamap parameter of the bioblend method)
+                workflow_parameters[GENOME_FASTA_FILE_ORG] = {}
+                workflow_parameters[GFF_FILE_ORG] = {}
+                workflow_parameters[PROTEINS_FASTA_FILE_ORG] = {}
+                workflow_parameters[LOAD_FASTA_ORG] = {"organism": org_org_id,
+                                                       "analysis_id": org_genome_analysis_id,
+                                                       "do_update": "true"}
+                workflow_parameters[JBROWSE_ORG] = {}
+                workflow_parameters[LOAD_GFF_ORG] = {"organism": org_org_id, "analysis_id": org_ogs_analysis_id}
+                workflow_parameters[JBROWSE_CONTAINER] = {}
+                workflow_parameters[SYNC_FEATURES_ORG] = {"organism_id": org_org_id}
+                # POPULATE + INDEX DATA
+                workflow_parameters[POPULATE_MAT_VIEWS] = {}
+                workflow_parameters[INDEX_TRIPAL_DATA] = {}
+
+                # Set datamap (mapping of input files in the workflow)
+                datamap = {}
+
+                datamap[GENOME_FASTA_FILE_ORG] = {"src": "hda", "id": org_genome_hda_id}
+                datamap[GFF_FILE_ORG] = {"src": "hda", "id": org_gff_hda_id}
+                datamap[PROTEINS_FASTA_FILE_ORG] = {"src": "hda", "id": org_proteins_hda_id}
+
+                with open(workflow_path, 'r') as ga_in_file:
+
+                    # Store the decoded json dictionary
+                    workflow_dict = json.load(ga_in_file)
+                    workflow_name = workflow_dict["name"]
+
+                    # For the Jbrowse tool, we unfortunately have to manually edit the parameters instead of setting them
+                    # as runtime values, using runtime parameters makes the tool throw an internal critical error ("replace not found" error)
+                    # Scratchgmod test: need "http" (or "https"), the hostname (+ port)
+                    if "jbrowse_menu_url" not in config.keys():
+                        jbrowse_menu_url_org = "https://{hostname}/sp/{genus_sp}/feature/{Genus}/{species}/mRNA/{id}".format(hostname=config["hostname"], genus_sp=org_genus_species, Genus=org_genus[0].upper() + org_genus[1:], species=org_species, id="{id}")
+                    else:
+                        jbrowse_menu_url_org = config["jbrowse_menu_url"] + "/sp/{genus_sp}/feature/{Genus}/{species}/mRNA/{id}".format(genus_sp=org_genus_species, Genus=org_genus[0].upper() + org_genus[1:], species=org_species, id="{id}")
+
+                    # show_tool_add_organism = instance.tools.show_tool(tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/2.3.4+galaxy0", io_details=True)
+                    # print(show_tool_add_organism)
+                    # show_jbrowse_tool = instance.tools.show_tool(tool_id="toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy0", io_details=True)
+                    # print(show_jbrowse_tool)
+                    # show_jbrowse_container_tool = instance.tools.show_tool(tool_id="toolshed.g2.bx.psu.edu/repos/gga/jbrowse_to_container/jbrowse_to_container/0.5.1", io_details=True)
+                    # print(show_jbrowse_container_tool)
+
+                    # Replace values in the workflow dictionary
+                    workflow_dict["steps"]["4"]["tool_state"] = workflow_dict["steps"]["4"]["tool_state"].replace("__MENU_URL_ORG__", jbrowse_menu_url_org)
+                    workflow_dict["steps"]["6"]["tool_state"] = workflow_dict["steps"]["6"]["tool_state"].replace("__DISPLAY_NAME_ORG__", org_full_name).replace("__UNIQUE_ID_ORG__", org_species_folder_name)
+
+                    # Import the workflow in galaxy as a dict
+                    instance.workflows.import_workflow_dict(workflow_dict=workflow_dict)
+
+                    # Get its attributes
+                    workflow_attributes = instance.workflows.get_workflows(name=workflow_name)
+                    # Then get its ID (required to invoke the workflow)
+                    workflow_id = workflow_attributes[0]["id"]  # Index 0 is the most recently imported workflow (the one we want)
+                    # Check if the workflow is found
+                    try:
+                        show_workflow = instance.workflows.show_workflow(workflow_id=workflow_id)
+                        logging.debug("Workflow ID: %s" % workflow_id)
+                    except bioblend.ConnectionError:
+                        logging.warning("Error finding workflow %s" % workflow_name)
+
+                    # Finally, invoke the workflow along with its datamap, parameters and the history in which to invoke it
+                    instance.workflows.invoke_workflow(workflow_id=workflow_id, history_id=history_id, params=workflow_parameters, inputs=datamap, allow_tool_state_corrections=True)
+
+                    logging.info("Successfully imported and invoked workflow {0}, check the galaxy instance ({1}) for the jobs state".format(workflow_name, instance_url))
+
+            if len(list(v.keys())) == 2:
+
+                logging.info("Input organism %s: 2 species detected in input dictionary" % k)
+
+                # Set workflow path (2 organisms)
+                workflow_path = os.path.join(os.path.abspath(script_dir), "workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v4.ga")
"workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v4.ga") + + # Instance object required variables + instance_url, email, password = None, None, None + + # Set the galaxy instance variables + for k2, v2 in v.items(): + instance_url = v2["instance_url"] + email = v2["email"] + password = v2["password"] + + instance = galaxy.GalaxyInstance(url=instance_url, email=email, password=password) + + # Check if the versions of tools specified in the workflow are installed in galaxy + install_changesets_revisions_from_workflow(workflow_path=workflow_path, instance=instance) + + # Get key names from the current organism (item 1 = organism 1, item 2 = organism 2) + organisms_key_names = list(v.keys()) + org1_dict = v[organisms_key_names[0]] + org2_dict = v[organisms_key_names[1]] + + history_id = org1_dict["history_id"] + + # Organism 1 attributes + org1_genus = org1_dict["genus"] + org1_species = org1_dict["species"] + org1_genus_species = org1_dict["genus_species"] + org1_species_folder_name = org1_dict["species_folder_name"] + org1_full_name = org1_dict["full_name"] + org1_strain = org1_dict["sex"] + org1_sex = org1_dict["strain"] + org1_org_id = org1_dict["org_id"] + org1_genome_analysis_id = org1_dict["genome_analysis_id"] + org1_ogs_analysis_id = org1_dict["ogs_analysis_id"] + org1_genome_hda_id = org1_dict["hda_ids"]["genome_hda_id"] + org1_transcripts_hda_id = org1_dict["hda_ids"]["transcripts_hda_id"] + org1_proteins_hda_id = org1_dict["hda_ids"]["proteins_hda_id"] + org1_gff_hda_id = org1_dict["hda_ids"]["gff_hda_id"] + + # Store these values into a dict for parameters logging/validation + org1_parameters_dict = { + "org1_genus": org1_genus, + "org1_species": org1_species, + "org1_genus_species": org1_genus_species, + "org1_species_folder_name": org1_species_folder_name, + "org1_full_name": org1_full_name, + "org1_strain": org1_strain, + "org1_sex": org1_sex, + "org1_org_id": org1_org_id, + "org1_genome_analysis_id": org1_genome_analysis_id, + "org1_ogs_analysis_id": org1_ogs_analysis_id, + "org1_genome_hda_id": org1_genome_hda_id, + "org1_transcripts_hda_id": org1_transcripts_hda_id, + "org1_proteins_hda_id": org1_proteins_hda_id, + "org1_gff_hda_id": org1_gff_hda_id, + } + + # Look for empty parameters values, throw a critical error if a parameter value is invalid + for param_name, param_value in org1_parameters_dict.items(): + if param_value is None or param_value == "": + logging.critical("Empty parameter value found for organism {0} (parameter: {1}, parameter value: {2})".format(org1_full_name, param_name, param_value)) + sys.exit() + + # Organism 2 attributes + org2_genus = org2_dict["genus"] + org2_species = org2_dict["species"] + org2_genus_species = org2_dict["genus_species"] + org2_species_folder_name = org2_dict["species_folder_name"] + org2_full_name = org2_dict["full_name"] + org2_strain = org2_dict["sex"] + org2_sex = org2_dict["strain"] + org2_org_id = org2_dict["org_id"] + org2_genome_analysis_id = org2_dict["genome_analysis_id"] + org2_ogs_analysis_id = org2_dict["ogs_analysis_id"] + org2_genome_hda_id = org2_dict["hda_ids"]["genome_hda_id"] + org2_transcripts_hda_id = org2_dict["hda_ids"]["transcripts_hda_id"] + org2_proteins_hda_id = org2_dict["hda_ids"]["proteins_hda_id"] + org2_gff_hda_id = org2_dict["hda_ids"]["gff_hda_id"] + + # Store these values into a dict for parameters logging/validation + org2_parameters_dict = { + "org2_genus": org2_genus, + "org2_species": org2_species, + "org2_genus_species": org2_genus_species, + 
"org2_species_folder_name": org2_species_folder_name, + "org2_full_name": org2_full_name, + "org2_strain": org2_strain, + "org2_sex": org2_sex, + "org2_org_id": org2_org_id, + "org2_genome_analysis_id": org2_genome_analysis_id, + "org2_ogs_analysis_id": org2_ogs_analysis_id, + "org2_genome_hda_id": org2_genome_hda_id, + "org2_transcripts_hda_id": org2_transcripts_hda_id, + "org2_proteins_hda_id": org2_proteins_hda_id, + "org2_gff_hda_id": org2_gff_hda_id, + } + + # Look for empty parameters values, throw a critical error if a parameter value is invalid + for param_name, param_value in org2_parameters_dict.items(): + if param_value is None or param_value == "": + logging.critical("Empty parameter value found for organism {0} (parameter: {1}, parameter value: {2})".format(org2_full_name, param_name, param_value)) + sys.exit() + + # Source files association (ordered by their IDs in the workflow) + # WARNING: Be very careful about how the workflow is "organized" (i.e the order of the steps/datasets, check the .ga if there is any error) + GFF_FILE_ORG1 = "0" + GENOME_FASTA_FILE_ORG1 = "1" + PROTEINS_FASTA_FILE_ORG1 = "2" + + GENOME_FASTA_FILE_ORG2 = "3" + GFF_FILE_ORG2 = "4" + PROTEINS_FASTA_FILE_ORG2 = "5" + + LOAD_FASTA_ORG1 = "6" + JBROWSE_ORG1 = "7" + JRBOWSE_ORG2 = "8" + + LOAD_GFF_ORG1 = "9" + JBROWSE_CONTAINER = "10" + SYNC_FEATURES_ORG1 = "11" + + LOAD_FASTA_ORG2 = "12" + LOAD_GFF_ORG2 = "13" + + SYNC_FEATURES_ORG2 = "14" + POPULATE_MAT_VIEWS = "15" + INDEX_TRIPAL_DATA = "16" + + # Set the workflow parameters (individual tools runtime parameters in the workflow) workflow_parameters = {} - workflow_parameters[INTERPRO_FILE] = {} - workflow_parameters[LOAD_INTERPRO_IN_CHADO] = {"organism": run_workflow_for_current_organism.org_id, - "analysis_id": run_workflow_for_current_organism.interpro_analysis_id} - workflow_parameters[SYNC_INTERPRO_ANALYSIS_INTO_TRIPAL] = {"analysis_id": run_workflow_for_current_organism.interpro_analysis_id} - - - run_workflow_for_current_organism.datamap = {} - run_workflow_for_current_organism.datamap[INTERPRO_FILE] = {"src": "hda", "id": run_workflow_for_current_organism.hda_ids["interproscan_hda_id"]} - - # Run Interproscan workflow - run_workflow_for_current_organism.run_workflow(workflow_path=workflow, - workflow_parameters=workflow_parameters, - datamap=run_workflow_for_current_organism.datamap, - workflow_name="Interproscan") - - elif "Blast" in str(workflow): - - logging.info("Executing workflow 'Blast_Diamond") - - run_workflow_for_current_organism.connect_to_instance() - run_workflow_for_current_organism.set_get_history() - run_workflow_for_current_organism.install_changesets_revisions_from_workflow(workflow_path=workflow) - # run_workflow_for_current_organism.get_species_history_id() - - # Get the attributes of the instance and project data files - run_workflow_for_current_organism.get_instance_attributes() - run_workflow_for_current_organism.add_blastp_diamond_analysis() - run_workflow_for_current_organism.get_blastp_diamond_analysis_id() - - # Import datasets into history and retrieve their hda IDs - run_workflow_for_current_organism.import_datasets_into_history() - hda_ids = run_workflow_for_current_organism.get_datasets_hda_ids() - - BLAST_FILE = "0" - LOAD_BLAST_IN_CHADO = "1" - SYNC_BLAST_ANALYSIS_INTO_TRIPAL = "2" - SYNC_FEATURES_INTO_TRIPAL = "3" - POPULATE_MAT_VIEWS = "4" - INDEX_TRIPAL_DATA = "5" - - workflow_parameters = {} - workflow_parameters[INTERPRO_FILE] = {} - workflow_parameters[LOAD_BLAST_IN_CHADO] = {"organism": 
+                # Input files have no parameters (they are set via assigning the hda IDs in the datamap parameter of the bioblend method)
+                workflow_parameters[GENOME_FASTA_FILE_ORG1] = {}
+                workflow_parameters[GFF_FILE_ORG1] = {}
+                workflow_parameters[PROTEINS_FASTA_FILE_ORG1] = {}
+                workflow_parameters[GENOME_FASTA_FILE_ORG2] = {}
+                workflow_parameters[GFF_FILE_ORG2] = {}
+                workflow_parameters[PROTEINS_FASTA_FILE_ORG2] = {}
+
+                # Organism 1
+                workflow_parameters[LOAD_FASTA_ORG1] = {"organism": org1_org_id,
+                                                        "analysis_id": org1_genome_analysis_id,
+                                                        "do_update": "true"}
+                # workflow_parameters[JBROWSE_ORG1] = {"jbrowse_menu_url": jbrowse_menu_url_org1}
+                workflow_parameters[JBROWSE_ORG1] = {}
+                workflow_parameters[LOAD_GFF_ORG1] = {"organism": org1_org_id, "analysis_id": org1_ogs_analysis_id}
+                workflow_parameters[SYNC_FEATURES_ORG1] = {"organism_id": org1_org_id}
+                # workflow_parameters[JBROWSE_CONTAINER] = {"organisms": [{"name": org1_full_name, "unique_id": org1_species_folder_name, }, {"name": org2_full_name, "unique_id": org2_species_folder_name}]}
+                workflow_parameters[JBROWSE_CONTAINER] = {}
+
+                # Organism 2
+                workflow_parameters[LOAD_FASTA_ORG2] = {"organism": org2_org_id,
+                                                        "analysis_id": org2_genome_analysis_id,
+                                                        "do_update": "true"}
+                workflow_parameters[LOAD_GFF_ORG2] = {"organism": org2_org_id, "analysis_id": org2_ogs_analysis_id}
+                # workflow_parameters[JBROWSE_ORG2] = {"jbrowse_menu_url": jbrowse_menu_url_org2}
+                workflow_parameters[JBROWSE_ORG2] = {}
+                workflow_parameters[SYNC_FEATURES_ORG2] = {"organism_id": org2_org_id}
+
+                # POPULATE + INDEX DATA
+                workflow_parameters[POPULATE_MAT_VIEWS] = {}
+                workflow_parameters[INDEX_TRIPAL_DATA] = {}
+
+                # Set datamap (mapping of input files in the workflow)
+                datamap = {}
+
+                # Organism 1
+                datamap[GENOME_FASTA_FILE_ORG1] = {"src": "hda", "id": org1_genome_hda_id}
+                datamap[GFF_FILE_ORG1] = {"src": "hda", "id": org1_gff_hda_id}
+                datamap[PROTEINS_FASTA_FILE_ORG1] = {"src": "hda", "id": org1_proteins_hda_id}
+
+                # Organism 2
+                datamap[GENOME_FASTA_FILE_ORG2] = {"src": "hda", "id": org2_genome_hda_id}
+                datamap[GFF_FILE_ORG2] = {"src": "hda", "id": org2_gff_hda_id}
+                datamap[PROTEINS_FASTA_FILE_ORG2] = {"src": "hda", "id": org2_proteins_hda_id}
+
+                with open(workflow_path, 'r') as ga_in_file:
+
+                    # Store the decoded json dictionary
+                    workflow_dict = json.load(ga_in_file)
+                    workflow_name = workflow_dict["name"]
+
+                    # For the Jbrowse tool, we unfortunately have to manually edit the parameters instead of setting them
+                    # as runtime values, using runtime parameters makes the tool throw an internal critical error ("replace not found" error)
+                    # Scratchgmod test: need "http" (or "https"), the hostname (+ port)
+                    if "jbrowse_menu_url" not in config.keys():
+                        jbrowse_menu_url_org1 = "https://{hostname}/sp/{genus_sp}/feature/{Genus}/{species}/mRNA/{id}".format(hostname=config["hostname"], genus_sp=org1_genus_species, Genus=org1_genus[0].upper() + org1_genus[1:], species=org1_species, id="{id}")
+                        jbrowse_menu_url_org2 = "https://{hostname}/sp/{genus_sp}/feature/{Genus}/{species}/mRNA/{id}".format(hostname=config["hostname"], genus_sp=org2_genus_species, Genus=org2_genus[0].upper() + org2_genus[1:], species=org2_species, id="{id}")
+                    else:
+                        jbrowse_menu_url_org1 = config["jbrowse_menu_url"] + "/sp/{genus_sp}/feature/{Genus}/{species}/mRNA/{id}".format(genus_sp=org1_genus_species, Genus=org1_genus[0].upper() + org1_genus[1:], species=org1_species, id="{id}")
+                        jbrowse_menu_url_org2 = config["jbrowse_menu_url"] + "/sp/{genus_sp}/feature/{Genus}/{species}/mRNA/{id}".format(genus_sp=org2_genus_species, Genus=org2_genus[0].upper() + org2_genus[1:], species=org2_species, id="{id}")
+
+                    # show_tool_add_organism = instance.tools.show_tool(tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_add_organism/organism_add_organism/2.3.4+galaxy0", io_details=True)
+                    # print(show_tool_add_organism)
+                    # show_jbrowse_tool = instance.tools.show_tool(tool_id="toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy0", io_details=True)
+                    # print(show_jbrowse_tool)
+                    # show_jbrowse_container_tool = instance.tools.show_tool(tool_id="toolshed.g2.bx.psu.edu/repos/gga/jbrowse_to_container/jbrowse_to_container/0.5.1", io_details=True)
+                    # print(show_jbrowse_container_tool)
+
+                    # Replace values in the workflow dictionary
+                    workflow_dict["steps"]["7"]["tool_state"] = workflow_dict["steps"]["7"]["tool_state"].replace("__MENU_URL_ORG1__", jbrowse_menu_url_org1)
+                    workflow_dict["steps"]["8"]["tool_state"] = workflow_dict["steps"]["8"]["tool_state"].replace("__MENU_URL_ORG2__", jbrowse_menu_url_org2)
+                    # The UNIQUE_ID is specific to a combination genus_species_strain_sex so every combination should have its unique workflow
+                    # in galaxy --> define a naming method for these workflows
+                    workflow_dict["steps"]["10"]["tool_state"] = workflow_dict["steps"]["10"]["tool_state"].replace("__DISPLAY_NAME_ORG1__", org1_full_name).replace("__UNIQUE_ID_ORG1__", org1_species_folder_name)
+                    workflow_dict["steps"]["10"]["tool_state"] = workflow_dict["steps"]["10"]["tool_state"].replace("__DISPLAY_NAME_ORG2__", org2_full_name).replace("__UNIQUE_ID_ORG2__", org2_species_folder_name)
+
+                    # Import the workflow in galaxy as a dict
+                    instance.workflows.import_workflow_dict(workflow_dict=workflow_dict)
+
+                    # Get its attributes
+                    workflow_attributes = instance.workflows.get_workflows(name=workflow_name)
+                    # Then get its ID (required to invoke the workflow)
+                    workflow_id = workflow_attributes[0]["id"]  # Index 0 is the most recently imported workflow (the one we want)
+                    # Check if the workflow is found
+                    try:
+                        show_workflow = instance.workflows.show_workflow(workflow_id=workflow_id)
+                        logging.debug("Workflow ID: %s" % workflow_id)
+                    except bioblend.ConnectionError:
+                        logging.warning("Error finding workflow %s" % workflow_name)
+
+                    # Finally, invoke the workflow along with its datamap, parameters and the history in which to invoke it
+                    instance.workflows.invoke_workflow(workflow_id=workflow_id, history_id=history_id, params=workflow_parameters, inputs=datamap, allow_tool_state_corrections=True)
+
+                    logging.info("Successfully imported and invoked workflow {0}, check the galaxy instance ({1}) for the jobs state".format(workflow_name, instance_url))
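+
+                    # Optional follow-up (illustrative sketch, assuming bioblend's invocation
+                    # API): invoke_workflow() returns a dict describing the invocation, so the
+                    # run state could be polled from the script instead of the web UI, e.g.:
+                    # invocation = instance.workflows.invoke_workflow(workflow_id=workflow_id, ...)
+                    # state = instance.workflows.show_invocation(workflow_id, invocation["id"])["state"]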
+
+    if workflow_type == "blast":
+        for sp_dict in sp_dict_list:
+
+            # Add and retrieve all analyses/organisms for the current input species and add their IDs to the input dictionary
+            current_sp_workflow_dict = create_sp_workflow_dict(sp_dict, main_dir=args.main_directory, config=config, workflow_type="blast")
+
+            current_sp_key = list(current_sp_workflow_dict.keys())[0]
+            current_sp_value = list(current_sp_workflow_dict.values())[0]
+            current_sp_strain_sex_key = list(current_sp_value.keys())[0]
+            current_sp_strain_sex_value = list(current_sp_value.values())[0]
+
+            # Add the species dictionary to the complete dictionary
+            # This dictionary contains every organism present in the input file
+            # Its structure is the following:
+            # {genus species: {strain1_sex1: {variables_key: variables_values}, strain1_sex2: {variables_key: variables_values}}}
+            if current_sp_key not in all_sp_workflow_dict.keys():
+                all_sp_workflow_dict[current_sp_key] = current_sp_value
+            else:
+                all_sp_workflow_dict[current_sp_key][current_sp_strain_sex_key] = current_sp_strain_sex_value
+
+        for k, v in all_sp_workflow_dict.items():
+            if len(list(v.keys())) == 1:
+                logging.info("Input organism %s: 1 species detected in input dictionary" % k)
+
+                # Set workflow path (1 organism)
+                workflow_path = os.path.join(os.path.abspath(script_dir), "workflows_phaeoexplorer/Galaxy-Workflow-load_blast_results_1org_v1.ga")
+
+                # Instance object required variables
+                instance_url, email, password = None, None, None
+
+                # Set the galaxy instance variables
+                for k2, v2 in v.items():
+                    instance_url = v2["instance_url"]
+                    email = v2["email"]
+                    password = v2["password"]
+
+                instance = galaxy.GalaxyInstance(url=instance_url, email=email, password=password)
+
+                # Check if the versions of tools specified in the workflow are installed in galaxy
+                install_changesets_revisions_from_workflow(workflow_path=workflow_path, instance=instance)
+
+                organisms_key_names = list(v.keys())
+                org_dict = v[organisms_key_names[0]]
+
+                history_id = org_dict["history_id"]
+
+                # Organism attributes
+                org_genus = org_dict["genus"]
+                org_species = org_dict["species"]
+                org_genus_species = org_dict["genus_species"]
+                org_species_folder_name = org_dict["species_folder_name"]
+                org_full_name = org_dict["full_name"]
+                org_strain = org_dict["strain"]
+                org_sex = org_dict["sex"]
+                org_org_id = org_dict["org_id"]
+                org_blastp_analysis_id = org_dict["blastp_analysis_id"]
+                org_blastp_hda_id = org_dict["hda_ids"]["blastp_hda_id"]
+
+                # Store these values into a dict for parameters logging/validation
+                org_parameters_dict = {
+                    "org_genus": org_genus,
+                    "org_species": org_species,
+                    "org_genus_species": org_genus_species,
+                    "org_species_folder_name": org_species_folder_name,
+                    "org_full_name": org_full_name,
+                    "org_strain": org_strain,
+                    "org_sex": org_sex,
+                    "org_org_id": org_org_id,
+                    "org_blastp_analysis_id": org_blastp_analysis_id,
+                    "org_blastp_hda_id": org_blastp_hda_id,
+                }
+
+                # Look for empty parameters values, throw a critical error if a parameter value is invalid
+                for param_name, param_value in org_parameters_dict.items():
+                    if param_value is None or param_value == "":
+                        logging.critical("Empty parameter value found for organism {0} (parameter: {1}, parameter value: {2})".format(org_full_name, param_name, param_value))
+                        sys.exit()
+
+                BLASTP_FILE = "0"
+                LOAD_BLASTP_FILE = "1"
+                SYNC_BLASTP_ANALYSIS = "2"
+                POPULATE_MAT_VIEWS = "3"
+                INDEX_TRIPAL_DATA = "4"
+
+                # Set the workflow parameters (individual tools runtime parameters in the workflow)
+                workflow_parameters = {}
+                workflow_parameters[BLASTP_FILE] = {}
+                workflow_parameters[LOAD_BLASTP_FILE] = {"analysis_id": org_blastp_analysis_id, "organism_id": org_org_id}
+                workflow_parameters[SYNC_BLASTP_ANALYSIS] = {"analysis_id": org_blastp_analysis_id}
+                workflow_parameters[POPULATE_MAT_VIEWS] = {}
+                workflow_parameters[INDEX_TRIPAL_DATA] = {}
+
+                datamap = {}
+                datamap[BLASTP_FILE] = {"src": "hda", "id": org_blastp_hda_id}
+
+                with open(workflow_path, 'r') as ga_in_file:
+                    # Store the decoded json dictionary
+                    workflow_dict = json.load(ga_in_file)
+                    workflow_name = workflow_dict["name"]
+
+                    # Import the workflow in galaxy as a dict
+                    instance.workflows.import_workflow_dict(workflow_dict=workflow_dict)
+                    # Get its attributes
+                    workflow_attributes = instance.workflows.get_workflows(name=workflow_name)
+                    # Then get its ID (required to invoke the workflow)
+                    workflow_id = workflow_attributes[0]["id"]  # Index 0 is the most recently imported workflow (the one we want)
+                    # Check if the workflow is found
+                    try:
+                        show_workflow = instance.workflows.show_workflow(workflow_id=workflow_id)
+                        logging.debug("Workflow ID: %s" % workflow_id)
+                    except bioblend.ConnectionError:
+                        logging.warning("Error finding workflow %s" % workflow_name)
+
+                    # Finally, invoke the workflow along with its datamap, parameters and the history in which to invoke it
+                    instance.workflows.invoke_workflow(workflow_id=workflow_id, history_id=history_id, params=workflow_parameters, inputs=datamap, allow_tool_state_corrections=True)
+
+                    logging.info("Successfully imported and invoked workflow {0}, check the galaxy instance ({1}) for the jobs state".format(workflow_name, instance_url))
+
+            if len(list(v.keys())) == 2:
+
+                logging.info("Input organism %s: 2 species detected in input dictionary" % k)
+
+                # Set workflow path (2 organisms)
+                workflow_path = os.path.join(os.path.abspath(script_dir), "workflows_phaeoexplorer/Galaxy-Workflow-load_blast_results_2org_v1.ga")
+
+                # Instance object required variables
+                instance_url, email, password = None, None, None
+
+                # Set the galaxy instance variables
+                for k2, v2 in v.items():
+                    instance_url = v2["instance_url"]
+                    email = v2["email"]
+                    password = v2["password"]
+
+                instance = galaxy.GalaxyInstance(url=instance_url, email=email, password=password)
+
+                # Check if the versions of tools specified in the workflow are installed in galaxy
+                install_changesets_revisions_from_workflow(workflow_path=workflow_path, instance=instance)
+
+                organisms_key_names = list(v.keys())
+                org1_dict = v[organisms_key_names[0]]
+                org2_dict = v[organisms_key_names[1]]
+
+                history_id = org1_dict["history_id"]
+
+                # Organism 1 attributes
+                org1_genus = org1_dict["genus"]
+                org1_species = org1_dict["species"]
+                org1_genus_species = org1_dict["genus_species"]
+                org1_species_folder_name = org1_dict["species_folder_name"]
+                org1_full_name = org1_dict["full_name"]
+                org1_strain = org1_dict["strain"]
+                org1_sex = org1_dict["sex"]
+                org1_org_id = org1_dict["org_id"]
+                org1_blastp_analysis_id = org1_dict["blastp_analysis_id"]
+                org1_blastp_hda_id = org1_dict["hda_ids"]["blastp_hda_id"]
+
+                # Store these values into a dict for parameters logging/validation
+                org1_parameters_dict = {
+                    "org1_genus": org1_genus,
+                    "org1_species": org1_species,
"org1_genus_species": org1_genus_species, + "org1_species_folder_name": org1_species_folder_name, + "org1_full_name": org1_full_name, + "org1_strain": org1_strain, + "org1_sex": org1_sex, + "org1_org_id": org1_org_id, + "org1_blast_analysis_id": org1_blastp_analysis_id, + "org1_blastp_hda_id": org1_blastp_hda_id, + } + + + # Look for empty parameters values, throw a critical error if a parameter value is invalid + for param_name, param_value in org1_parameters_dict.items(): + if param_value is None or param_value == "": + logging.critical("Empty parameter value found for organism {0} (parameter: {1}, parameter value: {2})".format(org1_full_name, param_name, param_value)) + sys.exit() + + # Organism 2 attributes + org2_genus = org2_dict["genus"] + org2_species = org2_dict["species"] + org2_genus_species = org2_dict["genus_species"] + org2_species_folder_name = org2_dict["species_folder_name"] + org2_full_name = org2_dict["full_name"] + org2_strain = org2_dict["sex"] + org2_sex = org2_dict["strain"] + org2_org_id = org2_dict["org_id"] + org2_blastp_analysis_id = org2_dict["blastp_analysis_id"] + org2_blastp_hda_id = org2_dict["hda_ids"]["blastp_hda_id"] + + # Store these values into a dict for parameters logging/validation + org2_parameters_dict = { + "org2_genus": org2_genus, + "org2_species": org2_species, + "org2_genus_species": org2_genus_species, + "org2_species_folder_name": orgé_species_folder_name, + "org2_full_name": org2_full_name, + "org2_strain": org2_strain, + "org2_sex": org2_sex, + "org2_org_id": org2_org_id, + "org2_blast_analysis_id": org2_blastp_analysis_id, + "org2_blastp_hda_id": org2_blastp_hda_id, + } + + + # Look for empty parameters values, throw a critical error if a parameter value is invalid + for param_name, param_value in org2_parameters_dict.items(): + if param_value is None or param_value == "": + logging.critical("Empty parameter value found for organism {0} (parameter: {1}, parameter value: {2})".format(org2_full_name, param_name, param_value)) + sys.exit() + + # Source files association (ordered by their IDs in the workflow) + # WARNING: Be very careful about how the workflow is "organized" (i.e the order of the steps/datasets, check the .ga if there is any error) + BLASTP_FILE_ORG1 = "0" + BLASTP_FILE_ORG2 = "1" + LOAD_BLASTP_FILE_ORG1 = "2" + LOAD_BLASTP_FILE_ORG1 = "3" + SYNC_BLASTP_ANALYSIS_ORG1 = "4" + SYNC_BLASTP_ANALYSIS_ORG2 = "5" + POPULATE_MAT_VIEWS = "6" + INDEX_TRIPAL_DATA = "7" + + # Set the workflow parameters (individual tools runtime parameters in the workflow) + workflow_parameters = {} + + # Input files have no parameters (they are set via assigning the hda IDs in the datamap parameter of the bioblend method) + workflow_parameters[BLASTP_FILE_ORG1] = {} + workflow_parameters[BLASTP_FILE_ORG2] = {} + + # Organism 1 + workflow_parameters[LOAD_BLASTP_FILE_ORG1] = {"organism_id": org1_org_id, + "analysis_id": org1_blastp_analysis_id} + workflow_parameters[SYNC_BLASTP_ANALYSIS_ORG1] = {"analysis_id": org1_blastp_analysis_id} + + # Organism 2 + workflow_parameters[LOAD_BLASTP_FILE_ORG2] = {"organism_id": org2_org_id, + "analysis_id": org2_blastp_analysis_id} + workflow_parameters[SYNC_BLASTP_ANALYSIS_ORG2] = {"analysis_id": org2_blastp_analysis_id} + + workflow_parameters[POPULATE_MAT_VIEWS] = {} + workflow_parameters[INDEX_TRIPAL_DATA] = {} + + # Set datamap (mapping of input files in the workflow) + datamap = {} + + # Organism 1 + datamap[BLASTP_FILE_ORG1] = {"src": "hda", "id": org1_blastp_hda_id} + + # Organism 2 + datamap[BLASTP_FILE_ORG2] = 
{"src": "hda", "id": org2_blastp_hda_id} + + with open(workflow_path, 'r') as ga_in_file: + # Store the decoded json dictionary + workflow_dict = json.load(ga_in_file) + workflow_name = workflow_dict["name"] + + # Import the workflow in galaxy as a dict + instance.workflows.import_workflow_dict(workflow_dict=workflow_dict) + # Get its attributes + workflow_attributes = instance.workflows.get_workflows(name=workflow_name) + # Then get its ID (required to invoke the workflow) + workflow_id = workflow_attributes[0]["id"] # Index 0 is the most recently imported workflow (the one we want) + show_workflow = instance.workflows.show_workflow(workflow_id=workflow_id) + # Check if the workflow is found + try: + logging.debug("Workflow ID: %s" % workflow_id) + except bioblend.ConnectionError: + logging.warning("Error finding workflow %s" % workflow_name) + + # Finally, invoke the workflow alogn with its datamap, parameters and the history in which to invoke it + instance.workflows.invoke_workflow(workflow_id=workflow_id, history_id=history_id, params=workflow_parameters, inputs=datamap, allow_tool_state_corrections=True) + + logging.info("Successfully imported and invoked workflow {0}, check the galaxy instance ({1}) for the jobs state".format(workflow_name, instance_url)) - else: - logging.critical("The galaxy container for %s is not ready yet!" % run_workflow_for_current_organism.full_name) - sys.exit() diff --git a/serexec b/serexec index 7a38c06c22d13eef5f39a0d5844875639af8d902..d73d97ef33ee0d6492d671b4ded2b7640bd065b8 100755 --- a/serexec +++ b/serexec @@ -4,11 +4,8 @@ set -e SERVICE_NAME=$1; shift TASK_ID=$(docker service ps --filter 'desired-state=running' $SERVICE_NAME -q) -#we have only one node -#NODE_ID=$(docker inspect --format '{{ .NodeID }}' $TASK_ID) +#NODE_ID=$(docker inspect --format '{{ .NodeID }}' $TASK_ID) # if multiple nodes CONTAINER_ID=$(docker inspect --format '{{ .Status.ContainerStatus.ContainerID }}' $TASK_ID) -#we have only one node -#NODE_HOST=$(docker node inspect --format '{{ .Description.Hostname }}' $NODE_ID) -#we have only one node -#export DOCKER_HOST="ssh://$USER@$NODE_HOST" +#NODE_HOST=$(docker node inspect --format '{{ .Description.Hostname }}' $NODE_ID) # if multiple nodes +#export DOCKER_HOST="ssh://$USER@$NODE_HOST" # if multiple nodes docker exec -it $CONTAINER_ID "$@" diff --git a/speciesData.py b/speciesData.py index a2fcbdbfafabb742315fa73210b1c4f5fe102b7f..4d4b58aeb3c1f107ebc779f3a43e8bd7b1671042 100755 --- a/speciesData.py +++ b/speciesData.py @@ -4,10 +4,11 @@ import os import sys import utilities +import logging +import constants from _datetime import datetime - class SpeciesData: """ This class contains attributes and functions to interact with the galaxy container of the GGA environment @@ -15,61 +16,107 @@ class SpeciesData: """ + def get_species_dir(self): + + species_dir = None + if os.path.isdir(self.main_dir) and not self.genus_species is None: + species_dir = os.path.join(self.main_dir, self.genus_species) + else: + logging.error("Cannot set species dir with '{0}/{1}'".format(self.main_dir,self.genus_species)) + return species_dir + + def goto_species_dir(self): + """ + Go to the species directory (starting from the main dir) + + :return: + """ + + species_dir = self.get_species_dir() + try: + os.chdir(species_dir) + except OSError: + logging.critical("Cannot access %s" % species_dir) + sys.exit(0) + return 1 + + def clean_string(self, string): + if not string is None and string != "": + clean_string = string.replace(" ", "_").replace("-", 
"_").replace("(", "").replace(")", "").replace("'", "").strip() + return clean_string + else: + return string + def __init__(self, parameters_dictionary): - # self.config_dictionary = None self.parameters_dictionary = parameters_dictionary - self.species = parameters_dictionary["description"]["species"].replace("(", "_").replace(")", "_").replace("-", "_") - self.genus = parameters_dictionary["description"]["genus"].replace("(", "_").replace(")", "_").replace("-", "_") - self.strain = parameters_dictionary["description"]["strain"].replace("(", "_").replace(")", "_").replace("-", "_") - self.sex = parameters_dictionary["description"]["sex"].replace("(", "_").replace(")", "_").replace("-", "_") - self.common = parameters_dictionary["description"]["common_name"].replace("(", "_").replace(")", "_").replace("-", "_") + self.name = parameters_dictionary[constants.ORG_PARAM_NAME] + parameters_dictionary_description=parameters_dictionary[constants.ORG_PARAM_DESC] + parameters_dictionary_data = parameters_dictionary[constants.ORG_PARAM_DATA] + parameters_dictionary_services = parameters_dictionary[constants.ORG_PARAM_SERVICES] + + self.species = self.clean_string(parameters_dictionary_description[constants.ORG_PARAM_DESC_SPECIES]) + self.genus = self.clean_string(parameters_dictionary_description[constants.ORG_PARAM_DESC_GENUS]) + self.strain = self.clean_string(parameters_dictionary_description[constants.ORG_PARAM_DESC_STRAIN]) + self.sex = self.clean_string(parameters_dictionary_description[constants.ORG_PARAM_DESC_SEX]) + self.common = self.clean_string(parameters_dictionary_description[constants.ORG_PARAM_DESC_COMMON_NAME]) self.date = datetime.today().strftime("%Y-%m-%d") - self.origin = parameters_dictionary["description"]["origin"] - self.performed = parameters_dictionary["data"]["performed_by"] + self.origin = parameters_dictionary_description[constants.ORG_PARAM_DESC_ORIGIN] + self.performed = parameters_dictionary_data[constants.ORG_PARAM_DATA_PERFORMED_BY] - if parameters_dictionary["data"]["genome_version"] == "": + if parameters_dictionary_data[constants.ORG_PARAM_DATA_GENOME_VERSION] == "": self.genome_version = "1.0" else: - self.genome_version = parameters_dictionary["data"]["genome_version"] - if parameters_dictionary["data"]["ogs_version"] == "": + self.genome_version = str(parameters_dictionary_data[constants.ORG_PARAM_DATA_GENOME_VERSION]) + if parameters_dictionary_data[constants.ORG_PARAM_DATA_OGS_VERSION] == "": self.ogs_version = "1.0" else: - self.ogs_version = parameters_dictionary["data"]["ogs_version"] + self.ogs_version = str(parameters_dictionary_data[constants.ORG_PARAM_DATA_OGS_VERSION]) # TODO: catch blocks if key is absent in input - self.genome_path = parameters_dictionary["data"]["genome_path"] - self.transcripts_path = parameters_dictionary["data"]["transcripts_path"] - self.proteins_path = parameters_dictionary["data"]["proteins_path"] - self.gff_path = parameters_dictionary["data"]["gff_path"] - self.interpro_path = parameters_dictionary["data"]["interpro_path"] - self.blastp_path = parameters_dictionary["data"]["blastp_path"] - self.blastx_path = parameters_dictionary["data"]["blastx_path"] - self.orthofinder_path = parameters_dictionary["data"]["orthofinder_path"] + self.genome_path = parameters_dictionary_data[constants.ORG_PARAM_DATA_GENOME_PATH] + self.transcripts_path = parameters_dictionary_data[constants.ORG_PARAM_DATA_TRANSCRIPTS_PATH] + self.proteins_path = parameters_dictionary_data[constants.ORG_PARAM_DATA_PROTEINS_PATH] + self.gff_path = 
parameters_dictionary_data[constants.ORG_PARAM_DATA_GFF_PATH] + self.interpro_path = parameters_dictionary_data[constants.ORG_PARAM_DATA_INTERPRO_PATH] + self.blastp_path = parameters_dictionary_data[constants.ORG_PARAM_DATA_BLASTP_PATH] + self.blastx_path = parameters_dictionary_data[constants.ORG_PARAM_DATA_BLASTX_PATH] + self.orthofinder_path = parameters_dictionary_data[constants.ORG_PARAM_DATA_ORTHOFINDER_PATH] + + if(constants.ORG_PARAM_SERVICES_BLAST in parameters_dictionary_services.keys()): + self.blast = parameters_dictionary_services[constants.ORG_PARAM_SERVICES_BLAST] + else: + self.blast = "0" self.genus_lowercase = self.genus[0].lower() + self.genus[1:] self.genus_uppercase = self.genus[0].upper() + self.genus[1:] self.chado_species_name = "{0} {1}".format(self.species, self.sex) self.full_name = ' '.join(utilities.filter_empty_not_empty_items([self.genus_uppercase, self.species, self.strain, self.sex])["not_empty"]) - self.full_name = self.full_name.replace("__", "_").replace("_ ", "_").replace(" _", "_") - if self.full_name.endswith("_") or self.full_name.endswith(" "): - self.full_name = self.full_name[0:-2] self.full_name_lowercase = self.full_name.lower() self.abbreviation = "_".join(utilities.filter_empty_not_empty_items([self.genus_lowercase[0], self.species, self.strain, self.sex])["not_empty"]) - self.genus_species = self.genus_lowercase + "_" + self.species + self.genus_species = "{0}_{1}".format(self.genus.lower(), self.species.lower()) + self.dataset_prefix = None + if self.sex is not None or self.sex != "": + self.dataset_prefix = self.genus[0].lower() + "_" + self.species.lower() + "_" + self.sex[0].lower() + else: + self.dataset_prefix = self.genus[0].lower() + "_" + self.species.lower() + + # Bioblend/Chado IDs for an organism analyses/organisms/datasets/history/library + self.org_id = None + self.genome_analysis_id = None + self.ogs_analysis_id = None self.instance_url = None self.instance = None self.history_id = None self.library = None self.library_id = None + self.script_dir = os.path.dirname(os.path.realpath(sys.argv[0])) self.main_dir = None self.species_dir = None - self.org_id = None - self.genome_analysis_id = None - self.ogs_analysis_id = None + self.tool_panel = None self.datasets = dict() self.datasets_name = dict() @@ -79,12 +126,6 @@ class SpeciesData: self.api_key = None # API key used to communicate with the galaxy instance. 
         self.api_key = None  # API key used to communicate with the galaxy instance. Cannot be used to do user-tied actions
         self.datasets = dict()
         self.config = None  # Custom config used to set environment variables inside containers
-        self.species_folder_name = "_".join(utilities.filter_empty_not_empty_items([self.genus_lowercase.lower(), self.species.lower(), self.strain.lower(), self.sex.lower()])["not_empty"])
-        self.species_folder_name = self.species_folder_name .replace("-", "_")
-        self.existing_folders_cache = {}
-        self.bam_metadata_cache = {}
-
-        # # Sanitize str attributes
-        # for var in vars(self):
-        #     for attr in var if type(attr) == str:
-        #         attr = attr.replace("(", "_").replace(")", "_")
+        self.species_folder_name = "_".join(utilities.filter_empty_not_empty_items(
+            [self.genus_lowercase.lower(), self.species.lower(), self.strain.lower(),
+             self.sex.lower()])["not_empty"])
diff --git a/templates/gspecies_compose_template.yml.j2 b/templates/gspecies_compose.yml.j2
similarity index 79%
rename from templates/gspecies_compose_template.yml.j2
rename to templates/gspecies_compose.yml.j2
index d897fa2d99d52a30ca9d2dd20db081612b80900f..b1f4c6f9fe5cf6ab245be4edb8f42aba7abad9f1 100644
--- a/templates/gspecies_compose_template.yml.j2
+++ b/templates/gspecies_compose.yml.j2
@@ -20,7 +20,7 @@ services:
       - "traefik.http.routers.{{ genus_species }}-nginx.middlewares=sp-auth,sp-app-trailslash,sp-prefix"
     {% else %}
       - "traefik.http.routers.{{ genus_species }}-nginx.entryPoints=web"
-      - "traefik.http.routers.{{ genus_species }}-nginx.middlewares=sp-app-trailslash,sp-prefix" # lg
+      - "traefik.http.routers.{{ genus_species }}-nginx.middlewares=sp-app-trailslash,sp-prefix"
     {% endif %}
       - "traefik.http.services.{{ genus_species }}-nginx.loadbalancer.server.port=80"
       restart_policy:
@@ -38,7 +38,7 @@ services:
       - ./docker_data/galaxy/:/export/:ro
       - ./src_data/:/project_data/:ro
       - ./src_data:/data:ro
-      {% if 'banner' in render_vars %}
+      {% if tripal_banner_path is defined %}
       - ./banner.png:/var/www/html/banner.png:ro
      {% endif %}
       #- /groups/XXX/:/groups/XXX/:ro  # We do this when we have symlinks in src_data pointing to /groups/XXX/...
@@ -53,24 +53,10 @@ services:
       SITE_NAME: "{{ Genus_species }}"
       ELASTICSEARCH_HOST: elasticsearch.{{ genus_species }}
       ENABLE_JBROWSE: /jbrowse/?data=data/{{ genus_species_strain_sex }}
-      # This ENABLE_JBROWSE variable should point to the "best assembly" by default --> tag it in the input file and use it to define this variable correctly (also called
-      # unique id in the jbrowse tool parameters == both have to be identical)
-      {% if apollo == True %}
-      ENABLE_APOLLO: 1
-      {% else %}
       ENABLE_APOLLO: 0
-      {% endif %}
-      {% if blast == True %}
-      ENABLE_BLAST: 1
-      {% else %}
-      ENABLE_BLAST: 0
-      {% endif %}
+      ENABLE_BLAST: {{ blast }}
       ENABLE_DOWNLOAD: 1
-      {% if wiki == True %}
-      ENABLE_WIKI: 1
-      {% else %}
       ENABLE_WIKI: 0
-      {% endif %}
       ENABLE_GO: 0
       ENABLE_ORTHOLOGY: 0
       ENABLE_ORTHOLOGY_LINKS: 0
@@ -89,7 +75,7 @@ services:
       - "traefik.http.routers.{{ genus_species }}-tripal.middlewares=sp-auth,sp-trailslash,sp-prefix,tripal-addprefix"
     {% else %}
       - "traefik.http.routers.{{ genus_species }}-tripal.entryPoints=web"
-      - "traefik.http.routers.{{ genus_species }}-tripal.middlewares=sp-trailslash,sp-prefix,tripal-addprefix" # lg
+      - "traefik.http.routers.{{ genus_species }}-tripal.middlewares=sp-trailslash,sp-prefix,tripal-addprefix"
     {% endif %}
       - "traefik.http.services.{{ genus_species }}-tripal.loadbalancer.server.port=80"
       restart_policy:
@@ -112,14 +98,9 @@ services:
   elasticsearch:
     image: docker.elastic.co/elasticsearch/elasticsearch:6.6.1
-    #deploy:
-      #resources:
-        #limits:
-          #memory: 500M
     volumes:
       - ./docker_data/elastic_search_index/:/usr/share/elasticsearch/data/
     environment:
-      # bootstrap.memory_lock: "true"
       xpack.security.enabled: "false"
       xpack.monitoring.enabled: "false"
       xpack.ml.enabled: "false"
@@ -134,15 +115,11 @@ services:
   galaxy:
     image: quay.io/galaxy-genome-annotation/docker-galaxy-annotation:gmod
     volumes:
-      {% if persist_galaxy_data is defined %}
-      {% if persist_galaxy_data == "False" %}
+      {% if (galaxy_persist_data is defined) and (galaxy_persist_data == "False") %}
       #- ./docker_data/galaxy/:/export/
       {% else %}
       - ./docker_data/galaxy/:/export/
       {% endif %}
-      {% else %}
-      - ./docker_data/galaxy/:/export/
-      {% endif %}
       - ./src_data/:/project_data/:ro
       #- /groups/XXX/:/groups/XXX/:ro  # We do this when we have symlinks in src_data pointing to /groups/XXX/...
- ./docker_data/jbrowse/:/jbrowse/data/ @@ -158,15 +135,11 @@ services: GALAXY_DEFAULT_ADMIN_EMAIL: "{{ galaxy_default_admin_email }}" GALAXY_DEFAULT_ADMIN_USER: "{{ galaxy_defaut_admin_user }}" GALAXY_DEFAULT_ADMIN_PASSWORD: "{{ galaxy_default_admin_password }}" - GALAXY_CONFIG_ADMIN_USERS: "admin@galaxy.org,{{ galaxy_default_admin_email }}" # admin@galaxy.org is the default (leave it), gogepp@bipaa is a shared ldap user we use to connect + GALAXY_CONFIG_ADMIN_USERS: "admin@galaxy.org,{{ galaxy_default_admin_email }}" # admin@galaxy.org is the default (leave it), galaxy_default_admin_email is a shared ldap user we use to connect ENABLE_FIX_PERMS: 0 PROXY_PREFIX: /sp/{{ genus_species }}/galaxy GALAXY_TRIPAL_URL: http://tripal.{{ genus_species }}/tripal/ GALAXY_TRIPAL_PASSWORD: {{ tripal_password }} # See tripal config above - GALAXY_WEBAPOLLO_URL: http://one-of-the-swarm-node:8888/apollo/ - GALAXY_WEBAPOLLO_USER: "{{ webapollo_user }}" - GALAXY_WEBAPOLLO_PASSWORD: "{{ webapollo_password }}" # See tripal config below - GALAXY_WEBAPOLLO_EXT_URL: /apollo/ GALAXY_CHADO_DBHOST: tripal-db.{{ genus_species }} GALAXY_CHADO_DBSCHEMA: chado GALAXY_AUTO_UPDATE_DB: 1 @@ -186,12 +159,12 @@ services: - "traefik.http.routers.{{ genus_species }}-galaxy.middlewares=sp-auth,sp-app-trailslash,sp-app-prefix" {% else %} - "traefik.http.routers.{{ genus_species }}-galaxy.entryPoints=web" - - "traefik.http.routers.{{ genus_species }}-galaxy.middlewares=sp-app-trailslash,sp-app-prefix" #lg + - "traefik.http.routers.{{ genus_species }}-galaxy.middlewares=sp-app-trailslash,sp-app-prefix" {% endif %} - "traefik.http.services.{{ genus_species }}-galaxy.loadbalancer.server.port=80" - "traefik.http.routers.{{ genus_species }}-gga_load-galaxy.rule=(Host(`localhost`) && PathPrefix(`/sp/{{ genus_species }}/galaxy`))" - "traefik.http.routers.{{ genus_species }}-gga_load-galaxy.entryPoints=web" - + - "traefik.http.routers.{{ genus_species }}-gga_load-galaxy.middlewares=sp-app-trailslash,sp-app-prefix" restart_policy: condition: on-failure delay: 5s @@ -217,7 +190,7 @@ services: - "traefik.http.routers.{{ genus_species }}-jbrowse.middlewares=sp-auth,sp-app-trailslash,sp-app-prefix" {% else %} - "traefik.http.routers.{{ genus_species }}-jbrowse.entryPoints=web" - - "traefik.http.routers.{{ genus_species }}-jbrowse.middlewares=sp-app-trailslash,sp-app-prefix" #lg + - "traefik.http.routers.{{ genus_species }}-jbrowse.middlewares=sp-app-trailslash,sp-app-prefix" {% endif %} - "traefik.http.services.{{ genus_species }}-jbrowse.loadbalancer.server.port=80" restart_policy: @@ -226,7 +199,7 @@ services: max_attempts: 3 window: 120s - {% if blast == True %} + {% if blast is defined and blast == 1 %} blast: image: quay.io/abretaud/sf-blast:latest depends_on: @@ -236,7 +209,7 @@ services: UPLOAD_LIMIT: 20M MEMORY_LIMIT: 128M DB_NAME: 'postgres' - ADMIN_EMAIL: 'g.ga@sb-roscoff.fr' # email sender + ADMIN_EMAIL: 'g.ga@domain.org' # email sender ADMIN_NAME: 'gga' # email sender name JOBS_METHOD: 'local' # Can be local (= no sge jobs, but run inside the container) or drmaa (= to submit to a cluster) JOBS_WORK_DIR: '/tmp/blast_jobs/' # disk accessible both from compute nodes and mounted in this docker (at the same path) @@ -252,8 +225,8 @@ services: #JOBS_DRMAA_NATIVE: '-p web' # This line and following for slurm #DRMAA_METHOD: 'slurm' # This line and following for slurm volumes: - - ../blast-themes/abims/:/var/www/blast/app/Resources/:ro # You can theme the app - - /usr/local/genome2/:/usr/local/genome2/:ro # path for blast executables + - 
../blast-themes/my_theme/:/var/www/blast/app/Resources/:ro # You can theme the app + - /path/to/blast/exe/:/path/to/blast/exe/:ro # path for blast executables - /db/:/db/:ro # for access to indexed blast databases #- /data1/sge/:/usr/local/sge/:ro # an sge install #- /xxxx/blast_jobs/:/xxxx/blast_jobs/ # (for drmaa mode only) @@ -277,7 +250,7 @@ services: - "traefik.http.routers.{{ genus_species }}-blast.middlewares=sp-big-req,sp-auth,sp-app-trailslash,sp-app-prefix" {% else %} - "traefik.http.routers.{{ genus_species }}-blast.entryPoints=web" - - "traefik.http.routers.{{ genus_species }}-blast.middlewares=sp-big-req,sp-app-trailslash,sp-app-prefix" # lg + - "traefik.http.routers.{{ genus_species }}-blast.middlewares=sp-big-req,sp-app-trailslash,sp-app-prefix" {% endif %} - "traefik.http.services.{{ genus_species }}-blast.loadbalancer.server.port=80" restart_policy: @@ -297,49 +270,9 @@ services: - {{ genus_species }} {% endif %} - {% if wiki == True %} - wiki: - image: quay.io/abretaud/mediawiki - environment: - MEDIAWIKI_SERVER: http://localhost - MEDIAWIKI_PROXY_PREFIX: /sp/{{ genus_species }}/wiki - MEDIAWIKI_SITENAME: {{ Genus }} {{ species }} - MEDIAWIKI_SECRET_KEY: XXXXXXXXXX - MEDIAWIKI_DB_HOST: wiki-db.{{genus_species }} - MEDIAWIKI_DB_PASSWORD: password - MEDIAWIKI_ADMIN_USER: abretaud # ldap user - depends_on: - - wiki-db - volumes: - - ./docker_data/wiki_uploads:/images - #- ../bipaa_wiki.png:/var/www/mediawiki/resources/assets/wiki.png:ro # To change the logo at the top left - networks: - - traefikbig - - {{ genus_species }} - deploy: - labels: - - "traefik.http.routers.{{ genus_species }}-wiki.rule=(Host(`{{ hostname }}`) && PathPrefix(`/sp/{{ genus_species }}/wiki`))" - - "traefik.http.routers.{{ genus_species }}-wiki.tls=true" - - "traefik.http.routers.{{ genus_species }}-wiki.entryPoints={{ entrypoint }}" - - "traefik.http.routers.{{ genus_species }}-wiki.middlewares=sp-big-req,sp-auth,sp-app-trailslash,sp-app-prefix" - - "traefik.http.services.{{ genus_species }}-wiki.loadbalancer.server.port=80" - restart_policy: - condition: on-failure - delay: 5s - max_attempts: 3 - window: 120s - - wiki-db: - image: postgres:9.6-alpine - volumes: - - ./docker_data/wiki_db/:/var/lib/postgresql/data/ - networks: - - {{ genus_species }} - {% endif %} - networks: traefikbig: external: true {{ genus_species }}: driver: overlay - name: {{ genus_species }} + name: {{ genus_species }} \ No newline at end of file diff --git a/templates/organisms.yml.j2 b/templates/organisms.yml.j2 new file mode 100644 index 0000000000000000000000000000000000000000..34f3c01293984e585f7fdf7cc5882844e5f28cd3 --- /dev/null +++ b/templates/organisms.yml.j2 @@ -0,0 +1,25 @@ +- {{ org_param_name }}: {{ org_param_name_value }} + {{ org_param_desc }}: + {{ org_param_desc_genus }}: {{ org_param_desc_genus_value }} + {{ org_param_desc_species }}: {{ org_param_desc_species_value }} + {{ org_param_desc_sex }}: {{ org_param_desc_sex_value }} + {{ org_param_desc_strain }}: {{ org_param_desc_strain_value }} + {{ org_param_desc_common_name }}: {{ org_param_desc_common_name_value }} + {{ org_param_desc_origin }}: {{ org_param_desc_origin_value }} + {% if org_param_desc_main_species_value is defined and org_param_desc_main_species_value is sameas true %} + {{ org_param_desc_main_species }}: yes + {% endif %} + {{ org_param_data }}: + {{ org_param_data_genome_path }}: {{ org_param_data_genome_path_value }} + {{ org_param_data_transcripts_path }}: {{ org_param_data_transcripts_path_value }} + {{ org_param_data_proteins_path }}: {{ 
org_param_data_proteins_path_value }} + {{ org_param_data_gff_path }}: {{ org_param_data_gff_path_value }} + {{ org_param_data_interpro_path }}: {{ org_param_data_interpro_path_value }} + {{ org_param_data_orthofinder_path }}: {{ org_param_data_orthofinder_path_value }} + {{ org_param_data_blastp_path }}: {{ org_param_data_blastp_path_value }} + {{ org_param_data_blastx_path }}: {{ org_param_data_blastx_path_value }} + {{ org_param_data_genome_version }}: {{ org_param_data_genome_version_value }} + {{ org_param_data_ogs_version }}: {{ org_param_data_ogs_version_value }} + {{ org_param_data_performed_by }}: {{ org_param_data_performed_by_value }} + {{ org_param_services }}: + {{ org_param_services_blast }}: {{ org_param_services_blast_value }} \ No newline at end of file diff --git a/templates/orthology_compose_template.yml.j2 b/templates/orthology_compose.yml.j2 similarity index 100% rename from templates/orthology_compose_template.yml.j2 rename to templates/orthology_compose.yml.j2 diff --git a/templates/traefik_compose_template.yml.j2 b/templates/traefik_compose.yml.j2 similarity index 81% rename from templates/traefik_compose_template.yml.j2 rename to templates/traefik_compose.yml.j2 index 6157cc2acb31627e05c6b9105370980ea5b6ab3d..0b4634f62a801a354a9c4e9567c22b7220928c32 100644 --- a/templates/traefik_compose_template.yml.j2 +++ b/templates/traefik_compose.yml.j2 @@ -5,19 +5,15 @@ services: command: - "--api" - "--api.dashboard" -# - "--api.insecure=true" # added by lg to debug, for dashboard - "--log.level=DEBUG" - "--providers.docker" - "--providers.docker.swarmMode=true" - - "--providers.docker.network=traefikbig" # changed by lg from traefik to traefikbig + - "--providers.docker.network=traefikbig" - "--entryPoints.web.address=:80" - - "--entryPoints.web.forwardedHeaders.trustedIPs={{ proxy_ip }}" # The ips of our upstream proxies: eci + - "--entryPoints.web.forwardedHeaders.trustedIPs={{ proxy_ip }}" # The ips of our upstream proxies - "--entryPoints.webs.address=:443" - - "--entryPoints.webs.forwardedHeaders.trustedIPs={{ proxy_ip }}" # The ips of our upstream proxies: eci + - "--entryPoints.webs.forwardedHeaders.trustedIPs={{ proxy_ip }}" # The ips of our upstream proxies ports: - {% if dashboard_port is defined %} - - {{ dashboard_port }}:8080 # added by lg to debug, for dashboard - {% endif %} - {{ http_port }}:80 {% if https_port is defined %} - {{ https_port }}:443 @@ -31,37 +27,39 @@ services: constraints: - node.role == manager labels: -# - "traefik.http.routers.traefik-api.rule=PathPrefix(`/traefik`)" - - "traefik.http.routers.traefik-api.rule=PathPrefix(`/api`) || PathPrefix(`/dashboard`) || PathPrefix(`/traefik`)" # lg + - "traefik.http.routers.traefik-api.rule=PathPrefix(`/api`) || PathPrefix(`/dashboard`) || PathPrefix(`/traefik`)" {% if https_port is defined %} - "traefik.http.routers.traefik-api.tls=true" - "traefik.http.routers.traefik-api.entryPoints=webs" {% else %} - - "traefik.http.routers.traefik-api.entryPoints=web" # lg + - "traefik.http.routers.traefik-api.entryPoints=web" {% endif %} - "traefik.http.routers.traefik-api.service=api@internal" - "traefik.http.middlewares.traefik-strip.stripprefix.prefixes=/traefik" - - "traefik.http.middlewares.traefik-auth.forwardauth.address=http://authelia:9091/api/verify?rd=https://auth.abims-gga.sb-roscoff.fr/" + - "traefik.http.middlewares.traefik-auth.forwardauth.address=http://authelia:9091/api/verify?rd=https://{{ authentication_domain_name }}/" - 
"traefik.http.middlewares.traefik-auth.forwardauth.trustForwardHeader=true" -# - "traefik.http.routers.traefik-api.middlewares=traefik-auth,traefik-strip" - - "traefik.http.routers.traefik-api.middlewares=traefik-strip" # lg + {% if https_port is defined %} + - "traefik.http.routers.traefik-api.middlewares=traefik-auth,traefik-strip" + {% else %} + - "traefik.http.routers.traefik-api.middlewares=traefik-strip" + {% endif %} # Dummy service for Swarm port detection. The port can be any valid integer value. - "traefik.http.services.traefik-svc.loadbalancer.server.port=9999" # Some generally useful middlewares for organisms hosting - - "traefik.http.middlewares.sp-auth.forwardauth.address=http://authelia:9091/api/verify?rd=https://auth.abims-gga.sb-roscoff.fr/" + - "traefik.http.middlewares.sp-auth.forwardauth.address=http://authelia:9091/api/verify?rd=https://{{ authentication_domain_name }}/" - "traefik.http.middlewares.sp-auth.forwardauth.trustForwardHeader=true" - "traefik.http.middlewares.sp-auth.forwardauth.authResponseHeaders=Remote-User,Remote-Groups" {% if https_port is defined %} - "traefik.http.middlewares.sp-trailslash.redirectregex.regex=^(https?://[^/]+/sp/[^/]+)$$" {% else %} - - "traefik.http.middlewares.sp-trailslash.redirectregex.regex=^(http?://[^/]+/sp/[^/]+)$$" # lg + - "traefik.http.middlewares.sp-trailslash.redirectregex.regex=^(http?://[^/]+/sp/[^/]+)$$" {% endif %} - "traefik.http.middlewares.sp-trailslash.redirectregex.replacement=$${1}/" - "traefik.http.middlewares.sp-trailslash.redirectregex.permanent=true" {% if https_port is defined %} - "traefik.http.middlewares.sp-app-trailslash.redirectregex.regex=^(https?://[^/]+/sp/[^/]+/[^/]+)$$" {% else %} - - "traefik.http.middlewares.sp-app-trailslash.redirectregex.regex=^(http?://[^/]+/sp/[^/]+/[^/]+)$$" # lg + - "traefik.http.middlewares.sp-app-trailslash.redirectregex.regex=^(http?://[^/]+/sp/[^/]+/[^/]+)$$" {% endif %} - "traefik.http.middlewares.sp-app-trailslash.redirectregex.replacement=$${1}/" - "traefik.http.middlewares.sp-app-trailslash.redirectregex.permanent=true" @@ -86,6 +84,10 @@ services: - authelia-db volumes: - ./authelia/:/etc/authelia/:ro + {% if authelia_secrets_env_path is defined %} + env_file: + - {{authelia_secrets_env_path}} + {% endif %} deploy: labels: - "traefik.http.routers.authelia.rule=Host(`{{ authentication_domain_name }}`)" @@ -113,7 +115,7 @@ services: authelia-db: image: postgres:12.2-alpine environment: - POSTGRES_PASSWORD: z3A,hQ-9 + POSTGRES_PASSWORD: {{ authelia_db_postgres_password }} volumes: - ./docker_data/authelia_db/:/var/lib/postgresql/data/ networks: @@ -132,4 +134,4 @@ networks: name: traefikbig ipam: config: - - subnet: 10.50.0.0/16 + - subnet: 10.50.0.0/16 \ No newline at end of file diff --git a/utilities.py b/utilities.py index d6b8504c65a98749c311de30847f793a6c574d43..3d734f24a9c9bdd82f60f6127a051602493b861b 100755 --- a/utilities.py +++ b/utilities.py @@ -6,7 +6,26 @@ import logging import sys import os import subprocess +import bioblend +import constants +def load_yaml(yaml_file): + + try: + with open(yaml_file, 'r') as stream: + try: + data = yaml.safe_load(stream) + except yaml.YAMLError as err: + logging.critical("Input file %s is not in YAML format" % yaml_file) + sys.exit(err) + except FileNotFoundError: + logging.critical("Input file doesn't exist (%s)" % yaml_file) + sys.exit() + except OSError: + logging.critical("Input file cannot be read (%s)" % yaml_file) + sys.exit() + + return data def parse_config(config_file): """ @@ -16,25 +35,14 @@ def 
parse_config(config_file):
     :return:
     """
-    config_variables = {}
-    logging.debug("Using config: %s" % os.path.abspath(config_file))
-    try:
-        with open(config_file, 'r') as stream:
-            yaml_dict = yaml.safe_load(stream)
-            for k, v in yaml_dict.items():
-                for k2, v2 in v.items():
-                    config_variables[k2] = v2  # Add a key:value pair to variables for replacement in the compose template file
-
-    except FileNotFoundError:
-        logging.critical("The config file specified doesn't exist (%s)" % config_file)
-        sys.exit()
-    except OSError:
-        logging.critical("The config file specified cannot be read (%s)" % config_file)
+    config_dict = load_yaml(config_file)
+    if isinstance(config_dict, dict):
+        #logging.debug("Config dictionary: {0}".format(config_dict))
+        return config_dict
+    else:
+        logging.critical("Config yaml file is not a dictionary (%s)" % config_file)
         sys.exit()
-    return config_variables
-
-
 def parse_input(input_file):
     """
     Parse the yml input file to extract data to create the SpeciesData objects
     :return:
     """
-    parsed_sp_dict_list = []
-
-    try:
-        with open(input_file, 'r') as stream:
-            try:
-                yaml_dict = yaml.safe_load(stream)
-                for k, v in yaml_dict.items():
-                    parsed_sp_dict_list.append(v)
-            except yaml.YAMLError as err:
-                logging.critical("Input file is not in YAML format")
-                sys.exit(err)
-    except FileNotFoundError:
-        logging.critical("The specified input file doesn't exist (%s)" % input_file)
-        sys.exit()
-    except OSError:
-        logging.critical("The specified input file cannot be read (%s)" % input_file)
+    sp_dict_list = load_yaml(input_file)
+    if isinstance(sp_dict_list, list):
+        return sp_dict_list
+    else:
+        logging.critical("Input organisms yaml file is not a list (%s)" % input_file)
         sys.exit()
-    return parsed_sp_dict_list
-
-
 def filter_empty_not_empty_items(li):
     """
     Separate a list between empty items and non empty items.
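With this refactoring, `load_yaml()` centralises the file and YAML error handling, and the two parsers only validate the top-level type: the config file must parse to a mapping, the organisms input file to a list. A quick illustration of those two shapes, assuming PyYAML (the keys and values below are hypothetical, not taken from the example files):

```python
# Illustrates the top-level YAML shapes enforced by parse_config() and
# parse_input() above. Assumes PyYAML; keys and values are hypothetical.
import yaml

config_yaml = """
hostname: example.org
http_port: 8888
"""

organisms_yaml = """
- name: org1
- name: org2
"""

assert isinstance(yaml.safe_load(config_yaml), dict)     # accepted by parse_config()
assert isinstance(yaml.safe_load(organisms_yaml), list)  # accepted by parse_input()
```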
@@ -123,6 +117,12 @@ def get_species_history_id(instance, full_name): return [history_id, show_history] +def get_gspecies_string_from_sp_dict(sp_dict): + + genus = sp_dict[constants.ORG_PARAM_DESC][constants.ORG_PARAM_DESC_GENUS] + species = sp_dict[constants.ORG_PARAM_DESC][constants.ORG_PARAM_DESC_SPECIES] + gspecies = genus.lower() + "_" + species.lower() + return gspecies def get_unique_species_str_list(sp_dict_list): """ @@ -137,15 +137,9 @@ def get_unique_species_str_list(sp_dict_list): unique_species_li = [] for sp in sp_dict_list: - for k, v in sp.items(): - sp_gspecies = "" - for k2, v2 in v.items(): - if k2 == "genus": - sp_gspecies = sp_gspecies.lower() + v2 - elif k2 == "species": - sp_gspecies = sp_gspecies.lower() + "_" + v2 - if sp_gspecies not in unique_species_li and sp_gspecies != "": - unique_species_li.append(sp_gspecies) + sp_gspecies = get_gspecies_string_from_sp_dict(sp) + if sp_gspecies not in unique_species_li and sp_gspecies != "": + unique_species_li.append(sp_gspecies) return unique_species_li @@ -162,16 +156,70 @@ def get_unique_species_dict_list(sp_dict_list): unique_species_dict = {} unique_species_list_of_dict = [] - unique_species_genus_species = get_unique_species_str_list(sp_dict_list=sp_dict_list) for sp in sp_dict_list: - for gspecies in unique_species_genus_species: - if gspecies not in unique_species_dict.keys(): - unique_species_dict[gspecies] = sp - else: - continue + gspecies = get_gspecies_string_from_sp_dict(sp) + if gspecies not in unique_species_dict.keys() or ( constants.ORG_PARAM_DESC_MAIN_SPECIES in sp[constants.ORG_PARAM_DESC].keys() and + sp[constants.ORG_PARAM_DESC][constants.ORG_PARAM_DESC_MAIN_SPECIES] == True ) : + unique_species_dict[gspecies] = sp + else: + continue for k, v in unique_species_dict.items(): unique_species_list_of_dict.append(v) return unique_species_list_of_dict + +def run_tool(instance, tool_id, history_id, tool_inputs): + + output_dict = None + try: + logging.debug("Running tool {0} with tool inputs: {1}".format(tool_id, tool_inputs)) + output_dict = instance.tools.run_tool( + tool_id=tool_id, + history_id=history_id, + tool_inputs=tool_inputs) + except bioblend.ConnectionError: + logging.error("Unexpected HTTP response (bioblend.ConnectionError) when running tool {0} with tool inputs: {1}".format(tool_id, tool_inputs)) + + return output_dict + +def run_tool_and_get_single_output_dataset_id(instance, tool_id, history_id, tool_inputs): + + output_dict = run_tool(instance, tool_id, history_id, tool_inputs) + single_output_dataset_id = output_dict["outputs"][0]["id"] + + return single_output_dataset_id + +def create_org_param_dict_from_constants(): + """ + Create a dictionary of variables containing the keys needed to render the organisms.yml.j2 (NOT the values) + Created from the constants + """ + + org_param_dict={} + org_param_dict["org_param_name"] = constants.ORG_PARAM_NAME + org_param_dict["org_param_desc"] = constants.ORG_PARAM_DESC + org_param_dict["org_param_desc_genus"] = constants.ORG_PARAM_DESC_GENUS + org_param_dict["org_param_desc_species"] = constants.ORG_PARAM_DESC_SPECIES + org_param_dict["org_param_desc_sex"] = constants.ORG_PARAM_DESC_SEX + org_param_dict["org_param_desc_strain"] = constants.ORG_PARAM_DESC_STRAIN + org_param_dict["org_param_desc_common_name"] = constants.ORG_PARAM_DESC_COMMON_NAME + org_param_dict["org_param_desc_origin"] = constants.ORG_PARAM_DESC_ORIGIN + org_param_dict["org_param_desc_main_species"] = constants.ORG_PARAM_DESC_MAIN_SPECIES + org_param_dict["org_param_data"] = 
constants.ORG_PARAM_DATA + org_param_dict["org_param_data_genome_path"] = constants.ORG_PARAM_DATA_GENOME_PATH + org_param_dict["org_param_data_transcripts_path"] = constants.ORG_PARAM_DATA_TRANSCRIPTS_PATH + org_param_dict["org_param_data_proteins_path"] = constants.ORG_PARAM_DATA_PROTEINS_PATH + org_param_dict["org_param_data_gff_path"] = constants.ORG_PARAM_DATA_GFF_PATH + org_param_dict["org_param_data_interpro_path"] = constants.ORG_PARAM_DATA_INTERPRO_PATH + org_param_dict["org_param_data_orthofinder_path"] = constants.ORG_PARAM_DATA_ORTHOFINDER_PATH + org_param_dict["org_param_data_blastp_path"] = constants.ORG_PARAM_DATA_BLASTP_PATH + org_param_dict["org_param_data_blastx_path"] = constants.ORG_PARAM_DATA_BLASTX_PATH + org_param_dict["org_param_data_genome_version"] = constants.ORG_PARAM_DATA_GENOME_VERSION + org_param_dict["org_param_data_ogs_version"] = constants.ORG_PARAM_DATA_OGS_VERSION + org_param_dict["org_param_data_performed_by"] = constants.ORG_PARAM_DATA_PERFORMED_BY + org_param_dict["org_param_services"] = constants.ORG_PARAM_SERVICES + org_param_dict["org_param_services_blast"] = constants.ORG_PARAM_SERVICES_BLAST + + return org_param_dict \ No newline at end of file diff --git a/workflows/Blast_Diamond.ga b/workflows_phaeoexplorer/Blast_Diamond.ga similarity index 100% rename from workflows/Blast_Diamond.ga rename to workflows_phaeoexplorer/Blast_Diamond.ga diff --git a/workflows/Chado_load_Tripal_synchronize.ga b/workflows_phaeoexplorer/Chado_load_Tripal_synchronize.ga similarity index 100% rename from workflows/Chado_load_Tripal_synchronize.ga rename to workflows_phaeoexplorer/Chado_load_Tripal_synchronize.ga diff --git a/workflows/Chado_load_Tripal_synchronize.ga.bak b/workflows_phaeoexplorer/Chado_load_Tripal_synchronize.ga.bak similarity index 100% rename from workflows/Chado_load_Tripal_synchronize.ga.bak rename to workflows_phaeoexplorer/Chado_load_Tripal_synchronize.ga.bak diff --git a/workflows/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v1.ga b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v1.ga similarity index 100% rename from workflows/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v1.ga rename to workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v1.ga diff --git a/workflows/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v2.ga b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v2.ga similarity index 100% rename from workflows/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v2.ga rename to workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v2.ga diff --git a/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v4.ga b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v4.ga new file mode 100644 index 0000000000000000000000000000000000000000..0349a497fc1c8baa9ab98ba75c092dad191da08b --- /dev/null +++ b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_1org_v4.ga @@ -0,0 +1,535 @@ +{ + "a_galaxy_workflow": "true", + "annotation": "", + "format-version": "0.1", + "name": "chado_load_tripal_synchronize_jbrowse_1org_v4", + "steps": { + "0": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 0, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "genome" + } + ], + "label": "genome", + "name": "Input dataset", + "outputs": [], + "position": { + 
"bottom": 277.1999969482422, + "height": 61.19999694824219, + "left": 436.5, + "right": 636.5, + "top": 216, + "width": 200, + "x": 436.5, + "y": 216 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "fa9981ea-4012-40aa-ad84-6e6f61049104", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "65bba69d-b8f0-4f7e-a66a-71afa9a8975f" + } + ] + }, + "1": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 1, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "annotations" + } + ], + "label": "annotations", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 367.1999969482422, + "height": 61.19999694824219, + "left": 467.5, + "right": 667.5, + "top": 306, + "width": 200, + "x": 467.5, + "y": 306 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "61d23082-b459-4014-8584-6ff5b98ce689", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "acb766e7-fcd7-42e2-8fcf-638024338fc4" + } + ] + }, + "2": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 2, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "proteins" + } + ], + "label": "proteins", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 456.1999969482422, + "height": 61.19999694824219, + "left": 489.5, + "right": 689.5, + "top": 395, + "width": 200, + "x": 489.5, + "y": 395 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "ea25f583-f55d-4fdd-a7a9-86ffb4b9c731", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "eba4266d-d468-448f-8ce3-fa87b497cbf8" + } + ] + }, + "3": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "errors": null, + "id": 3, + "input_connections": { + "fasta": { + "id": 0, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load fasta", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "fasta" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "organism" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "wait_for" + } + ], + "label": null, + "name": "Chado load fasta", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 356.1999969482422, + "height": 143.1999969482422, + "left": 766.5, + "right": 966.5, + "top": 213, + "width": 200, + "x": 766.5, + "y": 213 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "ba4d07fbaf47", + "name": "chado_feature_load_fasta", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"do_update\": \"false\", \"ext_db\": {\"db\": \"\", \"re_db_accession\": \"\"}, \"fasta\": {\"__class__\": \"RuntimeValue\"}, \"match_on_name\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"re_name\": \"\", \"re_uniquename\": \"\", \"relationships\": {\"rel_type\": \"none\", 
\"__current_case__\": 0}, \"sequence_type\": \"contig\", \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "86b4962b-d001-44f3-b2f5-349e0daccc69", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "892534b5-0d67-44da-8892-f17da2be9e9c" + } + ] + }, + "4": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy0", + "errors": null, + "id": 4, + "input_connections": { + "reference_genome|genome": { + "id": 0, + "output_name": "output" + }, + "track_groups_0|data_tracks_0|data_format|annotation": { + "id": 1, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool JBrowse", + "name": "reference_genome" + } + ], + "label": null, + "name": "JBrowse", + "outputs": [ + { + "name": "output", + "type": "html" + } + ], + "position": { + "bottom": 572, + "height": 184, + "left": 753.5, + "right": 953.5, + "top": 388, + "width": 200, + "x": 753.5, + "y": 388 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy0", + "tool_shed_repository": { + "changeset_revision": "4542035c1075", + "name": "jbrowse", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"action\": {\"action_select\": \"create\", \"__current_case__\": 0}, \"gencode\": \"1\", \"jbgen\": {\"defaultLocation\": \"\", \"trackPadding\": \"20\", \"shareLink\": \"true\", \"aboutDescription\": \"\", \"show_tracklist\": \"true\", \"show_nav\": \"true\", \"show_overview\": \"true\", \"show_menu\": \"true\", \"hideGenomeOptions\": \"false\"}, \"plugins\": {\"BlastView\": \"true\", \"ComboTrackSelector\": \"false\", \"GCContent\": \"false\"}, \"reference_genome\": {\"genome_type_select\": \"history\", \"__current_case__\": 1, \"genome\": {\"__class__\": \"RuntimeValue\"}}, \"standalone\": \"minimal\", \"track_groups\": [{\"__index__\": 0, \"category\": \"Annotation\", \"data_tracks\": [{\"__index__\": 0, \"data_format\": {\"data_format_select\": \"gene_calls\", \"__current_case__\": 2, \"annotation\": {\"__class__\": \"RuntimeValue\"}, \"match_part\": {\"match_part_select\": \"false\", \"__current_case__\": 1}, \"index\": \"false\", \"track_config\": {\"track_class\": \"NeatHTMLFeatures/View/Track/NeatFeatures\", \"__current_case__\": 3, \"html_options\": {\"topLevelFeatures\": \"\"}}, \"jbstyle\": {\"style_classname\": \"transcript\", \"style_label\": \"product,name,id\", \"style_description\": \"note,description\", \"style_height\": \"10px\", \"max_height\": \"600\"}, \"jbcolor_scale\": {\"color_score\": {\"color_score_select\": \"none\", \"__current_case__\": 0, \"color\": {\"color_select\": \"automatic\", \"__current_case__\": 0}}}, \"jb_custom_config\": {\"option\": []}, \"jbmenu\": {\"track_menu\": [{\"__index__\": 0, \"menu_action\": \"iframeDialog\", \"menu_label\": \"View transcript report\", \"menu_title\": \"Transcript {id}\", \"menu_url\": \"__MENU_URL_ORG__\", \"menu_icon\": \"dijitIconBookmark\"}]}, \"track_visibility\": \"default_off\", \"override_apollo_plugins\": \"False\", \"override_apollo_drag\": \"False\"}}]}], \"uglyTestingHack\": \"\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.16.11+galaxy0", + "type": "tool", + "uuid": "4e87e6b5-c37c-4429-a491-2d6a411d8a13", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": 
"37d09194-4527-4428-963c-85cb351efcba" + } + ] + }, + "5": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "errors": null, + "id": 5, + "input_connections": { + "fasta": { + "id": 2, + "output_name": "output" + }, + "gff": { + "id": 1, + "output_name": "output" + }, + "wait_for": { + "id": 3, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load gff", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "fasta" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "gff" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "organism" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "wait_for" + } + ], + "label": null, + "name": "Chado load gff", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 383.6000061035156, + "height": 173.60000610351562, + "left": 1043.5, + "right": 1243.5, + "top": 210, + "width": 200, + "x": 1043.5, + "y": 210 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "e9a6d7568817", + "name": "chado_feature_load_gff", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"add_only\": \"false\", \"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"fasta\": {\"__class__\": \"RuntimeValue\"}, \"gff\": {\"__class__\": \"RuntimeValue\"}, \"landmark_type\": \"contig\", \"no_seq_compute\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"prot_naming\": {\"method\": \"regex\", \"__current_case__\": 1, \"re_protein_capture\": \"^mRNA(_.+)$\", \"re_protein\": \"prot\\\\1\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "34eb5e35-988e-4b49-8f2c-b11761a43588", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "2c0c2e3d-7ce1-4f7e-ad2f-456d9e97fdb8" + } + ] + }, + "6": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/jbrowse_to_container/jbrowse_to_container/0.5.1", + "errors": null, + "id": 6, + "input_connections": { + "organisms_0|jbrowse": { + "id": 4, + "output_name": "output" + } + }, + "inputs": [], + "label": null, + "name": "Add organisms to JBrowse container", + "outputs": [ + { + "name": "output", + "type": "html" + } + ], + "position": { + "bottom": 551.1999969482422, + "height": 133.1999969482422, + "left": 1039.5, + "right": 1239.5, + "top": 418, + "width": 200, + "x": 1039.5, + "y": 418 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/jbrowse_to_container/jbrowse_to_container/0.5.1", + "tool_shed_repository": { + "changeset_revision": "11033bdad2ca", + "name": "jbrowse_to_container", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"organisms\": [{\"__index__\": 0, \"jbrowse\": {\"__class__\": \"RuntimeValue\"}, \"name\": \"__DISPLAY_NAME_ORG__\", \"advanced\": {\"unique_id\": \"__UNIQUE_ID_ORG__\"}}], \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "0.5.1", + "type": "tool", + "uuid": "f06c3ec7-936f-41be-9718-248b1d760d11", + "workflow_outputs": [ + { + 
"label": null, + "output_name": "output", + "uuid": "948e9774-03e5-43be-ad57-4b785372f78a" + } + ] + }, + "7": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "errors": null, + "id": 7, + "input_connections": { + "wait_for": { + "id": 5, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Synchronize features", + "name": "organism_id" + } + ], + "label": null, + "name": "Synchronize features", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 370.6000061035156, + "height": 153.60000610351562, + "left": 1325.5, + "right": 1525.5, + "top": 217, + "width": 200, + "x": 1325.5, + "y": 217 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "64e36c3f0dd6", + "name": "tripal_feature_sync", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"organism_id\": {\"__class__\": \"RuntimeValue\"}, \"repeat_ids\": [], \"repeat_types\": [{\"__index__\": 0, \"types\": \"mRNA\"}, {\"__index__\": 1, \"types\": \"polypeptide\"}], \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "38dd9a7d-46e1-48c6-8a5a-9c62a6860431", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "704c9382-8a3e-4b2b-8190-399a43e6455f" + } + ] + }, + "8": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "errors": null, + "id": 8, + "input_connections": { + "wait_for": { + "id": 7, + "output_name": "results" + } + }, + "inputs": [], + "label": null, + "name": "Populate materialized views", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 369.6000061035156, + "height": 153.60000610351562, + "left": 1611.5, + "right": 1811.5, + "top": 216, + "width": 200, + "x": 1611.5, + "y": 216 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "3c08f32a3dc1", + "name": "tripal_db_populate_mviews", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"mview\": \"\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "b34ddfee-c317-4c21-99a6-679bd640b1be", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "53919587-2586-4c0f-9ae6-082119727c97" + } + ] + }, + "9": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "errors": null, + "id": 9, + "input_connections": { + "wait_for": { + "id": 8, + "output_name": "results" + } + }, + "inputs": [], + "label": null, + "name": "Index Tripal data", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 328.8000030517578, + "height": 112.80000305175781, + "left": 1897.5, + "right": 2097.5, + "top": 216, + "width": 200, + "x": 1897.5, + "y": 216 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "tool_shed_repository": { + "changeset_revision": "d55a39f12dda", + "name": "tripal_db_index", 
+ "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"expose\": {\"do_expose\": \"no\", \"__current_case__\": 0}, \"queues\": \"10\", \"table\": {\"mode\": \"website\", \"__current_case__\": 0}, \"tokenizer\": \"standard\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.1", + "type": "tool", + "uuid": "6f8cf6b5-82f2-40bf-80c0-aecf74bedd5a", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "65a75a90-c47f-4cb4-be74-87cd89d1988a" + } + ] + } + }, + "tags": [], + "uuid": "f8c6fa33-4ade-4251-a214-0ce77cdaac6e", + "version": 1 +} \ No newline at end of file diff --git a/workflows/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v1.ga b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v1.ga similarity index 100% rename from workflows/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v1.ga rename to workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v1.ga diff --git a/workflows/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v2.ga b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v2.ga similarity index 100% rename from workflows/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v2.ga rename to workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v2.ga diff --git a/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v3.ga b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v3.ga new file mode 100644 index 0000000000000000000000000000000000000000..aa88764d3aba81740d81e0415838e148da4a372c --- /dev/null +++ b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v3.ga @@ -0,0 +1,907 @@ +{ + "a_galaxy_workflow": "true", + "annotation": "", + "format-version": "0.1", + "name": "chado_load_tripal_synchronize_jbrowse_2org_v2", + "steps": { + "0": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 0, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "genome org1" + } + ], + "label": "genome org1", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 204.8000030517578, + "height": 61.80000305175781, + "left": 215, + "right": 415, + "top": 143, + "width": 200, + "x": 215, + "y": 143 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "59f823c8-fd7c-441c-84b6-21c367ba12f6", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "95cc9784-0f9c-46df-9f62-4c342f6695e7" + } + ] + }, + "1": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 1, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "annotations org1" + } + ], + "label": "annotations org1", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 294.8000030517578, + "height": 61.80000305175781, + "left": 214, + "right": 414, + "top": 233, + "width": 200, + "x": 214, + "y": 233 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "c82b756f-2ef5-41d5-9e14-eb9a7ef942c7", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "b67afacf-2d88-49f6-af15-02500f0ddc90" + } + ] + }, + "2": { + "annotation": "", + "content_id": null, + "errors": 
null, + "id": 2, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "proteins org1" + } + ], + "label": "proteins org1", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 389.8000030517578, + "height": 61.80000305175781, + "left": 215, + "right": 415, + "top": 328, + "width": 200, + "x": 215, + "y": 328 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "0a0ac416-68b8-4c48-980e-04c6230c721e", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "cc50348e-1372-406e-9b3e-d2cacfea81b2" + } + ] + }, + "3": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 3, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "genome org2" + } + ], + "label": "genome org2", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 479.8000030517578, + "height": 61.80000305175781, + "left": 214, + "right": 414, + "top": 418, + "width": 200, + "x": 214, + "y": 418 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "ddd5f0f8-9e14-42f0-a866-2971d02e5435", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "0f63af8f-7c12-4c7a-a491-4f55f65fff4e" + } + ] + }, + "4": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 4, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "annotations org2" + } + ], + "label": "annotations org2", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 573.8000030517578, + "height": 61.80000305175781, + "left": 216, + "right": 416, + "top": 512, + "width": 200, + "x": 216, + "y": 512 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "c7f3b553-5cd2-46e6-a792-e864ab0d2459", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "e5ec0e3f-c56a-4c4c-b905-26b7e85808d3" + } + ] + }, + "5": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 5, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "proteins org2" + } + ], + "label": "proteins org2", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 663.8000030517578, + "height": 61.80000305175781, + "left": 217, + "right": 417, + "top": 602, + "width": 200, + "x": 217, + "y": 602 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "b5b82a8a-f73c-4183-bc0a-6956e5ad9d0a", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "92d17a84-9137-4437-add9-5b3e551bb6b2" + } + ] + }, + "6": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "errors": null, + "id": 6, + "input_connections": { + "fasta": { + "id": 0, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load fasta", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "fasta" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "organism" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "wait_for" + } + ], + "label": "Chado load fasta org", + "name": "Chado load fasta", + "outputs": [ + { + "name": "results", + "type": "json" 
+ } + ], + "position": { + "bottom": 307.3999938964844, + "height": 164.39999389648438, + "left": 501, + "right": 701, + "top": 143, + "width": 200, + "x": 501, + "y": 143 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "ba4d07fbaf47", + "name": "chado_feature_load_fasta", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"do_update\": \"false\", \"ext_db\": {\"db\": \"\", \"re_db_accession\": \"\"}, \"fasta\": {\"__class__\": \"RuntimeValue\"}, \"match_on_name\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"re_name\": \"\", \"re_uniquename\": \"\", \"relationships\": {\"rel_type\": \"none\", \"__current_case__\": 0}, \"sequence_type\": \"contig\", \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "d15253d8-a673-4bfb-8051-e06d6017288e", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "c5b2da92-4326-46c1-94cb-e0301e2629f8" + } + ] + }, + "7": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.8+galaxy1", + "errors": null, + "id": 7, + "input_connections": { + "reference_genome|genome": { + "id": 0, + "output_name": "output" + }, + "track_groups_0|data_tracks_0|data_format|annotation": { + "id": 1, + "output_name": "output" + } + }, + "inputs": [], + "label": "JBrowse org1", + "name": "JBrowse", + "outputs": [ + { + "name": "output", + "type": "html" + } + ], + "position": { + "bottom": 868.1999969482422, + "height": 205.1999969482422, + "left": 513, + "right": 713, + "top": 663, + "width": 200, + "x": 513, + "y": 663 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.8+galaxy1", + "tool_shed_repository": { + "changeset_revision": "fd5dbf0f732e", + "name": "jbrowse", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"action\": {\"action_select\": \"create\", \"__current_case__\": 0}, \"gencode\": \"1\", \"jbgen\": {\"defaultLocation\": \"\", \"trackPadding\": \"20\", \"shareLink\": \"true\", \"aboutDescription\": \"\", \"show_tracklist\": \"true\", \"show_nav\": \"true\", \"show_overview\": \"true\", \"show_menu\": \"true\", \"hideGenomeOptions\": \"false\"}, \"plugins\": {\"BlastView\": \"true\", \"ComboTrackSelector\": \"false\", \"GCContent\": \"false\"}, \"reference_genome\": {\"genome_type_select\": \"history\", \"__current_case__\": 1, \"genome\": {\"__class__\": \"ConnectedValue\"}}, \"standalone\": \"false\", \"track_groups\": [{\"__index__\": 0, \"category\": \"Annotation\", \"data_tracks\": [{\"__index__\": 0, \"data_format\": {\"data_format_select\": \"gene_calls\", \"__current_case__\": 2, \"annotation\": {\"__class__\": \"ConnectedValue\"}, \"match_part\": {\"match_part_select\": \"false\", \"__current_case__\": 1}, \"index\": \"false\", \"track_config\": {\"track_class\": \"NeatHTMLFeatures/View/Track/NeatFeatures\", \"__current_case__\": 3, \"html_options\": {\"topLevelFeatures\": \"\"}}, \"jbstyle\": {\"style_classname\": \"transcript\", \"style_label\": \"product,name,id\", \"style_description\": \"note,description\", \"style_height\": \"10px\", \"max_height\": \"600\"}, \"jbcolor_scale\": 
{\"color_score\": {\"color_score_select\": \"none\", \"__current_case__\": 0, \"color\": {\"color_select\": \"automatic\", \"__current_case__\": 0}}}, \"jb_custom_config\": {\"option\": []}, \"jbmenu\": {\"track_menu\": [{\"__index__\": 0, \"menu_action\": \"iframeDialog\", \"menu_label\": \"View transcript report\", \"menu_title\": \"Transcript {id}\", \"menu_url\": \"__MENU_URL_ORG1__\", \"menu_icon\": \"dijitIconBookmark\"}]}, \"track_visibility\": \"default_off\", \"override_apollo_plugins\": \"False\", \"override_apollo_drag\": \"False\"}}]}], \"uglyTestingHack\": \"\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.16.8+galaxy1", + "type": "tool", + "uuid": "16f56647-e3c0-4bda-a126-9011cd3115b8", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "0a865401-1041-44b8-bd20-2c422bc695fa" + } + ] + }, + "8": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.8+galaxy1", + "errors": null, + "id": 8, + "input_connections": { + "reference_genome|genome": { + "id": 3, + "output_name": "output" + }, + "track_groups_0|data_tracks_0|data_format|annotation": { + "id": 4, + "output_name": "output" + } + }, + "inputs": [], + "label": "JBrowse org2", + "name": "JBrowse", + "outputs": [ + { + "name": "output", + "type": "html" + } + ], + "position": { + "bottom": 1084.1999969482422, + "height": 205.1999969482422, + "left": 518, + "right": 718, + "top": 879, + "width": 200, + "x": 518, + "y": 879 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.8+galaxy1", + "tool_shed_repository": { + "changeset_revision": "fd5dbf0f732e", + "name": "jbrowse", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"action\": {\"action_select\": \"create\", \"__current_case__\": 0}, \"gencode\": \"1\", \"jbgen\": {\"defaultLocation\": \"\", \"trackPadding\": \"20\", \"shareLink\": \"true\", \"aboutDescription\": \"\", \"show_tracklist\": \"true\", \"show_nav\": \"true\", \"show_overview\": \"true\", \"show_menu\": \"true\", \"hideGenomeOptions\": \"false\"}, \"plugins\": {\"BlastView\": \"true\", \"ComboTrackSelector\": \"false\", \"GCContent\": \"false\"}, \"reference_genome\": {\"genome_type_select\": \"history\", \"__current_case__\": 1, \"genome\": {\"__class__\": \"ConnectedValue\"}}, \"standalone\": \"false\", \"track_groups\": [{\"__index__\": 0, \"category\": \"Annotation\", \"data_tracks\": [{\"__index__\": 0, \"data_format\": {\"data_format_select\": \"gene_calls\", \"__current_case__\": 2, \"annotation\": {\"__class__\": \"ConnectedValue\"}, \"match_part\": {\"match_part_select\": \"false\", \"__current_case__\": 1}, \"index\": \"false\", \"track_config\": {\"track_class\": \"NeatHTMLFeatures/View/Track/NeatFeatures\", \"__current_case__\": 3, \"html_options\": {\"topLevelFeatures\": \"\"}}, \"jbstyle\": {\"style_classname\": \"transcript\", \"style_label\": \"product,name,id\", \"style_description\": \"note,description\", \"style_height\": \"10px\", \"max_height\": \"600\"}, \"jbcolor_scale\": {\"color_score\": {\"color_score_select\": \"none\", \"__current_case__\": 0, \"color\": {\"color_select\": \"automatic\", \"__current_case__\": 0}}}, \"jb_custom_config\": {\"option\": []}, \"jbmenu\": {\"track_menu\": [{\"__index__\": 0, \"menu_action\": \"iframeDialog\", \"menu_label\": \"View transcript report\", \"menu_title\": \"Transcript {id}\", \"menu_url\": \"__MENU_URL_ORG2__\", \"menu_icon\": \"dijitIconBookmark\"}]}, 
\"track_visibility\": \"default_off\", \"override_apollo_plugins\": \"False\", \"override_apollo_drag\": \"False\"}}]}], \"uglyTestingHack\": \"\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.16.8+galaxy1", + "type": "tool", + "uuid": "f4525181-f8b3-47ab-979f-fee0f46f13ba", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "4acd10c7-ef6d-4808-8aac-253d91733853" + } + ] + }, + "9": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "errors": null, + "id": 9, + "input_connections": { + "fasta": { + "id": 2, + "output_name": "output" + }, + "gff": { + "id": 1, + "output_name": "output" + }, + "wait_for": { + "id": 6, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load gff", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "fasta" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "gff" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "organism" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "wait_for" + } + ], + "label": "Chado load gff ", + "name": "Chado load gff", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 317.3999938964844, + "height": 174.39999389648438, + "left": 787, + "right": 987, + "top": 143, + "width": 200, + "x": 787, + "y": 143 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "e9a6d7568817", + "name": "chado_feature_load_gff", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"add_only\": \"false\", \"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"fasta\": {\"__class__\": \"RuntimeValue\"}, \"gff\": {\"__class__\": \"RuntimeValue\"}, \"landmark_type\": \"contig\", \"no_seq_compute\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"prot_naming\": {\"method\": \"regex\", \"__current_case__\": 1, \"re_protein_capture\": \"^mRNA(_.+)$\", \"re_protein\": \"prot\\\\1\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "281d98df-9a07-4beb-b3e8-a1ac72d8c0c9", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "e92a76f7-7b7e-429c-9aa7-cfc8ce5f0cd1" + } + ] + }, + "10": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/jbrowse_to_container/jbrowse_to_container/0.5.1", + "errors": null, + "id": 10, + "input_connections": { + "organisms_0|jbrowse": { + "id": 7, + "output_name": "output" + }, + "organisms_1|jbrowse": { + "id": 8, + "output_name": "output" + } + }, + "inputs": [], + "label": null, + "name": "Add organisms to JBrowse container", + "outputs": [ + { + "name": "output", + "type": "html" + } + ], + "position": { + "bottom": 995.8000030517578, + "height": 184.8000030517578, + "left": 883, + "right": 1083, + "top": 811, + "width": 200, + "x": 883, + "y": 811 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/jbrowse_to_container/jbrowse_to_container/0.5.1", + "tool_shed_repository": { + "changeset_revision": "11033bdad2ca", + "name": 
"jbrowse_to_container", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"organisms\": [{\"__index__\": 0, \"jbrowse\": {\"__class__\": \"ConnectedValue\"}, \"name\": \"__FULL_NAME_ORG1__\", \"advanced\": {\"unique_id\": \"__UNIQUE_ID_ORG1__\"}}, {\"__index__\": 1, \"jbrowse\": {\"__class__\": \"ConnectedValue\"}, \"name\": \"__FULL_NAME_ORG2__\", \"advanced\": {\"unique_id\": \"__UNIQUE_ID_ORG2__\"}}], \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "0.5.1", + "type": "tool", + "uuid": "0f70a2d4-f599-4514-846d-16a558251a9a", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "0e8e64c8-7dc6-46f1-aca2-d4a6ffce638b" + } + ] + }, + "11": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "errors": null, + "id": 11, + "input_connections": { + "wait_for": { + "id": 9, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Synchronize features", + "name": "organism_id" + } + ], + "label": "Synchronize features org1", + "name": "Synchronize features", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 300.3999938964844, + "height": 154.39999389648438, + "left": 1069, + "right": 1269, + "top": 146, + "width": 200, + "x": 1069, + "y": 146 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "64e36c3f0dd6", + "name": "tripal_feature_sync", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"organism_id\": {\"__class__\": \"RuntimeValue\"}, \"repeat_ids\": [], \"repeat_types\": [{\"__index__\": 0, \"types\": \"mRNA\"}, {\"__index__\": 1, \"types\": \"polypeptide\"}], \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "7ed6c0a0-f36b-4a57-9fd8-4900af93b39c", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "ddd66d56-b81b-4ad7-a96d-baf365d8b85a" + } + ] + }, + "12": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "errors": null, + "id": 12, + "input_connections": { + "fasta": { + "id": 3, + "output_name": "output" + }, + "wait_for": { + "id": 11, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load fasta", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "fasta" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "organism" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "wait_for" + } + ], + "label": "Chado load fasta org2", + "name": "Chado load fasta", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 562.3999938964844, + "height": 164.39999389648438, + "left": 514, + "right": 714, + "top": 398, + "width": 200, + "x": 514, + "y": 398 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "ba4d07fbaf47", + "name": "chado_feature_load_fasta", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": 
"{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"do_update\": \"false\", \"ext_db\": {\"db\": \"\", \"re_db_accession\": \"\"}, \"fasta\": {\"__class__\": \"RuntimeValue\"}, \"match_on_name\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"re_name\": \"\", \"re_uniquename\": \"\", \"relationships\": {\"rel_type\": \"none\", \"__current_case__\": 0}, \"sequence_type\": \"contig\", \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "049f96ea-620a-434c-b32a-886cac9174ef", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "e67a8a18-2a9d-4906-8960-3d76455de221" + } + ] + }, + "13": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "errors": null, + "id": 13, + "input_connections": { + "fasta": { + "id": 5, + "output_name": "output" + }, + "gff": { + "id": 4, + "output_name": "output" + }, + "wait_for": { + "id": 12, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load gff", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "fasta" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "gff" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "organism" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "wait_for" + } + ], + "label": "Chado load gff org2", + "name": "Chado load gff", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 594.8000030517578, + "height": 194.8000030517578, + "left": 799, + "right": 999, + "top": 400, + "width": 200, + "x": 799, + "y": 400 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "e9a6d7568817", + "name": "chado_feature_load_gff", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"add_only\": \"false\", \"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"fasta\": {\"__class__\": \"RuntimeValue\"}, \"gff\": {\"__class__\": \"RuntimeValue\"}, \"landmark_type\": \"contig\", \"no_seq_compute\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"prot_naming\": {\"method\": \"regex\", \"__current_case__\": 1, \"re_protein_capture\": \"^mRNA(_.+)$\", \"re_protein\": \"prot\\\\1\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "9293d4b6-7cad-4d6b-8143-e134f58c75fe", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "975db068-46e0-4c0a-a74f-80c7d2137ac8" + } + ] + }, + "14": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "errors": null, + "id": 14, + "input_connections": { + "wait_for": { + "id": 13, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Synchronize features", + "name": "organism_id" + } + ], + "label": "Synchronize features org2", + "name": "Synchronize features", + "outputs": [ + { + "name": 
"results", + "type": "txt" + } + ], + "position": { + "bottom": 560.3999938964844, + "height": 154.39999389648438, + "left": 1078, + "right": 1278, + "top": 406, + "width": 200, + "x": 1078, + "y": 406 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "64e36c3f0dd6", + "name": "tripal_feature_sync", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"organism_id\": {\"__class__\": \"RuntimeValue\"}, \"repeat_ids\": [], \"repeat_types\": [{\"__index__\": 0, \"types\": \"mRNA\"}, {\"__index__\": 1, \"types\": \"polypeptide\"}], \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "090a12c8-f931-4f3c-8853-9ae581c7c091", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "f51aa0a5-e2f8-41af-bb56-dbfe220e7fc8" + } + ] + }, + "15": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "errors": null, + "id": 15, + "input_connections": { + "wait_for": { + "id": 14, + "output_name": "results" + } + }, + "inputs": [], + "label": null, + "name": "Populate materialized views", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 456.3999938964844, + "height": 154.39999389648438, + "left": 1362, + "right": 1562, + "top": 302, + "width": 200, + "x": 1362, + "y": 302 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "3c08f32a3dc1", + "name": "tripal_db_populate_mviews", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"mview\": \"\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "0a3bd8c6-240b-45f6-a90d-e88ff3703fa7", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "c7229bc3-fbe7-4c7f-9a08-8ad631d57766" + } + ] + }, + "16": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "errors": null, + "id": 16, + "input_connections": { + "wait_for": { + "id": 15, + "output_name": "results" + } + }, + "inputs": [], + "label": null, + "name": "Index Tripal data", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 416.6000061035156, + "height": 113.60000610351562, + "left": 1638, + "right": 1838, + "top": 303, + "width": 200, + "x": 1638, + "y": 303 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "tool_shed_repository": { + "changeset_revision": "d55a39f12dda", + "name": "tripal_db_index", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"expose\": {\"do_expose\": \"no\", \"__current_case__\": 0}, \"queues\": \"10\", \"table\": {\"mode\": \"website\", \"__current_case__\": 0}, \"tokenizer\": \"standard\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.1", + "type": "tool", + "uuid": "b01a554e-a43a-494e-876c-ebe134c4c48f", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": 
"0547bccc-451b-4de1-9452-83fe86d8bc2a" + } + ] + } + }, + "tags": [], + "uuid": "ef44b9b7-f9df-454e-bbba-070299d056d1", + "version": 1 +} \ No newline at end of file diff --git a/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v4.ga b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v4.ga new file mode 100644 index 0000000000000000000000000000000000000000..045e442219face7eed174cafcec206dd9de0ffa0 --- /dev/null +++ b/workflows_phaeoexplorer/Galaxy-Workflow-chado_load_tripal_synchronize_jbrowse_2org_v4.ga @@ -0,0 +1,881 @@ +{ + "a_galaxy_workflow": "true", + "annotation": "", + "format-version": "0.1", + "name": "chado_load_tripal_synchronize_jbrowse_2org_v4", + "steps": { + "0": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 0, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "annotations org1" + } + ], + "label": "annotations org1", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 64.80000305175781, + "height": 61.80000305175781, + "left": 233, + "right": 433, + "top": 3, + "width": 200, + "x": 233, + "y": 3 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "c82b756f-2ef5-41d5-9e14-eb9a7ef942c7", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "cd6ee602-9669-4542-820c-c4655bc573b0" + } + ] + }, + "1": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 1, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "genome org1" + } + ], + "label": "genome org1", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": -25.199996948242188, + "height": 61.80000305175781, + "left": 234, + "right": 434, + "top": -87, + "width": 200, + "x": 234, + "y": -87 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "59f823c8-fd7c-441c-84b6-21c367ba12f6", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "6b57ff4d-58c7-4a22-a2c7-66acca4ebe33" + } + ] + }, + "2": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 2, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "proteins org1" + } + ], + "label": "proteins org1", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 159.8000030517578, + "height": 61.80000305175781, + "left": 234, + "right": 434, + "top": 98, + "width": 200, + "x": 234, + "y": 98 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "0a0ac416-68b8-4c48-980e-04c6230c721e", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "64cd0c64-5d21-400f-a3c8-3a035978be06" + } + ] + }, + "3": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 3, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "genome org2" + } + ], + "label": "genome org2", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 249.8000030517578, + "height": 61.80000305175781, + "left": 233, + "right": 433, + "top": 188, + "width": 200, + "x": 233, + "y": 188 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "ddd5f0f8-9e14-42f0-a866-2971d02e5435", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": 
"bcf5bb6d-9796-4e37-bcf8-ac3d90efd7bf" + } + ] + }, + "4": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 4, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "annotations org2" + } + ], + "label": "annotations org2", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 343.8000030517578, + "height": 61.80000305175781, + "left": 235, + "right": 435, + "top": 282, + "width": 200, + "x": 235, + "y": 282 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "c7f3b553-5cd2-46e6-a792-e864ab0d2459", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "2de7390d-d075-4122-80ee-63217adf4f24" + } + ] + }, + "5": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 5, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "proteins org2" + } + ], + "label": "proteins org2", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 433.8000030517578, + "height": 61.80000305175781, + "left": 236, + "right": 436, + "top": 372, + "width": 200, + "x": 236, + "y": 372 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "b5b82a8a-f73c-4183-bc0a-6956e5ad9d0a", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "84bb8de5-0039-40bc-98b0-2524a8958443" + } + ] + }, + "6": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "errors": null, + "id": 6, + "input_connections": { + "fasta": { + "id": 1, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load fasta", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "organism" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "wait_for" + } + ], + "label": "Chado load fasta org", + "name": "Chado load fasta", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 77.39999389648438, + "height": 164.39999389648438, + "left": 519, + "right": 719, + "top": -87, + "width": 200, + "x": 519, + "y": -87 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "ba4d07fbaf47", + "name": "chado_feature_load_fasta", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"do_update\": \"false\", \"ext_db\": {\"db\": \"\", \"re_db_accession\": \"\"}, \"fasta\": {\"__class__\": \"ConnectedValue\"}, \"match_on_name\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"re_name\": \"\", \"re_uniquename\": \"\", \"relationships\": {\"rel_type\": \"none\", \"__current_case__\": 0}, \"sequence_type\": \"contig\", \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "d706dd73-5dc7-4c06-8c56-825aa81394a1", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "b058f83f-8d18-4ef4-b3a6-bae86dc0f9f6" + } + ] + }, + "7": { + "annotation": "", + "content_id": 
"toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy0", + "errors": null, + "id": 7, + "input_connections": { + "reference_genome|genome": { + "id": 1, + "output_name": "output" + }, + "track_groups_0|data_tracks_0|data_format|annotation": { + "id": 0, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool JBrowse", + "name": "reference_genome" + } + ], + "label": "JBrowse org1", + "name": "JBrowse", + "outputs": [ + { + "name": "output", + "type": "html" + } + ], + "position": { + "bottom": 638.1999969482422, + "height": 205.1999969482422, + "left": 532, + "right": 732, + "top": 433, + "width": 200, + "x": 532, + "y": 433 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy0", + "tool_shed_repository": { + "changeset_revision": "4542035c1075", + "name": "jbrowse", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"action\": {\"action_select\": \"create\", \"__current_case__\": 0}, \"gencode\": \"1\", \"jbgen\": {\"defaultLocation\": \"\", \"trackPadding\": \"20\", \"shareLink\": \"true\", \"aboutDescription\": \"\", \"show_tracklist\": \"true\", \"show_nav\": \"true\", \"show_overview\": \"true\", \"show_menu\": \"true\", \"hideGenomeOptions\": \"false\"}, \"plugins\": {\"BlastView\": \"true\", \"ComboTrackSelector\": \"false\", \"GCContent\": \"false\"}, \"reference_genome\": {\"genome_type_select\": \"history\", \"__current_case__\": 1, \"genome\": {\"__class__\": \"RuntimeValue\"}}, \"standalone\": \"complete\", \"track_groups\": [{\"__index__\": 0, \"category\": \"Annotation\", \"data_tracks\": [{\"__index__\": 0, \"data_format\": {\"data_format_select\": \"gene_calls\", \"__current_case__\": 2, \"annotation\": {\"__class__\": \"RuntimeValue\"}, \"match_part\": {\"match_part_select\": \"false\", \"__current_case__\": 1}, \"index\": \"false\", \"track_config\": {\"track_class\": \"NeatHTMLFeatures/View/Track/NeatFeatures\", \"__current_case__\": 3, \"html_options\": {\"topLevelFeatures\": \"\"}}, \"jbstyle\": {\"style_classname\": \"transcript\", \"style_label\": \"product,name,id\", \"style_description\": \"note,description\", \"style_height\": \"10px\", \"max_height\": \"600\"}, \"jbcolor_scale\": {\"color_score\": {\"color_score_select\": \"none\", \"__current_case__\": 0, \"color\": {\"color_select\": \"automatic\", \"__current_case__\": 0}}}, \"jb_custom_config\": {\"option\": []}, \"jbmenu\": {\"track_menu\": [{\"__index__\": 0, \"menu_action\": \"iframeDialog\", \"menu_label\": \"View transcript report\", \"menu_title\": \"Transcript {id}\", \"menu_url\": \"__MENU_URL_ORG1__\", \"menu_icon\": \"dijitIconBookmark\"}]}, \"track_visibility\": \"default_off\", \"override_apollo_plugins\": \"False\", \"override_apollo_drag\": \"False\"}}]}], \"uglyTestingHack\": \"\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.16.11+galaxy0", + "type": "tool", + "uuid": "ab159045-e72e-46cc-9e9f-d7ad2b50687e", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "e93e82c7-7a3b-4219-a2f6-09afbe60b1e0" + } + ] + }, + "8": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy0", + "errors": null, + "id": 8, + "input_connections": { + "reference_genome|genome": { + "id": 3, + "output_name": "output" + }, + "track_groups_0|data_tracks_0|data_format|annotation": { + "id": 4, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime 
parameter for tool JBrowse", + "name": "reference_genome" + } + ], + "label": "JBrowse org2", + "name": "JBrowse", + "outputs": [ + { + "name": "output", + "type": "html" + } + ], + "position": { + "bottom": 854.1999969482422, + "height": 205.1999969482422, + "left": 537, + "right": 737, + "top": 649, + "width": 200, + "x": 537, + "y": 649 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy0", + "tool_shed_repository": { + "changeset_revision": "4542035c1075", + "name": "jbrowse", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"action\": {\"action_select\": \"create\", \"__current_case__\": 0}, \"gencode\": \"1\", \"jbgen\": {\"defaultLocation\": \"\", \"trackPadding\": \"20\", \"shareLink\": \"true\", \"aboutDescription\": \"\", \"show_tracklist\": \"true\", \"show_nav\": \"true\", \"show_overview\": \"true\", \"show_menu\": \"true\", \"hideGenomeOptions\": \"false\"}, \"plugins\": {\"BlastView\": \"true\", \"ComboTrackSelector\": \"false\", \"GCContent\": \"false\"}, \"reference_genome\": {\"genome_type_select\": \"history\", \"__current_case__\": 1, \"genome\": {\"__class__\": \"RuntimeValue\"}}, \"standalone\": \"complete\", \"track_groups\": [{\"__index__\": 0, \"category\": \"Annotation\", \"data_tracks\": [{\"__index__\": 0, \"data_format\": {\"data_format_select\": \"gene_calls\", \"__current_case__\": 2, \"annotation\": {\"__class__\": \"RuntimeValue\"}, \"match_part\": {\"match_part_select\": \"false\", \"__current_case__\": 1}, \"index\": \"false\", \"track_config\": {\"track_class\": \"NeatHTMLFeatures/View/Track/NeatFeatures\", \"__current_case__\": 3, \"html_options\": {\"topLevelFeatures\": \"\"}}, \"jbstyle\": {\"style_classname\": \"transcript\", \"style_label\": \"product,name,id\", \"style_description\": \"note,description\", \"style_height\": \"10px\", \"max_height\": \"600\"}, \"jbcolor_scale\": {\"color_score\": {\"color_score_select\": \"none\", \"__current_case__\": 0, \"color\": {\"color_select\": \"automatic\", \"__current_case__\": 0}}}, \"jb_custom_config\": {\"option\": []}, \"jbmenu\": {\"track_menu\": [{\"__index__\": 0, \"menu_action\": \"iframeDialog\", \"menu_label\": \"View transcript report\", \"menu_title\": \"Transcript {id}\", \"menu_url\": \"__MENU_URL_ORG2__\", \"menu_icon\": \"dijitIconBookmark\"}]}, \"track_visibility\": \"default_off\", \"override_apollo_plugins\": \"False\", \"override_apollo_drag\": \"False\"}}]}], \"uglyTestingHack\": \"\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.16.11+galaxy0", + "type": "tool", + "uuid": "a06b4426-5546-4393-aa6a-23a0c490eb8f", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "a44c76ce-c7f4-4f24-bc96-be90aafc9efb" + } + ] + }, + "9": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "errors": null, + "id": 9, + "input_connections": { + "fasta": { + "id": 2, + "output_name": "output" + }, + "gff": { + "id": 0, + "output_name": "output" + }, + "wait_for": { + "id": 6, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load gff", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "organism" + } + ], + "label": "Chado load gff ", + "name": "Chado load gff", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 87.39999389648438, + 
"height": 174.39999389648438, + "left": 806, + "right": 1006, + "top": -87, + "width": 200, + "x": 806, + "y": -87 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "e9a6d7568817", + "name": "chado_feature_load_gff", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"add_only\": \"false\", \"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"fasta\": {\"__class__\": \"ConnectedValue\"}, \"gff\": {\"__class__\": \"ConnectedValue\"}, \"landmark_type\": \"contig\", \"no_seq_compute\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"prot_naming\": {\"method\": \"regex\", \"__current_case__\": 1, \"re_protein_capture\": \"^mRNA(_.+)$\", \"re_protein\": \"prot\\\\1\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "f93c9145-6484-4c19-ab15-97f042fa4f1e", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "4b935868-ea0a-4ccf-9fe4-90d8fbe280df" + } + ] + }, + "10": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/jbrowse_to_container/jbrowse_to_container/0.5.1", + "errors": null, + "id": 10, + "input_connections": { + "organisms_0|jbrowse": { + "id": 7, + "output_name": "output" + }, + "organisms_1|jbrowse": { + "id": 8, + "output_name": "output" + } + }, + "inputs": [], + "label": null, + "name": "Add organisms to JBrowse container", + "outputs": [ + { + "name": "output", + "type": "html" + } + ], + "position": { + "bottom": 765.8000030517578, + "height": 184.8000030517578, + "left": 902, + "right": 1102, + "top": 581, + "width": 200, + "x": 902, + "y": 581 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/jbrowse_to_container/jbrowse_to_container/0.5.1", + "tool_shed_repository": { + "changeset_revision": "11033bdad2ca", + "name": "jbrowse_to_container", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"organisms\": [{\"__index__\": 0, \"jbrowse\": {\"__class__\": \"RuntimeValue\"}, \"name\": \"__DISPLAY_NAME_ORG1__\", \"advanced\": {\"unique_id\": \"__UNIQUE_ID_ORG1__\"}}, {\"__index__\": 1, \"jbrowse\": {\"__class__\": \"RuntimeValue\"}, \"name\": \"__DISPLAY_NAME_ORG2__\", \"advanced\": {\"unique_id\": \"__UNIQUE_ID_ORG2__\"}}], \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "0.5.1", + "type": "tool", + "uuid": "2c90cac2-d1ef-4d3b-bf43-d01e27059823", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "91511518-e85c-4a29-a929-77334cad6a1c" + } + ] + }, + "11": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "errors": null, + "id": 11, + "input_connections": { + "wait_for": { + "id": 9, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Synchronize features", + "name": "organism_id" + } + ], + "label": "Synchronize features org1", + "name": "Synchronize features", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 70.39999389648438, + "height": 154.39999389648438, + "left": 1088, + "right": 1288, + "top": -84, + "width": 200, + "x": 1088, + "y": -84 + }, + "post_job_actions": {}, + "tool_id": 
"toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "64e36c3f0dd6", + "name": "tripal_feature_sync", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"organism_id\": {\"__class__\": \"RuntimeValue\"}, \"repeat_ids\": [], \"repeat_types\": [{\"__index__\": 0, \"types\": \"mRNA\"}, {\"__index__\": 1, \"types\": \"polypeptide\"}], \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "7ed6c0a0-f36b-4a57-9fd8-4900af93b39c", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "ef7589b7-bd80-48c5-bd44-f275d49ea324" + } + ] + }, + "12": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "errors": null, + "id": 12, + "input_connections": { + "fasta": { + "id": 3, + "output_name": "output" + }, + "wait_for": { + "id": 11, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load fasta", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load fasta", + "name": "organism" + } + ], + "label": "Chado load fasta org2", + "name": "Chado load fasta", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 332.3999938964844, + "height": 164.39999389648438, + "left": 533, + "right": 733, + "top": 168, + "width": 200, + "x": 533, + "y": 168 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_fasta/feature_load_fasta/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "ba4d07fbaf47", + "name": "chado_feature_load_fasta", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"do_update\": \"false\", \"ext_db\": {\"db\": \"\", \"re_db_accession\": \"\"}, \"fasta\": {\"__class__\": \"ConnectedValue\"}, \"match_on_name\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"re_name\": \"\", \"re_uniquename\": \"\", \"relationships\": {\"rel_type\": \"none\", \"__current_case__\": 0}, \"sequence_type\": \"contig\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "c860eb94-43cb-468e-ad5e-257a519b66ca", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "a742ac3c-b152-464e-bcfd-5139efd90599" + } + ] + }, + "13": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "errors": null, + "id": 13, + "input_connections": { + "fasta": { + "id": 5, + "output_name": "output" + }, + "gff": { + "id": 4, + "output_name": "output" + }, + "wait_for": { + "id": 12, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load gff", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load gff", + "name": "organism" + } + ], + "label": "Chado load gff org2", + "name": "Chado load gff", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 364.8000030517578, + "height": 194.8000030517578, + "left": 818, + "right": 1018, + "top": 
170, + "width": 200, + "x": 818, + "y": 170 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_feature_load_gff/feature_load_gff/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "e9a6d7568817", + "name": "chado_feature_load_gff", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"add_only\": \"false\", \"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"fasta\": {\"__class__\": \"ConnectedValue\"}, \"gff\": {\"__class__\": \"ConnectedValue\"}, \"landmark_type\": \"contig\", \"no_seq_compute\": \"false\", \"organism\": {\"__class__\": \"RuntimeValue\"}, \"prot_naming\": {\"method\": \"regex\", \"__current_case__\": 1, \"re_protein_capture\": \"^mRNA(_.+)$\", \"re_protein\": \"prot\\\\1\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "a4d0333d-bcbc-44bb-88ff-7bba058a50d8", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "0d07b3da-3317-4d92-bb85-c301ee2d0753" + } + ] + }, + "14": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "errors": null, + "id": 14, + "input_connections": { + "wait_for": { + "id": 13, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Synchronize features", + "name": "organism_id" + } + ], + "label": "Synchronize features org2", + "name": "Synchronize features", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 330.3999938964844, + "height": 154.39999389648438, + "left": 1097, + "right": 1297, + "top": 176, + "width": 200, + "x": 1097, + "y": 176 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_feature_sync/feature_sync/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "64e36c3f0dd6", + "name": "tripal_feature_sync", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"organism_id\": {\"__class__\": \"RuntimeValue\"}, \"repeat_ids\": [], \"repeat_types\": [{\"__index__\": 0, \"types\": \"mRNA\"}, {\"__index__\": 1, \"types\": \"polypeptide\"}], \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "090a12c8-f931-4f3c-8853-9ae581c7c091", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "b896a431-ea9b-42e7-a2e8-99c5c43a6069" + } + ] + }, + "15": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "errors": null, + "id": 15, + "input_connections": { + "wait_for": { + "id": 14, + "output_name": "results" + } + }, + "inputs": [], + "label": null, + "name": "Populate materialized views", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 226.39999389648438, + "height": 154.39999389648438, + "left": 1381, + "right": 1581, + "top": 72, + "width": 200, + "x": 1381, + "y": 72 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "3c08f32a3dc1", + "name": "tripal_db_populate_mviews", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + 
"tool_state": "{\"mview\": \"\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "0a3bd8c6-240b-45f6-a90d-e88ff3703fa7", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "ba904db6-a017-4090-8d13-e3076f0052c4" + } + ] + }, + "16": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "errors": null, + "id": 16, + "input_connections": { + "wait_for": { + "id": 15, + "output_name": "results" + } + }, + "inputs": [], + "label": null, + "name": "Index Tripal data", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 186.60000610351562, + "height": 113.60000610351562, + "left": 1657, + "right": 1857, + "top": 73, + "width": 200, + "x": 1657, + "y": 73 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "tool_shed_repository": { + "changeset_revision": "d55a39f12dda", + "name": "tripal_db_index", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"expose\": {\"do_expose\": \"no\", \"__current_case__\": 0}, \"queues\": \"10\", \"table\": {\"mode\": \"website\", \"__current_case__\": 0}, \"tokenizer\": \"standard\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.1", + "type": "tool", + "uuid": "b01a554e-a43a-494e-876c-ebe134c4c48f", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "f431f362-7ad4-484f-bce7-b5d290b782d6" + } + ] + } + }, + "tags": [], + "uuid": "43552d52-cef0-4601-8a5f-0a9e209c1426", + "version": 1 +} \ No newline at end of file diff --git a/workflows_phaeoexplorer/Galaxy-Workflow-load_blast_results_1org_v1.ga b/workflows_phaeoexplorer/Galaxy-Workflow-load_blast_results_1org_v1.ga new file mode 100644 index 0000000000000000000000000000000000000000..db4e9537e0094dd39932dc823f5fe789911d3fb6 --- /dev/null +++ b/workflows_phaeoexplorer/Galaxy-Workflow-load_blast_results_1org_v1.ga @@ -0,0 +1,271 @@ +{ + "a_galaxy_workflow": "true", + "annotation": "", + "format-version": "0.1", + "name": "load_blast_results_1org_v1", + "steps": { + "0": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 0, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "blast file xml org1" + } + ], + "label": "blast file xml org1", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 416.1999969482422, + "height": 82.19999694824219, + "left": 410, + "right": 610, + "top": 334, + "width": 200, + "x": 410, + "y": 334 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "b1c63d94-61a7-4bf1-8b5d-e08fb34c0357", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "f602d234-8cea-4db9-ab77-678cdc0d2101" + } + ] + }, + "1": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_load_blast/load_blast/2.3.4+galaxy0", + "errors": null, + "id": 1, + "input_connections": { + "input": { + "id": 0, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "input" + }, + { + "description": "runtime parameter for tool 
Chado load Blast results", + "name": "organism_id" + }, + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "wait_for" + } + ], + "label": "load blast results org1", + "name": "Chado load Blast results", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 457.3999938964844, + "height": 164.39999389648438, + "left": 711, + "right": 911, + "top": 293, + "width": 200, + "x": 711, + "y": 293 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_load_blast/load_blast/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "08ae8b27b193", + "name": "chado_load_blast", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"blastdb_id\": \"21\", \"input\": {\"__class__\": \"RuntimeValue\"}, \"match_on_name\": \"false\", \"organism_id\": {\"__class__\": \"RuntimeValue\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"query_type\": \"polypeptide\", \"re_name\": \"\", \"skip_missing\": \"false\", \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "10144cf8-f121-45f3-ba64-9f4d66bf1e56", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "95708895-8439-4257-bff6-96e4c51a0725" + } + ] + }, + "2": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_analysis_sync/analysis_sync/3.2.1.0", + "errors": null, + "id": 2, + "input_connections": { + "wait_for": { + "id": 1, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Synchronize an analysis", + "name": "analysis_id" + } + ], + "label": "sync blast analysis org1", + "name": "Synchronize an analysis", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 451.3999938964844, + "height": 154.39999389648438, + "left": 1010, + "right": 1210, + "top": 297, + "width": 200, + "x": 1010, + "y": 297 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_analysis_sync/analysis_sync/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "f487ff676088", + "name": "tripal_analysis_sync", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "99e7496d-ac32-467d-8c09-2efd48d0231a", + "workflow_outputs": [ + { + "label": "Synchronize Analysis into Tripal", + "output_name": "results", + "uuid": "1fb6db92-90a2-4e33-beec-f2f974e369e9" + } + ] + }, + "3": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "errors": null, + "id": 3, + "input_connections": { + "wait_for": { + "id": 2, + "output_name": "results" + } + }, + "inputs": [], + "label": "populate mat views", + "name": "Populate materialized views", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 452.3999938964844, + "height": 154.39999389648438, + "left": 1295, + "right": 1495, + "top": 298, + "width": 200, + "x": 1295, + "y": 298 + }, + "post_job_actions": {}, + "tool_id": 
"toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "3c08f32a3dc1", + "name": "tripal_db_populate_mviews", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"mview\": \"\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "5c947dd5-89df-4146-9ab8-9e1d6de42360", + "workflow_outputs": [ + { + "label": "Populate Tripal materialized view(s)", + "output_name": "results", + "uuid": "0a0c9fa7-3a3c-459d-b5c7-b7a5a11459f3" + } + ] + }, + "4": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "errors": null, + "id": 4, + "input_connections": { + "wait_for": { + "id": 3, + "output_name": "results" + } + }, + "inputs": [], + "label": "index tripal data", + "name": "Index Tripal data", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 433.6000061035156, + "height": 113.60000610351562, + "left": 1570, + "right": 1770, + "top": 320, + "width": 200, + "x": 1570, + "y": 320 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "tool_shed_repository": { + "changeset_revision": "d55a39f12dda", + "name": "tripal_db_index", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"expose\": {\"do_expose\": \"no\", \"__current_case__\": 0}, \"queues\": \"10\", \"table\": {\"mode\": \"website\", \"__current_case__\": 0}, \"tokenizer\": \"standard\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.1", + "type": "tool", + "uuid": "5ecc30b0-07ab-4f9b-81f0-31310358c221", + "workflow_outputs": [ + { + "label": "Index Tripal data", + "output_name": "results", + "uuid": "5c0f0431-acb0-4e40-a7e4-8a562933fd97" + } + ] + } + }, + "tags": [], + "uuid": "80e32784-e39e-48ce-a6e3-7627de734ca6", + "version": 4 +} \ No newline at end of file diff --git a/workflows_phaeoexplorer/Galaxy-Workflow-load_blast_results_2org_v1.ga b/workflows_phaeoexplorer/Galaxy-Workflow-load_blast_results_2org_v1.ga new file mode 100644 index 0000000000000000000000000000000000000000..ba2591c8dbd09e02b0ac52dcaf979709f2587bdb --- /dev/null +++ b/workflows_phaeoexplorer/Galaxy-Workflow-load_blast_results_2org_v1.ga @@ -0,0 +1,439 @@ +{ + "a_galaxy_workflow": "true", + "annotation": "", + "format-version": "0.1", + "name": "load_blast_results_2org_v1", + "steps": { + "0": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 0, + "input_connections": {}, + "inputs": [ + { + "description": "", + "name": "blast file xml org1" + } + ], + "label": "blast file xml org1", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 230.39999389648438, + "height": 61.19999694824219, + "left": 97.5, + "right": 297.5, + "top": 169.1999969482422, + "width": 200, + "x": 97.5, + "y": 169.1999969482422 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "b1c63d94-61a7-4bf1-8b5d-e08fb34c0357", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "72006174-6297-4777-95bd-ca427b9ea729" + } + ] + }, + "1": { + "annotation": "", + "content_id": null, + "errors": null, + "id": 1, + "input_connections": {}, + "inputs": [ + { + "description": "", + 
"name": "blast file xml org2" + } + ], + "label": "blast file xml org2", + "name": "Input dataset", + "outputs": [], + "position": { + "bottom": 341.40000915527344, + "height": 61.19999694824219, + "left": 129.5, + "right": 329.5, + "top": 280.20001220703125, + "width": 200, + "x": 129.5, + "y": 280.20001220703125 + }, + "tool_id": null, + "tool_state": "{\"optional\": false}", + "tool_version": null, + "type": "data_input", + "uuid": "9de2716c-eecd-48fc-8a71-b3d1f5daef85", + "workflow_outputs": [ + { + "label": null, + "output_name": "output", + "uuid": "45971e82-4e85-4993-a9cb-9a4608e9def7" + } + ] + }, + "2": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_load_blast/load_blast/2.3.4+galaxy0", + "errors": null, + "id": 2, + "input_connections": { + "input": { + "id": 0, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "organism_id" + }, + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "wait_for" + } + ], + "label": "load blast results org1", + "name": "Chado load Blast results", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 255.8000030517578, + "height": 163.60000610351562, + "left": 457.5, + "right": 657.5, + "top": 92.19999694824219, + "width": 200, + "x": 457.5, + "y": 92.19999694824219 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_load_blast/load_blast/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "08ae8b27b193", + "name": "chado_load_blast", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"blastdb_id\": \"21\", \"input\": {\"__class__\": \"ConnectedValue\"}, \"match_on_name\": \"false\", \"organism_id\": {\"__class__\": \"RuntimeValue\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"query_type\": \"polypeptide\", \"re_name\": \"\", \"skip_missing\": \"false\", \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "595f6e1f-955a-42be-b03b-1269d1f7d189", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "cb238779-29f4-4f22-b6f3-6a8cc84857d1" + } + ] + }, + "3": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_analysis_sync/analysis_sync/3.2.1.0", + "errors": null, + "id": 3, + "input_connections": { + "wait_for": { + "id": 2, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Synchronize an analysis", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Synchronize an analysis", + "name": "wait_for" + } + ], + "label": "sync blast analysis org1", + "name": "Synchronize an analysis", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 254.8000030517578, + "height": 153.60000610351562, + "left": 787.5, + "right": 987.5, + "top": 101.19999694824219, + "width": 200, + "x": 787.5, + "y": 101.19999694824219 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_analysis_sync/analysis_sync/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "f487ff676088", + "name": "tripal_analysis_sync", + 
"owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "c98dedf6-8857-4d23-be94-fe6630f245d7", + "workflow_outputs": [ + { + "label": "Synchronize Analysis into Tripal", + "output_name": "results", + "uuid": "1ff4b1db-b6bf-4c48-a0ab-0a8513683999" + } + ] + }, + "4": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_load_blast/load_blast/2.3.4+galaxy0", + "errors": null, + "id": 4, + "input_connections": { + "input": { + "id": 1, + "output_name": "output" + }, + "wait_for": { + "id": 3, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "input" + }, + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "organism_id" + }, + { + "description": "runtime parameter for tool Chado load Blast results", + "name": "wait_for" + } + ], + "label": "load blast results org2", + "name": "Chado load Blast results", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 439.8000183105469, + "height": 163.60000610351562, + "left": 520.5, + "right": 720.5, + "top": 276.20001220703125, + "width": 200, + "x": 520.5, + "y": 276.20001220703125 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/chado_load_blast/load_blast/2.3.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "08ae8b27b193", + "name": "chado_load_blast", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"blastdb_id\": \"21\", \"input\": {\"__class__\": \"RuntimeValue\"}, \"match_on_name\": \"false\", \"organism_id\": {\"__class__\": \"RuntimeValue\"}, \"psql_target\": {\"method\": \"remote\", \"__current_case__\": 0}, \"query_type\": \"polypeptide\", \"re_name\": \"\", \"skip_missing\": \"false\", \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.3.4+galaxy0", + "type": "tool", + "uuid": "a7ec5c91-7cef-4b9f-95a0-ed5542b8e142", + "workflow_outputs": [ + { + "label": null, + "output_name": "results", + "uuid": "119f219e-3d80-4b42-bb38-d07d4583048c" + } + ] + }, + "5": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_analysis_sync/analysis_sync/3.2.1.0", + "errors": null, + "id": 5, + "input_connections": { + "wait_for": { + "id": 4, + "output_name": "results" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Synchronize an analysis", + "name": "analysis_id" + }, + { + "description": "runtime parameter for tool Synchronize an analysis", + "name": "wait_for" + } + ], + "label": "sync blast analysis org2", + "name": "Synchronize an analysis", + "outputs": [ + { + "name": "results", + "type": "json" + } + ], + "position": { + "bottom": 440.8000183105469, + "height": 153.60000610351562, + "left": 828.5, + "right": 1028.5, + "top": 287.20001220703125, + "width": 200, + "x": 828.5, + "y": 287.20001220703125 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_analysis_sync/analysis_sync/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": 
"f487ff676088", + "name": "tripal_analysis_sync", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"analysis_id\": {\"__class__\": \"RuntimeValue\"}, \"wait_for\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "2fff7637-7904-46ff-87e1-ce2721727e75", + "workflow_outputs": [ + { + "label": "Synchronize Analysis into Tripal", + "output_name": "results", + "uuid": "924991f3-6dd4-4752-9ce2-3832d72dff57" + } + ] + }, + "6": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "errors": null, + "id": 6, + "input_connections": { + "wait_for": { + "id": 5, + "output_name": "results" + } + }, + "inputs": [], + "label": "populate mat views", + "name": "Populate materialized views", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 368.8000030517578, + "height": 153.60000610351562, + "left": 1103.5, + "right": 1303.5, + "top": 215.1999969482422, + "width": 200, + "x": 1103.5, + "y": 215.1999969482422 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_populate_mviews/db_populate_mviews/3.2.1.0", + "tool_shed_repository": { + "changeset_revision": "3c08f32a3dc1", + "name": "tripal_db_populate_mviews", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"mview\": \"\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.0", + "type": "tool", + "uuid": "5c947dd5-89df-4146-9ab8-9e1d6de42360", + "workflow_outputs": [ + { + "label": "Populate Tripal materialized view(s)", + "output_name": "results", + "uuid": "dc519305-8c27-4c53-9150-7dd37b5090cd" + } + ] + }, + "7": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "errors": null, + "id": 7, + "input_connections": { + "wait_for": { + "id": 6, + "output_name": "results" + } + }, + "inputs": [], + "label": "index tripal data", + "name": "Index Tripal data", + "outputs": [ + { + "name": "results", + "type": "txt" + } + ], + "position": { + "bottom": 349, + "height": 112.80000305175781, + "left": 1373.5, + "right": 1573.5, + "top": 236.1999969482422, + "width": 200, + "x": 1373.5, + "y": 236.1999969482422 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/gga/tripal_db_index/db_index/3.2.1.1", + "tool_shed_repository": { + "changeset_revision": "d55a39f12dda", + "name": "tripal_db_index", + "owner": "gga", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"expose\": {\"do_expose\": \"no\", \"__current_case__\": 0}, \"queues\": \"10\", \"table\": {\"mode\": \"website\", \"__current_case__\": 0}, \"tokenizer\": \"standard\", \"wait_for\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "3.2.1.1", + "type": "tool", + "uuid": "5ecc30b0-07ab-4f9b-81f0-31310358c221", + "workflow_outputs": [ + { + "label": "Index Tripal data", + "output_name": "results", + "uuid": "e2911922-2412-4618-97fe-bcc783bb0865" + } + ] + } + }, + "tags": [], + "uuid": "ffae97b5-698a-41a5-8561-470300594544", + "version": 6 +} \ No newline at end of file diff --git a/workflows/Interproscan.ga b/workflows_phaeoexplorer/Interproscan.ga similarity index 100% rename from workflows/Interproscan.ga rename to workflows_phaeoexplorer/Interproscan.ga diff 
--git a/workflows/Jbrowse.ga b/workflows_phaeoexplorer/Jbrowse.ga similarity index 100% rename from workflows/Jbrowse.ga rename to workflows_phaeoexplorer/Jbrowse.ga