diff --git a/README.md b/README.md index 1c2aa4ab8bac43dceeca593f9b91dafe2a82f664..57480745cc71a5ee869542f6ee7eb0ed42fc1b23 100755 --- a/README.md +++ b/README.md @@ -5,18 +5,15 @@ Automated integration of new organisms into GGA environments as a form of a dock ## Description: Automatically generate functional GGA environments from a descriptive input yaml file. See example datasets (example.yml) for an example of what information can be described -and the correct formatting of these input files. +and the correct formatting of these input files -"gga_load_data" in its current version is divided in 4 parts: +The "gga_load_data" tool is divided into 4 separate scripts: -- gga_init: Create directory tree and deploy stacks for the input organisms as well as Traefik and optionally Authelia stacks -- gga_get_data: Copy datasets for the input organisms into the organisms directory tree +- gga_init: Create directory tree for organisms and deploy stacks for the input organisms as well as Traefik and optionally Authelia stacks +- gga_get_data: Create "src_data" directory tree for organisms and copy datasets for the input organisms into the organisms directory tree - gga_load_data: Load the datasets of the input organisms into a library in their galaxy instance - run_workflow_phaeoexplorer: Remotely run a custom workflow in galaxy, proposed as an "example script" to take inspiration from as workflow parameters are specific to Phaeoexplorer data -## Metadata files (WIP): -A metadata file will be generated to summarize what actions have previously been taken inside a stack. - ## Directory tree: For every input organism, a dedicated directory is created. The script will create this directory and all subdirectories required. 
@@ -46,7 +43,7 @@ Directory tree structure: | | |---/genome | | | |---/genus1_species1_strain_sex | | | |---/vX.X -| | | |---/genus_species_vX.X.fasta +| | | |---/vX.X.fasta | | | | | |---/annotation | | | |---/genus1_species1_strain_sex @@ -75,101 +72,45 @@ Directory tree structure: ``` -## Steps: -For each input organism, the tool is used as a set of separate steps/scripts. - -**Part 1)** - -1) Create the directory tree structure (if it already exists, only create the required subdirectories) -2) Create the docker-compose file for the organism and deploy the stack of services - -**Warning: the Galaxy service takes up to 2 hours to be set up (for backup purposes). During these 2 hours it can't be interacted with, wait at least 2 hours before calling the other scripts** - -**Part 2)** - -3) Find source data files as specified in the input file and copy these source files to the organism correct src_data subfolders - -**Part 3)** - -4) Create galaxy library and load datasets in this library. Set up the galaxy instance and history(-ies) - -**Part 4)** (Script only available to Phaeoexplorer members) - -5) Modify headers in the transcripts and protein fasta files -6) Transfer manual annotation descriptions and hectar descriptions to the organism GFF file - -**Part 5)** - -7) Configure and invoke a workflow for the organism - - ## Usage: The scripts all take one mandatory input file that describes your species and their associated data -(see yml_example_input.yml in the "examples" folder of the repository) +(see example.yml in the "examples" folder of the repository) You must also fill in a "config" file containing sensitive variables (galaxy and tripal passwords, etc.) that the script will read to create the different services and to access the galaxy container. By default, the config file -inside the repository root will be used if none is precised in the command line +inside the repository root will be used if none is specified on the command line. 
An example of this config file is available +in the "examples" folder of the repository. -**Warning: the config file is not required as an option for the "gga_get_data" script** +**Warning: the config file is not required as an option for the "gga_init" and "gga_get_data" scripts** - Deploy stacks part: ```$ python3 /path/to/repo/gga_init.py your_input_file.yml -c/--config your_config_file [-v/--verbose] [OPTIONS]``` - OPTIONS - --main-directory $PATH (Path where to create/update stacks; default=current directory) - --traefik (If specified, will try to start or overwrite traefik+authelia stack; default=False) - --http (Use a http traefik+authelia configuration; default=False) - --https (Use a https traefik+authelia configuration, might require a certificate for the hostname; default=True) + --main-directory $PATH (Path where to create/update stacks; default=current directory) + --force-traefik (If specified, will overwrite traefik and authelia files; default=False) - Copy source data file: ```$ python3 /path/to/repo/gga_get_data.py your_input_file.yml [-v/--verbose] [OPTIONS]``` - OPTIONS - --main-directory $PATH (Path where to access stacks; default=current directory) + --main-directory $PATH (Path where to access stacks; default=current directory) - Load data in galaxy library and prepare galaxy instance: ```$ python3 /path/to/repo/gga_load_data.py your_input_file.yml -c/--config your_config_file [-v/--verbose]``` - OPTIONS - --main-directory $PATH (Path where to access stacks; default=current directory) + --main-directory $PATH (Path where to access stacks; default=current directory) - Run a workflow in galaxy: ```$ python3 /path/to/repo/run_workflow_phaeoexplorer.py your_input_file.yml -c/--config your_config_file --workflow /path/to/workflow.ga [-v/--verbose] [OPTIONS]``` - --workflow $WORKFLOW - Path to the workflow to run in galaxy. 
A couple of preset workflows are available in the "workflows" folder of the repository - OPTIONS - --main-directory $PATH (Path where to access stacks; default=current directory) - --setup (Set up the organism instance: create organism and analyses, get their IDs, etc.. This option is MANDATORY when first running the script in a new galaxy instance or else the script will not be able to set runtime parameters for the workflows) + --workflow $WORKFLOW (Path to the workflow to run in galaxy. A couple of preset workflows are available in the "workflows" folder of the repository) + --main-directory $PATH (Path where to access stacks; default=current directory) **Warning: the "input file" and "config file" have to be the same for all scripts!** ## Current limitations -When deploying the stack of services, the galaxy service takes a long time to be ready (around 2 hours of wait time). This is due to the galaxy container preparing a persistent location for the container data. +When deploying the stack of services, the galaxy service can take a long time to be ready. This is due to the galaxy container preparing a persistent location for the container data. 
This can be bypassed by setting the variable "persist_galaxy_data" to "True" in the script "config" YAML file The stacks deployment and the data loading into galaxy should hence be run separately and only once the galaxy service is ready To check the status of the galaxy service, you can run ```$ docker service logs -f genus_species_galaxy``` or -```./serexec genus_species_galaxy supervisorctl status``` -to verify directly from the container -*(The "gga_load_data.py" script will check on the galaxy container anyway and will exit while notifying you it is not ready)* +```./serexec genus_species_galaxy supervisorctl status``` to verify directly from the container + +\ +*(The "gga_load_data.py" script will do this automatically anyway and will exit while notifying you it is not ready)* ## Requirements (*temporary*): -Requires Python 3.7+ +Requires Python 3.6 -Packages required: -``` -bioblend==0.14.0 -boto==2.49.0 -certifi==2019.11.28 -cffi==1.14.0 -chardet==3.0.4 -cryptography==2.8 -idna==2.9 -numpy==1.18.1 -pandas==1.0.3 -pycparser==2.20 -pyOpenSSL==19.1.0 -PySocks==1.7.1 -python-dateutil==2.8.1 -pytz==2019.3 -PyYAML==5.3.1 -requests==2.23.0 -requests-toolbelt==0.9.1 -six==1.14.0 -urllib3==1.25.7 -xlrd==1.2.0 -``` \ No newline at end of file +[requirements.txt](./requirements.txt) diff --git a/examples/authelia_config_example.yml b/examples/authelia_config_example.yml new file mode 100644 index 0000000000000000000000000000000000000000..1050c78eb53f58547efdb3b4d5a891c4df708242 --- /dev/null +++ b/examples/authelia_config_example.yml @@ -0,0 +1,355 @@ +############################################################### +# Authelia configuration # +############################################################### + +# The host and port to listen on +host: 0.0.0.0 +port: 9091 +# tls_key: /var/lib/authelia/ssl/key.pem +# tls_cert: /var/lib/authelia/ssl/cert.pem + +# Level of verbosity for logs: info, debug, trace +log_level: info +## File path where the logs will be written. 
If not set logs are written to stdout. +# log_file_path: /var/log/authelia + +# The secret used to generate JWT tokens when validating user identity by +# email confirmation. +# This secret can also be set using the env variables AUTHELIA_JWT_SECRET +jwt_secret: XXXXXXXXXXXXXXXXX + +# Default redirection URL +# +# If user tries to authenticate without any referer, Authelia +# does not know where to redirect the user to at the end of the +# authentication process. +# This parameter allows you to specify the default redirection +# URL Authelia will use in such a case. +# +# Note: this parameter is optional. If not provided, user won't +# be redirected upon successful authentication. +default_redirection_url: https://localhost/ + +# Google Analytics Tracking ID to track the usage of the portal +# using a Google Analytics dashboard. +# +## google_analytics: UA-00000-01 + +# TOTP Settings +# +# Parameters used for TOTP generation +#totp: + # The issuer name displayed in the Authenticator application of your choice + # See: https://github.com/google/google-authenticator/wiki/Key-Uri-Format for more info on issuer names + #issuer: authelia.com + # The period in seconds a one-time password is current for. Changing this will require all users to register + # their TOTP applications again. + # Warning: before changing period read the docs link below. + #period: 30 + # The skew controls number of one-time passwords either side of the current one that are valid. + # Warning: before changing skew read the docs link below. + #skew: 1 + # See: https://docs.authelia.com/configuration/one-time-password.html#period-and-skew to read the documentation. + +# Duo Push API +# +# Parameters used to contact the Duo API. Those are generated when you protect an application +# of type "Partner Auth API" in the management panel. 
+#duo_api: + #hostname: api-123456789.example.com + #integration_key: ABCDEF + # This secret can also be set using the env variables AUTHELIA_DUO_API_SECRET_KEY + #secret_key: 1234567890abcdefghifjkl + +# The authentication backend to use for verifying user passwords +# and retrieve information such as email address and groups +# users belong to. +# +# There are two supported backends: 'ldap' and 'file'. +authentication_backend: + # Disable both the HTML element and the API for reset password functionality + disable_reset_password: true + + # LDAP backend configuration. + # + # This backend allows Authelia to be scaled to more + # than one instance and therefore is recommended for + # production. +# ldap: + # The url to the ldap server. Scheme can be ldap:// or ldaps:// +# url: ldap://ldap-host-name + # Skip verifying the server certificate (to allow self-signed certificate). +# skip_verify: false + + # The base dn for every entries +# base_dn: dc=genouest,dc=org + + # The attribute holding the username of the user. This attribute is used to populate + # the username in the session information. It was introduced due to #561 to handle case + # insensitive search queries. + # For you information, Microsoft Active Directory usually uses 'sAMAccountName' and OpenLDAP + # usually uses 'uid' + username_attribute: uid + + # An additional dn to define the scope to all users +# additional_users_dn: ou=users + + # The users filter used in search queries to find the user profile based on input filled in login form. + # Various placeholders are available to represent the user input and back reference other options of the configuration: + # - {input} is a placeholder replaced by what the user inputs in the login form. + # - {username_attribute} is a placeholder replaced by what is configured in `username_attribute`. + # - {mail_attribute} is a placeholder replaced by what is configured in `mail_attribute`. 
+ # - DON'T USE - {0} is an alias for {input} supported for backward compatibility but it will be deprecated in later versions, so please don't use it. + # + # Recommended settings are as follows: + # - Microsoft Active Directory: (&({username_attribute}={input})(objectCategory=person)(objectClass=user)) + # - OpenLDAP: (&({username_attribute}={input})(objectClass=person))' or '(&({username_attribute}={input})(objectClass=inetOrgPerson)) + # + # To allow sign in both with username and email, one can use a filter like + # (&(|({username_attribute}={input})({mail_attribute}={input}))(objectClass=person)) + users_filter: (&({username_attribute}={input})(objectClass=Person)(isActive=TRUE)) + + # An additional dn to define the scope of groups +# additional_groups_dn: ou=groups + + # The groups filter used in search queries to find the groups of the user. + # - {input} is a placeholder replaced by what the user inputs in the login form. + # - {username} is a placeholder replace by the username stored in LDAP (based on `username_attribute`). + # - {dn} is a matcher replaced by the user distinguished name, aka, user DN. + # - {username_attribute} is a placeholder replaced by what is configured in `username_attribute`. + # - {mail_attribute} is a placeholder replaced by what is configured in `mail_attribute`. + # - DON'T USE - {0} is an alias for {input} supported for backward compatibility but it will be deprecated in later versions, so please don't use it. + # - DON'T USE - {1} is an alias for {username} supported for backward compatibility but it will be deprecated in later version, so please don't use it. +# groups_filter: (&(member={dn})(objectclass=bipaaGroup)) + + # The attribute holding the name of the group +# group_name_attribute: cn + + # The attribute holding the mail address of the user +# mail_attribute: mail + + # The username and password of the admin user. 
+# user: cn=admin,dc=genouest,dc=org + # This secret can also be set using the env variables AUTHELIA_AUTHENTICATION_BACKEND_LDAP_PASSWORD +# password: XXXXXXXXXXXXXX + + # File backend configuration. + # + # With this backend, the users database is stored in a file + # which is updated when users reset their passwords. + # Therefore, this backend is meant to be used in a dev environment + # and not in production since it prevents Authelia to be scaled to + # more than one instance. The options under password_options have sane + # defaults, and as it has security implications it is highly recommended + # you leave the default values. Before considering changing these settings + # please read the docs page below: + # https://docs.authelia.com/configuration/authentication/file.html#password-hash-algorithm-tuning + # + ## file: + ## path: ./users_database.yml + file: + path: /etc/authelia/users.yml + password_options: + algorithm: argon2id + iterations: 1 + key_length: 32 + salt_length: 16 + memory: 1024 + parallelism: 8 + + +# Access Control +# +# Access control is a list of rules defining the authorizations applied for one +# resource to users or group of users. +# +# If 'access_control' is not defined, ACL rules are disabled and the 'bypass' +# rule is applied, i.e., access is allowed to anyone. Otherwise restrictions follow +# the rules defined. +# +# Note: One can use the wildcard * to match any subdomain. +# It must stand at the beginning of the pattern. (example: *.mydomain.com) +# +# Note: You must put patterns containing wildcards between simple quotes for the YAML +# to be syntactically correct. +# +# Definition: A 'rule' is an object with the following keys: 'domain', 'subject', +# 'policy' and 'resources'. +# +# - 'domain' defines which domain or set of domains the rule applies to. +# +# - 'subject' defines the subject to apply authorizations to. This parameter is +# optional and matching any user if not provided. 
If provided, the parameter +# represents either a user or a group. It should be of the form 'user:<username>' +# or 'group:<groupname>'. +# +# - 'policy' is the policy to apply to resources. It must be either 'bypass', +# 'one_factor', 'two_factor' or 'deny'. +# +# - 'resources' is a list of regular expressions that matches a set of resources to +# apply the policy to. This parameter is optional and matches any resource if not +# provided. +# +# Note: the order of the rules is important. The first policy matching +# (domain, resource, subject) applies. +access_control: + # Default policy can either be 'bypass', 'one_factor', 'two_factor' or 'deny'. + # It is the policy applied to any resource if there is no policy to be applied + # to the user. + default_policy: bypass + + rules: + # The login portal is freely accessible (redirectino loop otherwise) + - domain: auth.example.org + policy: bypass + + # Apollo needs to be authenticated + - domain: localhost + resources: + - "^/apollo/.*$" + policy: one_factor + + # traefik dashboard is restricted to a group from ldap + - domain: localhost + resources: + - "^/traefik/.*$" + policy: one_factor + subject: "group:ldap_admin" + - domain: localhost + resources: + - "^/traefik/.*$" + policy: deny + + # All galaxies are restricted to a group from ldap + - domain: localhost + resources: + - "^/sp/.+/galaxy/.*$" + policy: one_factor + subject: "group:ldap_admin" + - domain: localhost + resources: + - "^/sp/.+/galaxy$" + policy: deny + + # A genome restricted to an ldap group + - domain: localhost + resources: + - "^/sp/genus_species/.*$" + policy: one_factor + subject: "group:gspecies" + - domain: localhost + resources: + - "^/sp/genus_species/.*$" + policy: deny + + +# Configuration of session cookies +# +# The session cookies identify the user once logged in. +session: + # The name of the session cookie. (default: authelia_session). + name: authelia_replaceme_session + + # The secret to encrypt the session data. 
This is only used with Redis. + # This secret can also be set using the env variables AUTHELIA_SESSION_SECRET + secret: WXXXXXXXXXXXXXXXXXXXcXXXXXXXXXXXXXX + + # The time in seconds before the cookie expires and session is reset. + expiration: 3600000 # 1000 hours + + # The inactivity time in seconds before the session is reset. + # abretaud: We get an Unauthorized message when reaching this threshold => disabling by making > cookie lifetime + inactivity: 3700000 # > cookie lifetime, i.e. effectively disabled + + # The remember me duration. + # Value of 0 disables remember me. + # Value is in seconds, or duration notation. See: https://docs.authelia.com/configuration/index.html#duration-notation-format + # Longer periods are considered less secure because a stolen cookie will last longer giving attackers more time to spy + # or attack. Currently the default is 1M or 1 month. + remember_me_duration: 1M + + # The domain to protect. + # Note: the authenticator must also be in that domain. If empty, the cookie + # is restricted to the subdomain of the issuer. + domain: replaceme.org + + # The redis connection details + redis: + host: authelia-redis + port: 6379 + # This secret can also be set using the env variables AUTHELIA_SESSION_REDIS_PASSWORD + #password: authelia + # This is the Redis DB Index https://redis.io/commands/select (sometimes referred to as database number, DB, etc). + #database_index: 0 + +# Configuration of the authentication regulation mechanism. +# +# This mechanism prevents attackers from brute forcing the first factor. +# It bans the user if too many attempts are done in a short period of +# time. +regulation: + # The number of failed login attempts before user is banned. + # Set it to 0 to disable regulation. + max_retries: 3 + + # The time range during which the user can attempt login before being banned. + # The user is banned if the authentication failed 'max_retries' times in a 'find_time' seconds window. + # Find Time accepts duration notation. 
See: https://docs.authelia.com/configuration/index.html#duration-notation-format + find_time: 2m + + # The length of time before a banned user can login again. + # Ban Time accepts duration notation. See: https://docs.authelia.com/configuration/index.html#duration-notation-format + ban_time: 5m + +# Configuration of the storage backend used to store data and secrets. +# +# You must use only an available configuration: local, mysql, postgres +storage: + postgres: + host: authelia-db + port: 5432 + database: postgres + username: postgres + # # This secret can also be set using the env variables AUTHELIA_STORAGE_POSTGRES_PASSWORD + password: XXXXXXXX + +# Configuration of the notification system. +# +# Notifications are sent to users when they require a password reset, a u2f +# registration or a TOTP registration. +# Use only an available configuration: filesystem, gmail +notifier: + # For testing purpose, notifications can be sent in a file + ## filesystem: + ## filename: /tmp/authelia/notification.txt + + # Use a SMTP server for sending notifications. Authelia uses PLAIN or LOGIN method to authenticate. 
+ # [Security] By default Authelia will: + # - force all SMTP connections over TLS including unauthenticated connections + # - use the disable_require_tls boolean value to disable this requirement (only works for unauthenticated connections) + # - validate the SMTP server x509 certificate during the TLS handshake against the hosts trusted certificates + # - trusted_cert option: + # - this is a string value, that may specify the path of a PEM format cert, it is completely optional + # - if it is not set, a blank string, or an invalid path; will still trust the host machine/containers cert store + # - defaults to the host machine (or docker container's) trusted certificate chain for validation + # - use the trusted_cert string value to specify the path of a PEM format public cert to trust in addition to the hosts trusted certificates + # - use the disable_verify_cert boolean value to disable the validation (prefer the trusted_cert option as it's more secure) + smtp: + #username: test + # This secret can also be set using the env variables AUTHELIA_NOTIFIER_SMTP_PASSWORD + #password: password + #secure: false + host: smtp-server-hostname + port: 25 + disable_require_tls: true + sender: replace@me.fr + + # Sending an email using a Gmail account is as simple as the next section. 
+ # You need to create an app password by following: https://support.google.com/accounts/answer/185833?hl=en + ## smtp: + ## username: myaccount@gmail.com + ## # This secret can also be set using the env variables AUTHELIA_NOTIFIER_SMTP_PASSWORD + ## password: yourapppassword + ## sender: admin@example.com + ## host: smtp.gmail.com + ## port: 587 diff --git a/examples/example.yml b/examples/example.yml new file mode 100644 index 0000000000000000000000000000000000000000..b5f8589e73aef31899b7c8444160a4abc24b7ab6 --- /dev/null +++ b/examples/example.yml @@ -0,0 +1,41 @@ +# Input file for the automated creation of GGA docker stacks +# The file consists of a "list" of species for which the script will have to create these stacks/load data into galaxy/run workflows +# This file is internally turned into a list of dictionaries by the scripts + +citrus_sinensis: # Dummy value to designate the species (isn't used by the script) + description: + # Species description, leave blank if unknown or you don't want it to be used + # These parameters are used to set up the various urls and addresses in different containers + # The script requires at least the genus to be specified + genus: "Citrus" # Mandatory! + species: "sinensis" # Mandatory! + sex: "male" + strain: "" + common_name: "" + origin: "" + # When copying the datasets, the script will look for files containing the genus, species, sex and strain of the species + # If no file corresponding to the description is found, this path will be considered empty and the script will + # proceed to the next step (create the directory tree for the GGA docker stack) + data: + # Sequence of paths to the different datasets to copy and import into the galaxy container (as a shared library) + genome_path: "./examples/src_data/genome/v1.0/Citrus_sinensis-scaffold00001.fasta" # Mandatory! + transcripts_path: "./examples/src_data/annotation/v1.0/Citrus_sinensis-orange1.1g015632m.g.fasta" # Mandatory! + proteins_path: "" # Mandatory! 
+ gff_path: "./examples/src_data/annotation/v1.0/Citrus_sinensis-orange1.1g015632m.g.gff3" # Mandatory! + interpro_path: "./examples/src_data/annotation/v1.0/functional_annotation/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml" + orthofinder_path: "/path/to/orthofinder" + blastp_path: "./examples/src_data/annotation/v1.0/functional_annotation/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out" + blastx_path: "Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out" + # If the user has several datasets of the same 'nature' (gff, genomes, ...) to upload to galaxy, the next scalar is used by the script to differentiate + # between these different versions and name directories according to it and not overwrite the existing data + # If left empty, the genome will be considered version "1.0" + genome_version: "1.0" + # Same as genome version, but for the OGS analysis + ogs_version: "1.0" + performed_by: "" + services: + # Describe what optional services to deploy for the stack + # By default, only tripal, tripaldb and galaxy services will be deployed + blast: "False" + wiki: "False" + apollo: "False" \ No newline at end of file diff --git a/gga_init.py b/gga_init.py index bc0a19daa052e64e61e3f8636c134dd226d19a8f..3093faccd25b158cb3a117c097c0824a669c9571 100755 --- a/gga_init.py +++ b/gga_init.py @@ -205,12 +205,17 @@ def make_traefik_compose_files(config, main_dir): if config["authelia_config_path"]: if not config["authelia_config_path"] == "" or not config["authelia_config_path"] == "/path/to/authelia/config": if os.path.isfile(os.path.abspath(config["authelia_config_path"])): - authelia_config_template = env.get_template(os.path.basename(config["authelia_config_path"])) - authelia_config_output = authelia_config_template.render(render_vars) - with open(os.path.join(main_dir, "traefik/authelia/configuration.yml"), 'w') as authelia_config_file: - logging.info("Writing authelia configuration.yml") - 
authelia_config_file.truncate(0) - authelia_config_file.write(authelia_config_output) + try: + shutil.copy(os.path.abspath(config["authelia_config_path"]), "./traefik/authelia/configuration.yml") + except Exception as exc: + logging.critical("Could not copy authelia configuration file") + sys.exit(exc) + # authelia_config_template = env.get_template(os.path.basename(config["authelia_config_path"])) + # authelia_config_output = authelia_config_template.render(render_vars) + # with open(os.path.join(main_dir, "traefik/authelia/configuration.yml"), 'w') as authelia_config_file: + # logging.info("Writing authelia configuration.yml") + # authelia_config_file.truncate(0) + # authelia_config_file.write(authelia_config_output) else: logging.critical("Cannot find authelia configuration template path (%s)" % config["authelia_config_path"]) sys.exit() diff --git a/gga_load_data.py b/gga_load_data.py index c6f50d212c9f6575a9d11e5621a8af7a88d13d49..1b29b1ecb06e1bac066d9e223fd91e706cfffd1f 100755 --- a/gga_load_data.py +++ b/gga_load_data.py @@ -84,7 +84,7 @@ class LoadData(speciesData.SpeciesData): """ logging.debug("Getting 'Homo sapiens' ID in instance's chado database") get_sapiens_id_job = self.instance.tools.run_tool( - tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.2", + tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_get_organisms/organism_get_organisms/2.3.4+galaxy0", history_id=self.history_id, tool_inputs={"genus": "Homo", "species": "sapiens"}) get_sapiens_id_job_output = get_sapiens_id_job["outputs"][0]["id"] @@ -95,13 +95,13 @@ class LoadData(speciesData.SpeciesData): sapiens_id = str( get_sapiens_id_final_output["organism_id"]) # needs to be str to be recognized by the chado tool self.instance.tools.run_tool( -
                tool_id="toolshed.g2.bx.psu.edu/repos/gga/chado_organism_delete_organisms/organism_delete_organisms/2.3.3",
                history_id=self.history_id,
                tool_inputs={"organism": str(sapiens_id)})
         except bioblend.ConnectionError:
-            logging.debug("Homo sapiens isn't in the instance's chado database")
+            logging.debug("Homo sapiens isn't in the instance's chado database (bioblend.ConnectionError)")
         except IndexError:
-            logging.debug("Homo sapiens isn't in the instance's chado database")
+            logging.debug("Homo sapiens isn't in the instance's chado database (IndexError)")
             pass
 
     def purge_histories(self):
diff --git a/speciesData.py b/speciesData.py
index 81900eb078c55ca77b528e8d0c83f30797ee8e38..cf9da39cc8c1473f9d29c77b7f253cfb39c9be67 100755
--- a/speciesData.py
+++ b/speciesData.py
@@ -18,11 +18,11 @@ class SpeciesData:
     def __init__(self, parameters_dictionary):
         # self.config_dictionary = None
         self.parameters_dictionary = parameters_dictionary
-        self.species = parameters_dictionary["description"]["species"].replace("(", "_").replace(")", "_")
-        self.genus = parameters_dictionary["description"]["genus"].replace("(", "_").replace(")", "_")
-        self.strain = parameters_dictionary["description"]["strain"].replace("(", "_").replace(")", "_")
-        self.sex = parameters_dictionary["description"]["sex"].replace("(", "_").replace(")", "_")
-        self.common = parameters_dictionary["description"]["common_name"].replace("(", "_").replace(")", "_")
+        self.species = parameters_dictionary["description"]["species"].replace("(", "_").replace(")", "_").replace("-", "_")
+        self.genus = parameters_dictionary["description"]["genus"].replace("(", "_").replace(")", "_").replace("-", "_")
+        self.strain = parameters_dictionary["description"]["strain"].replace("(", "_").replace(")", "_").replace("-", "_")
+        self.sex = parameters_dictionary["description"]["sex"].replace("(", "_").replace(")", "_").replace("-", "_")
+        self.common = parameters_dictionary["description"]["common_name"].replace("(", "_").replace(")", "_").replace("-", "_")
         self.date = datetime.today().strftime("%Y-%m-%d")
         self.origin = parameters_dictionary["description"]["origin"]
@@ -76,11 +76,9 @@ class SpeciesData:
         self.source_files = dict()
         self.workflow_name = None
         self.metadata = dict()
-        self.api_key = None
-        # API key used to communicate with the galaxy instance. Cannot be used to do user-tied actions
+        self.api_key = None # API key used to communicate with the galaxy instance. Cannot be used to do user-tied actions
         self.datasets = dict()
-        self.config = None
-        # Custom config used to set environment variables inside containers, defaults to the one in the repo
+        self.config = None # Custom config used to set environment variables inside containers
         self.species_folder_name = "_".join(utilities.filter_empty_not_empty_items([self.genus_lowercase.lower(), self.species.lower(), self.strain.lower(), self.sex.lower()])["not_empty"])
         self.existing_folders_cache = {}
         self.bam_metadata_cache = {}
diff --git a/templates/gspecies_compose_template.yml.j2 b/templates/gspecies_compose_template.yml.j2
index 864fe484f471422cf35e8bdf891ce452d87e09a3..72e9595260f00de7e57b453475389fe92f56153c 100644
--- a/templates/gspecies_compose_template.yml.j2
+++ b/templates/gspecies_compose_template.yml.j2
@@ -289,7 +289,6 @@ services:
         window: 120s
 
   blast-db:
-#    image: postgres:9.6-alpine
     image: postgres:9.5
     environment:
       - POSTGRES_PASSWORD=postgres
@@ -300,44 +299,46 @@ services:
       - {{ genus_species }}
 {% endif %}
 
-#  wiki:
-#    image: quay.io/abretaud/mediawiki
-#    environment:
-#      MEDIAWIKI_SERVER: http://localhost
-#      MEDIAWIKI_PROXY_PREFIX: /sp/{{ genus_species }}/wiki
-#      MEDIAWIKI_SITENAME: {{ Genus }} {{ species }}
-#      MEDIAWIKI_SECRET_KEY: XXXXXXXXXX
-#      MEDIAWIKI_DB_HOST: wiki-db.{{genus_species }}
-#      MEDIAWIKI_DB_PASSWORD: password
-#      MEDIAWIKI_ADMIN_USER: abretaud # ldap user
-#    depends_on:
-#      - wiki-db
-#    volumes:
-#      - ./docker_data/wiki_uploads:/images
-#      #- ../bipaa_wiki.png:/var/www/mediawiki/resources/assets/wiki.png:ro # To change the logo at the top left
-#    networks:
-#      - traefikbig
-#      - {{ genus_species }}
-#    deploy:
-#      labels:
-#        - "traefik.http.routers.{{ genus_species }}-wiki.rule=(Host(`{{ hostname }}`) && PathPrefix(`/sp/{{ genus_species }}/wiki`))"
-#        - "traefik.http.routers.{{ genus_species }}-wiki.tls=true"
-#        - "traefik.http.routers.{{ genus_species }}-wiki.entryPoints={{ entrypoint }}"
-#        - "traefik.http.routers.{{ genus_species }}-wiki.middlewares=sp-big-req,sp-auth,sp-app-trailslash,sp-app-prefix"
-#        - "traefik.http.services.{{ genus_species }}-wiki.loadbalancer.server.port=80"
-#      restart_policy:
-#        condition: on-failure
-#        delay: 5s
-#        max_attempts: 3
-#        window: 120s
-
-#  wiki-db:
-#    image: postgres:9.6-alpine
-#    volumes:
-#      - ./docker_data/wiki_db/:/var/lib/postgresql/data/
-#    networks:
-#      - {{ genus_species }}
+  {% if wiki == True %}
+  wiki:
+    image: quay.io/abretaud/mediawiki
+    environment:
+      MEDIAWIKI_SERVER: http://localhost
+      MEDIAWIKI_PROXY_PREFIX: /sp/{{ genus_species }}/wiki
+      MEDIAWIKI_SITENAME: {{ Genus }} {{ species }}
+      MEDIAWIKI_SECRET_KEY: XXXXXXXXXX
+      MEDIAWIKI_DB_HOST: wiki-db.{{genus_species }}
+      MEDIAWIKI_DB_PASSWORD: password
+      MEDIAWIKI_ADMIN_USER: abretaud # ldap user
+    depends_on:
+      - wiki-db
+    volumes:
+      - ./docker_data/wiki_uploads:/images
+      #- ../bipaa_wiki.png:/var/www/mediawiki/resources/assets/wiki.png:ro # To change the logo at the top left
+    networks:
+      - traefikbig
+      - {{ genus_species }}
+    deploy:
+      labels:
+        - "traefik.http.routers.{{ genus_species }}-wiki.rule=(Host(`{{ hostname }}`) && PathPrefix(`/sp/{{ genus_species }}/wiki`))"
+        - "traefik.http.routers.{{ genus_species }}-wiki.tls=true"
+        - "traefik.http.routers.{{ genus_species }}-wiki.entryPoints={{ entrypoint }}"
+        - "traefik.http.routers.{{ genus_species }}-wiki.middlewares=sp-big-req,sp-auth,sp-app-trailslash,sp-app-prefix"
+        - "traefik.http.services.{{ genus_species }}-wiki.loadbalancer.server.port=80"
+      restart_policy:
+        condition: on-failure
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+
+  wiki-db:
+    image: postgres:9.6-alpine
+    volumes:
+      - ./docker_data/wiki_db/:/var/lib/postgresql/data/
+    networks:
+      - {{ genus_species }}
+  {% endif %}
+
 networks:
   traefikbig:
     external: true
diff --git a/templates/traefik.yml b/templates/traefik.yml
new file mode 100644
index 0000000000000000000000000000000000000000..f47766c44505262e4709f5d6ca91e1a5fb4ecccd
--- /dev/null
+++ b/templates/traefik.yml
@@ -0,0 +1,120 @@
+version: '3.7'
+services:
+  traefik:
+    image: traefik:2.1.6
+    command:
+      - "--api"
+      - "--api.dashboard"
+#      - "--api.insecure=true" # added by lg to debug, for dashboard
+      - "--log.level=DEBUG"
+      - "--providers.docker"
+      - "--providers.docker.swarmMode=true"
+      - "--providers.docker.network=traefikbig" # changed by lg from traefik to traefikbig
+      - "--entryPoints.web.address=:80"
+      - "--entryPoints.web.forwardedHeaders.trustedIPs=192.168.1.133" # The ips of our upstream proxies: eci
+      - "--entryPoints.webs.address=:443"
+      - "--entryPoints.webs.forwardedHeaders.trustedIPs=192.168.1.133" # The ips of our upstream proxies: eci
+    ports:
+      - 8001:8080 # added by lg to debug, for dashboard
+      - 8888:80
+      - 8889:443
+    networks:
+      - traefikbig
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+    deploy:
+      placement:
+        constraints:
+          - node.role == manager
+      labels:
+#        - "traefik.http.routers.traefik-api.rule=PathPrefix(`/traefik`)"
+        - "traefik.http.routers.traefik-api.rule=PathPrefix(`/api`) || PathPrefix(`/dashboard`) || PathPrefix(`/traefik`)" # lg
+#        - "traefik.http.routers.traefik-api.tls=true"
+        - "traefik.http.routers.traefik-api.entryPoints=web" # lg
+#        - "traefik.http.routers.traefik-api.entryPoints=webs"
+        - "traefik.http.routers.traefik-api.service=api@internal"
+        - "traefik.http.middlewares.traefik-strip.stripprefix.prefixes=/traefik"
+        - "traefik.http.middlewares.traefik-auth.forwardauth.address=http://authelia:9091/api/verify?rd=https://auth.abims-gga.sb-roscoff.fr/"
+        - "traefik.http.middlewares.traefik-auth.forwardauth.trustForwardHeader=true"
+#        - "traefik.http.routers.traefik-api.middlewares=traefik-auth,traefik-strip"
+        - "traefik.http.routers.traefik-api.middlewares=traefik-strip" # lg
+        # Dummy service for Swarm port detection. The port can be any valid integer value.
+        - "traefik.http.services.traefik-svc.loadbalancer.server.port=9999"
+        # Some generally useful middlewares for organisms hosting
+        - "traefik.http.middlewares.sp-auth.forwardauth.address=http://authelia:9091/api/verify?rd=https://auth.abims-gga.sb-roscoff.fr/"
+        - "traefik.http.middlewares.sp-auth.forwardauth.trustForwardHeader=true"
+        - "traefik.http.middlewares.sp-auth.forwardauth.authResponseHeaders=Remote-User,Remote-Groups"
+#        - "traefik.http.middlewares.sp-trailslash.redirectregex.regex=^(https?://[^/]+/sp/[^/]+)$$"
+        - "traefik.http.middlewares.sp-trailslash.redirectregex.regex=^(http?://[^/]+/sp/[^/]+)$$" # lg
+        - "traefik.http.middlewares.sp-trailslash.redirectregex.replacement=$${1}/"
+        - "traefik.http.middlewares.sp-trailslash.redirectregex.permanent=true"
+#        - "traefik.http.middlewares.sp-app-trailslash.redirectregex.regex=^(https?://[^/]+/sp/[^/]+/[^/]+)$$"
+        - "traefik.http.middlewares.sp-app-trailslash.redirectregex.regex=^(http?://[^/]+/sp/[^/]+/[^/]+)$$" # lg
+        - "traefik.http.middlewares.sp-app-trailslash.redirectregex.replacement=$${1}/"
+        - "traefik.http.middlewares.sp-app-trailslash.redirectregex.permanent=true"
+        - "traefik.http.middlewares.sp-prefix.stripprefixregex.regex=/sp/[^/]+"
+        - "traefik.http.middlewares.sp-app-prefix.stripprefixregex.regex=/sp/[^/]+/[^/]+"
+        - "traefik.http.middlewares.tripal-addprefix.addprefix.prefix=/tripal"
+        - "traefik.http.middlewares.sp-big-req.buffering.maxRequestBodyBytes=50000000"
+        - "traefik.http.middlewares.sp-huge-req.buffering.maxRequestBodyBytes=2000000000"
+      restart_policy:
+        condition: on-failure
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+
+  authelia:
+    image: authelia/authelia:4.12.0
+    networks:
+      - traefikbig
+    depends_on:
+      - authelia-redis
+      - authelia-db
+    volumes:
+      - ./authelia/:/etc/authelia/:ro
+    deploy:
+      labels:
+        - "traefik.http.routers.authelia.rule=Host(`auth.example.org`)"
+        - "traefik.http.services.authelia.loadbalancer.server.port=9091"
+      restart_policy:
+        condition: on-failure
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+
+  authelia-redis:
+    image: redis:5.0.7-alpine
+    command: ["redis-server", "--appendonly", "yes"]
+    volumes:
+      - ./authelia-redis/:/data/
+    networks:
+      - traefikbig
+    deploy:
+      restart_policy:
+        condition: on-failure
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+
+  authelia-db:
+    image: postgres:12.2-alpine
+    environment:
+      POSTGRES_PASSWORD: z3A,hQ-9
+    volumes:
+      - ./docker_data/authelia_db/:/var/lib/postgresql/data/
+    networks:
+      - traefikbig
+    deploy:
+      restart_policy:
+        condition: on-failure
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+
+networks:
+  traefikbig:
+    driver: overlay
+    name: traefikbig
+    ipam:
+      config:
+        - subnet: 10.50.0.0/16
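
The speciesData.py hunk above chains a third `.replace("-", "_")` onto each description field so that hyphens, like parentheses, cannot leak into the folder and service names derived from them. A minimal standalone sketch of that sanitization (the `sanitize_name` helper is hypothetical, not part of the repository):

```python
def sanitize_name(value: str) -> str:
    """Replace characters that are unsafe in directory/service names
    ("(", ")" and "-") with underscores, mirroring the chained
    str.replace calls added to speciesData.py."""
    for char in ("(", ")", "-"):
        value = value.replace(char, "_")
    return value

print(sanitize_name("Ectocarpus (strain-7)"))  # Ectocarpus _strain_7_
```

A loop over the offending characters keeps the rule in one place; adding a fourth character is a one-token change instead of another chained `.replace` on five separate lines.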