Ability for Seqera Platform to apply custom config files to all selected pipelines
complete
Charcoal Mandrill
Currently, when you configure a single pipeline in Tower / Platform, there is a field where you can supply extra Nextflow config options for the pipeline that are included with the pipeline's nextflow.config file.
It would be very helpful if we could instead have our own collection of separate config files that could be registered with Tower and easily applied to all desired pipelines, globally.
For example, I am currently updating the config for the nf-prov plugin in our pipelines. To change those settings for all pipelines, I have to modify the nextflow.config file in every single pipeline repo in our organization, copy/pasting the same config into each one.
It would be a lot easier if we could have, for example, a single "plugins" config file saved on the Tower server that could be updated in one place and then applied to all pipelines on Tower where we want the updated nf-prov settings.
The same applies to all the other settings saved in the nextflow.config files across our pipelines; we could reduce code duplication and update every config at once if we could just save them as shared extra config files to be applied by Tower at runtime.
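As a rough illustration, such a shared "plugins" config might look something like this (the nf-prov options below are only an example and vary by plugin version; check the plugin docs for the exact settings):
plugins {
    id 'nf-prov'
}

prov {
    enabled = true
    formats {
        bco {
            file      = 'bco.json'
            overwrite = true
        }
    }
}
A single file like this, registered once with Tower, could then be layered on top of each pipeline's own nextflow.config instead of being copy/pasted into every repo.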
Perhaps such a feature does not need to be limited to Tower / Platform either; it might be helpful to be able to do this from the command line as well. In the same way that we can pass a git repo URL as the Nextflow script argument to nextflow run, this could probably be implemented by allowing the nextflow -c and -C args to accept a URL to config files hosted in a git repo. This would allow us to maintain a dedicated git repo just for global configs and then easily use them both in Tower and from the CLI.
Edit: for clarification, it is possible to use S3 and HTTP file paths with the includeConfig directive in nextflow.config when running from the command line; however, this does not work from Nextflow Tower / Seqera Platform since the Launchpad does not support AWS authentication.
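For example, something like the following works when launching from the CLI with AWS credentials in the environment, but the S3 form fails when launching through the Launchpad as noted above (the bucket and repo paths are placeholders):
// shared config pulled from S3 (works from the CLI)
includeConfig 's3://my-org-configs/conf/plugins.config'

// or from an HTTPS location
includeConfig 'https://raw.githubusercontent.com/my-org/configs/main/conf/plugins.config'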
Charcoal Mandrill
Worth mentioning that if you are using an older version of Platform that does not yet have the ability to apply a Nextflow config at the compute environment (CE) level, you can still achieve this in the CE Pre-Run Script:
# Decode the base64-encoded configuration
echo "$NXF_CONFIG_BASE64" | base64 -d > /tmp/decoded_config

# Append the extra configuration using a heredoc
cat <<EOF >> /tmp/decoded_config
// put your Nextflow configs here
process.container = "ubuntu:latest"
wave {
    enabled = true
    freeze = true
    strategy = 'container'
}
EOF

# Encode the updated configuration back to base64
export NXF_CONFIG_BASE64=$(cat /tmp/decoded_config | base64 | tr -d '\n')
Rob Newman
complete
This was part of the 24.1 cycle 21 Cloud release and will be included in the next Enterprise release (24.2). If there are additional related features not captured here, please create a new feature request so we can track it. Thanks!
Brass Wildcat
Hi Charcoal Mandrill, we have recently released on Cloud the ability to set a "Nextflow config" field when creating a compute environment.
The settings in this global Nextflow config field are applied to any pipeline launched using that compute environment.
Would this be of help for your request once it is available for enterprise deployments?
Here is a link to the release notes: https://docs.seqera.io/platform/24.1/cloud/changelog#2420_cycle21-cloud---13-august-2024
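As a simple illustration, anything pasted into that compute environment-level field is applied to every run launched with that CE; for instance (the values below are only examples):
// applied to every pipeline launched with this compute environment
wave {
    enabled = true
}

process {
    errorStrategy = 'retry'
    maxRetries    = 2
}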
Charcoal Mandrill
Brass Wildcat yes, that sounds great, thanks; looking forward to it for enterprise deployments.
I think ultimately an ideal implementation would allow global config management without being tied to a compute environment, but this is a good start.
Rob Newman
acknowledged
Yellow sunshine Firefly
This exact issue has led us to run a custom Nextflow job definition in which we run our own entrypoint to assemble an infrastructure configuration dynamically. We have many AWS accounts for development, testing, and production scalability, and this approach means we do not have to worry about queue names, compute environments, workdirs, etc. When building the AWS Batch environments we create SSM parameters, which the entrypoint reads and injects into a config that is then added to the launch command so it is always present. We then tell users to use labels for the attributes of the queue they are looking for, as well as custom Batch settings. This is very custom and not easily replicated, but it works well for multi-account environments.
Something like
process {
    executor = 'awsbatch'
    queue = '$DEFAULT_QUEUE'
    $CACHE_CONFIG

    // Custom Batch queues
    withLabel: Spot {
        queue = '$SPOT_QUEUE'
    }
    withLabel: NvmeSpot {
        queue = '$NVME_SPOT_QUEUE'
    }
    withLabel: StorageOptimizedInstance {
        queue = '$STORAGE_OPTIMIZED_SPOT_QUEUE'
    }
    withLabel: LargeInstance {
        queue = '$LARGE_INSTANCE_QUEUE'
    }
    withLabel: NvmeLargeInstance {
        queue = '$NVME_LARGE_INSTANCE_QUEUE'
    }
    withLabel: HighPriority {
        queue = '$ON_DEMAND_QUEUE'
    }
    withLabel: OnDemand {
        queue = '$ON_DEMAND_QUEUE'
    }
    withLabel: NvmeOnDemand {
        queue = '$NVME_ON_DEMAND_QUEUE'
    }
}
and then the launching command:
NF_RUN_COMMAND="nextflow run PROJECT/$NF_PIPELINE -c $GO_AWS_INF_CONFIG_FILE $CUSTOM_CONFIG $PARAM_FILE $RESUME $ENTRY --project_folder=$NF_PROJECT_DIR -name ${RUN_NAME,,} "
Inquisitive Reindeer
I found a compromise solution for the moment that works from both Tower and the CLI: one can place a local copy of nxf_custom.config in the pipeline repository that points to S3-hosted config files.
params.custom_config_base = "s3://my-nextflow"

profiles {
    batch_nvme { includeConfig "${params.custom_config_base}/conf/batch_nvme.config" }
    useast2    { includeConfig "${params.custom_config_base}/conf/useast2.config" }
    sandbox    { includeConfig "${params.custom_config_base}/conf/sandbox.config" }
}
nextflow.config pulls that local file with the usual
// Load custom profiles from s3
includeConfig "nxf_custom.config"
That reduces the redundant files to just one, instead of having to duplicate all the configs in all the pipeline repos.
One could even make nxf_custom.config public if one removes sensitive information.
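For completeness, a hypothetical conf/batch_nvme.config referenced above might contain just the environment-specific settings, something like the following (queue name and region are made up):
// conf/batch_nvme.config (illustrative contents)
process {
    executor = 'awsbatch'
    queue    = 'nvme-spot-queue'
}

aws {
    region = 'us-east-2'
}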
Drew DiPalma
Merged in a post:
Allow loading of configuration files from s3
Inquisitive Reindeer
Currently the Nextflow CLI allows launching pipelines with configuration files located in S3, which comes in very handy for consolidating config files in a private repo. However, in the Seqera Platform it is not possible to add a pipeline that has configuration on S3, even though the credentials for the compute environment and Data Explorer give access to the bucket. It would be nice to be able to do that, or to be able to upload config files to the Seqera Platform.
Mighty Bird
Not opposed to your suggestion, since the consolidation of config files is also on my wish list (tw launch -c config pipeline does not consider previous configuration provided with tw pipeline add -c config pipeline), yet using profiles in conjunction with includeConfig should solve most of your issues with config duplication:
profiles {
    plugins     { includeConfig "${params.custom_config_base}/conf/plugins.config" }
    plugins_new { includeConfig "${params.custom_config_base}/conf/plugins_new.config" }
}
When launching the pipeline, select all applicable profiles from the list to add or remove bits of config.
That is at least how I am currently envisioning implementing our layered configs.
Charcoal Mandrill
Mighty Bird Thanks, we are already using profiles to control configs for different compute environments (Singularity / Docker, HPC, cloud, etc.). This does not really help consolidate the duplicated config across all our pipelines, though, since the profiles themselves need to be duplicated in every pipeline repo.
Mighty Bird
Charcoal Mandrill This strikes me as odd. All you need in every pipeline repo is a nextflow.config with a single includeConfig statement pointing to the centralized profile config. In there, you can then define the profiles to combine modular configs freely as needed.
For reference, see the nf-core configs: the nextflow.config points to nfcore_custom.config, which in turn defines a profile for each HPC or cloud environment.
Charcoal Mandrill
Mighty Bird thanks for the link, that is essentially the behavior I am trying to emulate; I was not aware that nf-core had already implemented a version of it. However, it does not really work for us because we are running in Nextflow Tower / Seqera Platform using AWS Batch. I have discovered (in a separate ticket) that while you can use the includeConfig directive in nextflow.config to load config files from HTTPS GitHub links with an embedded token, and from S3 bucket URIs, the latter (which is what we would need here) does not seem to be supported in Tower, and we are not using a remote git repo that would be acceptable for this purpose either. Ultimately, I think this kind of behaviour will need to be baked directly into Nextflow or Tower/Platform instead of relying on the third-party nf-core methods, especially since Tower/Platform will be our final production deployment location. It seems like this kind of central config management is not yet part of Tower. From what I can tell from these docs, the nf-core method also seems to assume that the config files are located on disk in the execution environment of the Nextflow pipeline, which will not be the case for us.
Another issue is, as I described, that the '-c' and '-C' args do not seem to work with HTTPS links or S3 bucket files either, despite includeConfig working with them, which is a lesser but still important limitation.
Inquisitive Reindeer
Charcoal Mandrill I just opened a ticket for this same issue, "Allow loading of configuration files from s3", as I would like to have a centralized config deployed. Indeed this worked beautifully for me using the CLI but crashed and burned when I tried to use it from Tower.
In my mind this behavior seems a bit silly, as Tower has credentials for all of our resources (GitLab, AWS Batch, Data Explorer), so it already has access to a private bucket.
Another solution would be to allow a pull from the config repo prior to running the pipeline, or as you propose being able to store the config files in Tower itself.
Charcoal Mandrill
Inquisitive Reindeer can you link to the feature request you just filed? We should cross-link them. Thanks.
Charcoal Mandrill
Looks like it's this one: https://feedback.seqera.io/feature-requests/p/allow-loading-of-configuration-files-from-s3