Separate workflow metadata from process work dirs in workDir
acknowledged
Additional Barnacle
To easily automate cleanup of (large) process work directories without losing important Tower metadata files, like the nf-.. files used for Reports, it would be good to be able to store the metadata files in a different path (e.g. a different prefix on S3). Right now both are stored directly in 'workDir', which makes it impossible to use simple AWS lifecycle rules to clean up process work folders without also deleting the "nf-*" files and breaking the Report links from the Tower front-end.
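As an illustration of what this request would enable, here is a minimal sketch of an S3 lifecycle rule that expires only objects under a dedicated task-directory prefix. The prefix "scratch/tasks/", the 14-day retention, and the bucket name are hypothetical; no such split layout exists today, which is the point of the request.

```python
import json


def build_lifecycle_config(prefix: str, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under one prefix."""
    rule_id = "expire-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                # Only objects whose key starts with this prefix are affected.
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Hypothetical layout: task directories under "scratch/tasks/",
# with the nf-* metadata files stored under some other prefix.
config = build_lifecycle_config("scratch/tasks/", 14)
print(json.dumps(config, indent=2))

# Applying it with boto3 (not run here; "my-bucket" is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=config
# )
```

With metadata under a separate prefix, a rule like this would never touch the "nf-*" files, so the Report links would survive cleanup.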
Rob Newman
acknowledged
Rob Newman
Merged in a post:
Rendering of reports not being dependent on a .tsv file
Limited Mongoose
The rendering of Pipeline Run reports in the Seqera Platform is dependent on the file zz://bucket/scratch/<workflow_id>/nf-<workflow_id>-reports.tsv. We've found this can create a headache when trying to retain this file but clear up the rest of the scratch directory to save money.
It would be handy if this file could be output to the output dir/ data_path, and Seqera Platform could use it to render reports from there. It seems unintuitive that any files that might need to be kept would be written to the scratch directory, which I feel would be considered impermanent.
Ben Sherman
The way I would like to deal with this is to move the helper files (or at least the relevant information within) into the task metadata cache (i.e. the .nextflow folder) when a run completes. Then it should be safe to delete the task directories, and whatever information the Platform needs, it should be able to query the task metadata cache for it.
Additional Barnacle
Ben Sherman I don't follow exactly. I was thinking for example of the nf-*-report.tsv files that control the links on tower under the 'Reports' tab to result files in "publishDir". It was my understanding that tower looks for those at a fixed path (top-level of the compute environment workDir).
Are you referring to something different or am I misunderstanding?
Thanks
Felix
Ben Sherman
Additional Barnacle I see what you mean, you're talking about the report helper files. I guess I see it all as part of the same problem. The platform relies on these files in the run detail / task detail views, but the work directory is supposed to be temporary. The only difference is that the report helper files are not used by Nextflow, so instead of the Nextflow cache they should be saved into some database by the platform.
Drew DiPalma
Net Ox
There might be a simple fix for this: write the *report.tsv files to the publish dir. However, this still leaves the log files that need to be kept. Having a way to keep all files required for correct rendering of the Seqera Platform UI would be crucial for reducing storage costs. I know that with Fusion you can use tags to differentially delete non-metadata files, but I notice that the *report.tsv files are not tagged.
Mattia
Merged in a post:
The "Reports" tab in Seqera platform stops working after workdir cleanup
Medical Grouse
Our work directories (scratch space) can often be many TB in size for each Nextflow run, so we clean them up aggressively. Once a work directory is cleaned up, the Reports tab on the Tower page seems to stop working, even if the published data is still present.
Mattia
Hi! I believe the issue is that the Seqera platform fetches a *reports.tsv file from the workDir to get the reports list every time you load the Run Details page. Cleaning up the workDir deletes that file, so the platform loses the context and cannot find any report.
I am merging this request with the existing one because they seem to be the same problem, and this way there will be a single, centralized follow-up point.
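Until the platform stops depending on that file, one possible workaround is a cleanup script that skips it. The sketch below only selects which object keys to delete; the keep pattern "nf-*-reports.tsv" follows the file name quoted earlier in this thread, while the example keys and the helper name keys_to_delete are hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# File-name patterns that must survive cleanup (assumption: the reports file
# is the only one needed; log files may need to be added to this tuple).
KEEP_PATTERNS = ("nf-*-reports.tsv",)


def keys_to_delete(keys, keep_patterns=KEEP_PATTERNS):
    """Return the keys whose file name matches none of the keep patterns."""
    return [
        key
        for key in keys
        if not any(fnmatchcase(basename(key), pat) for pat in keep_patterns)
    ]


# Hypothetical listing of one run's scratch prefix.
example_keys = [
    "scratch/abc123/nf-abc123-reports.tsv",
    "scratch/abc123/1f/9e8d7c/.command.log",
    "scratch/abc123/1f/9e8d7c/sample.bam",
]
doomed = keys_to_delete(example_keys)
print(doomed)
```

The selection is kept separate from the actual delete call so the list can be reviewed, or run in a dry-run mode, before anything is removed.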
Yellow sunshine Firefly
To add to this, if the report file is "Published" to another S3 bucket, the report link should point to the published location.
Indigo Wildcat
This would be great and would make the Reports functionality more useful.