Preserving the user identity for HPC job submission
complete
Mattia
The existing SSH credentials model poses a challenge: all jobs use the same credentials to connect to the cluster, so individual user identities are lost when HPC jobs are submitted.
A solution is needed to preserve the user identity on the HPC side when submitting jobs from the Platform.
This functionality is crucial for system accounting, chargeback, and fair resource usage.
Mattia
complete
The feature to preserve the user identity for HPC job submission is now available in Seqera Platform 24.1.0, under the name "Managed identities".
For an in-depth description and usage instructions, please visit: https://docs.seqera.io/platform/24.1.0/credentials/managed_identities
Rob Newman
This will be released in v24.1 of Seqera Platform, targeting the end of June 2024.
Rob Newman
Merged in a post:
How to detect the original user of a file created by Seqera Platform?
Alive Panda
Now that more pipelines are being added to our Seqera Platform installation, our data steward has realized a problem. The user is clearly identified by Seqera Platform when launching a pipeline. Still, when pipelines produce results in the final published directories, everything is created by the service account defined in the credentials and assigned to one compute environment. The data steward cannot know who the owner of those results is. Do you have a solution/best practice for this situation?
Mattia
in progress
Rob Newman
Merged in a post:
Enable workflow submission with user-level credentials
Mattia
Organizations often require the ability to submit jobs using user-specific credentials. This is particularly important when organizations have established permission structures for accessing compute resources and data sources.
The current implementation for cloud compute environments uses shared service account credentials for job execution. These shared credentials are utilized for launching all jobs and accessing input/output data. However, this approach becomes problematic when only specific organization members are authorized to work with sensitive data. Using shared credentials in such scenarios could lead to accidental data sharing, violating data protection rules.
To address this, we aim to introduce user-level authentication at the workflow run level. This enhancement will allow each member to execute pipelines using their own user-level credentials, ensuring compliance with all access restriction policies managed at the cloud level (e.g., IAM).
Specifically, the intended outcome includes:
- Enabling administrators to create user-level credentials and assign each member the appropriate identity for cloud compute environments, in line with company policy.
- Allowing administrators within each workspace to set the authentication behavior for newly created compute environments, which can include:
  - Using the service account credentials for all users (current behavior).
  - Enforcing user-level credentials (fails if a user lacks suitable user-level credentials for the compute environment).
- When a user launches a pipeline on a compute environment that requires user-level credentials, all jobs will be launched using those credentials.
- Requiring user-level credentials to access the logs and results of any workflow executed on a compute environment that mandates such credentials. This is essential for compliance with data privacy policies.
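To make the intended behavior above concrete, here is a minimal sketch of how the two authentication modes could resolve credentials at launch time and when viewing a run. This is not Seqera Platform code or its API: `AuthMode`, `ComputeEnv`, `resolve_credentials`, and `can_view_run` are hypothetical names used only for illustration, and credentials are represented as plain strings for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class AuthMode(Enum):
    """Hypothetical per-compute-environment authentication behavior."""
    SHARED_SERVICE_ACCOUNT = "shared"  # current behavior
    ENFORCE_USER_LEVEL = "user"        # proposed behavior


class MissingUserCredentials(Exception):
    """Raised when user-level credentials are required but not assigned."""


@dataclass
class ComputeEnv:
    name: str
    auth_mode: AuthMode
    service_credentials: str
    # Hypothetical mapping of Platform user -> user-level credentials,
    # assigned by an administrator.
    user_credentials: Dict[str, str] = field(default_factory=dict)


def resolve_credentials(env: ComputeEnv, user: str) -> str:
    """Return the credentials that all jobs of a launch would run with."""
    if env.auth_mode is AuthMode.SHARED_SERVICE_ACCOUNT:
        return env.service_credentials
    creds = env.user_credentials.get(user)
    if creds is None:
        # Enforced mode: the launch fails rather than falling back
        # to the shared service account.
        raise MissingUserCredentials(
            f"{user} has no user-level credentials for {env.name}"
        )
    return creds


def can_view_run(env: ComputeEnv, user: str) -> bool:
    """Gate access to logs and results of runs on this environment.

    Assumption: in shared mode, only the usual workspace permissions
    apply, so this check does not restrict access further.
    """
    if env.auth_mode is AuthMode.SHARED_SERVICE_ACCOUNT:
        return True
    return user in env.user_credentials
```

In enforced mode the sketch deliberately raises an error instead of silently falling back to the service account, since a fallback would defeat the access restrictions the proposal is meant to protect.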
Continuous Squid
I strongly support this feature. We cannot use a shared workspace without user-specific credentials. The way I see it: 1) a user has one or more credentials, 2) a compute environment is not bound to a specific credential, 3) when launching a pipeline, the user chooses which credential to use.
Mighty Bird
Interesting proposal.
For now, on our HPC every user runs their own agent instance (also for redundancy, so that one terminated screen/tmux session does not inadvertently stop all pipeline runs).
But how do you envision linking the HPC identity with the Platform user? I believe the whole point of developing the agent was that the web app should not store SSH keys etc. for access to the HPC.
Mattia
under review
Continuous Squid
Mighty Bird, I think the problem is similar to the one in the feature request https://seqera.canny.io/feature-requests/p/enable-workflow-submission-with-user-level-credentials : each pipeline is associated with a single credential, so all users who want to use a shared pipeline in a workspace need to use the same credential (in this case, a common agent).
As far as I can see, the agent solution only works for the personal dashboard.
I would be really happy to see any other solution for this problem.