On 26/8/24 8:40 am, Di Bernardini, Fabio via slurm-users wrote:

Hi everyone, for accounting reasons, I need to create only one job across two or more federated clusters with two or more srun steps.

The limitations for heterogeneous jobs say:

https://slurm.schedmd.com/heterogeneous_jobs.html#limitations

> In a federation of clusters, a heterogeneous job will execute
> entirely on the cluster from which the job is submitted. The
> heterogeneous job will not be eligible to migrate between clusters
> or to have different components of the job execute on different
> clusters in the federation.

However, it's not clear to me from your script that this is what you mean, because you include multiple --cluster options. I'm not sure whether that works; as you mention, the docs don't cover that case. They do, however, say:

> If a heterogeneous job is submitted to run in multiple clusters not
> part of a federation (e.g. "sbatch --cluster=alpha,beta ...") then
> the entire job will be sent to the cluster expected to be able to
> start all components at the earliest time.

My gut instinct is that this isn't going to work. My feeling is that launching a heterogeneous job like this would require the slurmctld daemons on each cluster to coordinate, and I'm not aware of that being possible currently.
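
For comparison, here is a rough sketch of the case the docs do cover: a heterogeneous job with two components and one srun step per component, all on a single cluster. The file name hetjob.sh, the job name, and the task counts are made up for illustration. The snippet only writes the batch script and syntax-checks it, since submitting it needs a working Slurm installation.

```shell
#!/bin/bash
# Write a two-component heterogeneous batch script.
# hetjob.sh, the job name, and the task counts are hypothetical.
cat > hetjob.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=het-example
#SBATCH --ntasks=1
#SBATCH hetjob
#SBATCH --ntasks=2

# One step per component; both steps belong to the same job.
srun --het-group=0 hostname
srun --het-group=1 hostname
EOF

# Syntax check only. On a real cluster you would submit with:
#   sbatch hetjob.sh
bash -n hetjob.sh && echo "hetjob.sh parses cleanly"
```

Per the limitation quoted above, submitting this from a cluster in a federation should keep both components on the submitting cluster, so it does not give you one job spanning clusters.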

All the best,
Chris

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
