Are you talking about a batch script, submitted via sbatch, that contains
srun command lines? If so, there are a lot of reasons to do that. One is
better instrumentation, as I understand it. Another is that srun --mpi
can replace mpiexec/mpirun/etc., which is what we recommend at our site
(using the PMI2 or PMIx methods).
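
For illustration, here is a minimal sketch of that kind of script. The
job name, node and task counts, time limit, and ./my_mpi_app are all
placeholders I made up, and it assumes your Slurm build has PMIx support
(srun --mpi=list shows what is available; pmi2 can be substituted). Each
srun starts a job step inside the allocation sbatch already obtained,
rather than requesting a new one.

```shell
#!/bin/bash
#SBATCH --job-name=srun-steps    # hypothetical job name
#SBATCH --nodes=2                # placeholder sizes, adjust to taste
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00

# Each srun below launches a job step within the existing allocation.
run_steps() {
    # First step: two tasks spread over the two allocated nodes.
    srun --nodes=2 --ntasks=2 hostname

    # Second step: start the MPI program directly, with no mpirun/mpiexec.
    # ./my_mpi_app is a placeholder binary; --mpi=pmi2 is the alternative.
    srun --mpi=pmix ./my_mpi_app
}

# Only launch the steps when actually running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_steps
fi
```

Submitted with sbatch, the steps should then be listed separately by
sacct -j <jobid>, which I take to be part of the instrumentation benefit.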
On 7/15/22 05:17, Kamil Wilczek wrote:
Dear Slurm Users,
one of my cluster users would like to run a Ray cluster on Slurm.
I noticed that the batch script example requires running the "srun"
command on a compute node that is already allocated:
https://docs.ray.io/en/latest/cluster/examples/slurm-template.html#slurm-template
This is the first time I have seen or heard of this type of usage,
and I am having trouble wrapping my head around it.
Is there anything wrong or unusual about this? I understand that
this would allocate some resources on other nodes. Would
Slurm enforce limits properly ("qos" or "partition" limits)?
Kind Regards
--
#BlackLivesMatter
____
|| \\UTGERS, |----------------------*O*------------------------
||_// the State | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark
`'