I have some users who are running Ray on Slurm.
I will preface this by saying that we are new Slurm users, so we may not be
doing everything exactly correctly.
The only issue we have run into so far was somewhat Ray-specific.
Specifically, and pardon my lack of specificity,

Are you talking about a script run via sbatch that contains srun
command lines? If so, there are a lot of reasons to do that. One is
better instrumentation, as I understand it; another is that srun --mpi is
a way to eliminate mpiexec/mpirun/etc., and it is what we recommend at our
site instead (usin

Dear Slurm Users,

One of my cluster users would like to run a Ray cluster on Slurm.
I noticed that the batch script example requires running the "srun"
command on a compute node that is already allocated:
https://docs.ray.io/en/latest/cluster/examples/slurm-template.html#slurm-template
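
For readers who have not seen that pattern, here is a rough sketch of what "srun inside an existing allocation" can look like: one job step per node, with the first node hosting the Ray head and the others joining it. This is only an illustration, not the linked template itself. The helper name `ray_srun_commands`, the node names, and the use of a hostname (rather than an IP address) in `--address` are assumptions made for the example; the helper only builds the command lines and does not run them.

```python
import shlex


def ray_srun_commands(nodes, port=6379):
    """Build one srun command line per node (hypothetical helper).

    The first node hosts the Ray head; every other node joins it as a worker.
    Assumes the head's hostname is resolvable from the worker nodes.
    """
    if not nodes:
        raise ValueError("need at least one node")
    head = nodes[0]

    def step(node):
        # One single-task job step pinned to one node of the allocation.
        return ["srun", "--nodes=1", "--ntasks=1", "-w", node]

    commands = [
        step(head) + ["ray", "start", "--head", f"--port={port}", "--block"]
    ]
    for worker in nodes[1:]:
        commands.append(
            step(worker)
            + ["ray", "start", f"--address={head}:{port}", "--block"]
        )
    # Return shell-quoted strings, ready to paste into a batch script.
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    # Placeholder node names; in a real job they would come from
    # `scontrol show hostnames "$SLURM_JOB_NODELIST"`.
    for line in ray_srun_commands(["node001", "node002"]):
        print(line)
```

In a real batch script each of these steps would normally be put in the background (with `&`) so that the script can go on to start the workload.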
This is