I want to run 50 sequential jobs (essentially 50 copies of the same code
with different input parameters) on a particular node. However, as soon as
one of the jobs starts running, the other 49 jobs are killed immediately
with exit code 9. The jobs do not interact with each other and are strictly
parallel. If the 50 jobs run on 50 different nodes, however, they all run
successfully.
Can anyone please help with possible fixes?
I found a discussion along similar lines in
https://groups.google.com/g/slurm-users/c/I1T6GWcLjt4
but could not find a final solution there.

-- 
Arko Roy
Assistant Professor
School of Physical Sciences
Indian Institute of Technology Mandi
Kamand, Mandi
Himachal Pradesh - 175 005, India
Email: a...@iitmandi.ac.in
Web: https://faculty.iitmandi.ac.in/~arko/
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com