Dear Arko,

Arko Roy via slurm-users <slurm-users@lists.schedmd.com> writes:

> I want to run 50 sequential jobs (essentially 50 copies of the same code
> with different input parameters) on a particular node. However, as soon
> as one of the jobs gets executed, the other 49 jobs get killed
> immediately with exit code 9. The jobs are not interacting and are
> strictly parallel. However, if the 50 jobs run on 50 different nodes, it
> runs successfully.
> Can anyone please help with possible fixes?
> I see a discussion almost along similar lines in
> https://groups.google.com/g/slurm-users/c/I1T6GWcLjt4
> but could not get the final solution.

If the jobs are independent, why do you want to run them all on the same
node?

If you do have problems when the jobs run on the same node, there may be
an issue with all the jobs trying to access a single resource, such as a
file. However, you probably need to show your job script in order for
anyone to be able to work out what is going on.

Regards

Loris

> --
> Arko Roy
> Assistant Professor
> School of Physical Sciences
> Indian Institute of Technology Mandi
> Kamand, Mandi
> Himachal Pradesh - 175 005, India
> Email: a...@iitmandi.ac.in
> Web: https://faculty.iitmandi.ac.in/~arko/

--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
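For what it's worth, a common way to submit 50 independent runs of the same code is a Slurm job array with explicit resource requests, so that jobs sharing a node cannot oversubscribe it. The sketch below assumes a hypothetical executable `./mycode` and input files named `input_1.dat` ... `input_50.dat`; adjust to your setup:

```shell
#!/bin/bash
#SBATCH --job-name=param-sweep
#SBATCH --array=1-50          # 50 independent array tasks
#SBATCH --ntasks=1            # each task is a single, non-interacting run
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2G      # request memory explicitly: jobs that exceed
                              # their allocation on a shared node may be
                              # killed with SIGKILL (signal 9)
#SBATCH --time=01:00:00

# Hypothetical program and input-file naming scheme -- not from the
# original post. Each array task picks its own input via the task ID.
srun ./mycode "input_${SLURM_ARRAY_TASK_ID}.dat"
```

With an array, the scheduler is free to pack the tasks onto one node or spread them across several, and `scontrol show job <jobid>` on a killed task will usually show why it was terminated.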
> I want to run 50 sequential jobs (essentially 50 copies of the same code with > different input parameters) on a particular node. However, as soon as one of > the > jobs gets executed, the other 49 jobs get killed immediately with exit code > 9. The jobs are not interacting and are strictly parallel. However, if the > 50 jobs run on > 50 different nodes, it runs successfully. > Can anyone please help with possible fixes? > I see a discussion almost along the similar lines in > https://groups.google.com/g/slurm-users/c/I1T6GWcLjt4 > But could not get the final solution. If the jobs are independent, why do you want to run them all on the same node? If you do have problems when jobs run on the same node, there may be an issue with the jobs all trying to access a single resource, such as a file. However, you probably need to show your job script in order for anyone to be able to work out what is going on. Regards Loris > -- > Arko Roy > Assistant Professor > School of Physical Sciences > Indian Institute of Technology Mandi > Kamand, Mandi > Himachal Pradesh - 175 005, India > Email: a...@iitmandi.ac.in > Web: https://faculty.iitmandi.ac.in/~arko/ -- Dr. Loris Bennett (Herr/Mr) FUB-IT, Freie Universität Berlin -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com