I'm confused. Why can't they just use a multi-node job, and have the
job script farm out the individual tasks to the various workers through
some mechanism (srun, mpirun, ssh, etc.)? AFAIK, there's nothing
preventing a job from using resources on multiple hosts. The job just
needs to have some way of pushing the work out to those hosts.
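For illustration, here is a rough sketch of that approach: a single batch
job spanning several nodes, with each application launched as its own job
step. The application names, node count, and per-step resource numbers are
placeholders, not something from the original post, and the allocation
would need to be sized so that all the steps fit.

    #!/bin/bash
    #SBATCH --job-name=simulation
    #SBATCH --nodes=4          # one allocation spanning several nodes
    #SBATCH --exclusive        # simplest way to leave room for every step

    # Each application runs as a job step inside the single allocation,
    # so nothing starts until the whole allocation has been granted, and
    # the steps may land on any node in the job.
    srun --nodes=1 --ntasks=1 --cpus-per-task=4 ./app1 &
    srun --nodes=1 --ntasks=1 --cpus-per-task=2 ./app2 &
    # ... one srun line per remaining application, each with its own
    #     CPU/memory request ...
    srun --nodes=1 --ntasks=1 --cpus-per-task=1 ./app10 &
    wait                       # the job ends when the last step exits

Since every step belongs to one job, the applications can read
$SLURM_JOB_NODELIST to find the hosts they need to reach over TCP/IP.
srun's --multi-prog option is another way to map different executables
onto the tasks of a single step.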
Lloyd
On 7/8/24 14:17, Dan Healy via slurm-users wrote:
Hi there,
I've received a question from an end user, to which I presume the answer
is "No", but I would like to ask the community first.
Scenario: The user wants to create a series of jobs that all need to
start at the same time. Example: there are 10 different executable
applications which have varying CPU and RAM constraints, all of which
need to communicate via TCP/IP. Of course the user could design some
type of idle/statusing mechanism to wait until all jobs are randomly
started, then begin execution, but this feels like a waste of
resources. The complete execution of these 10 applications would be
considered a single simulation. The goal would be to distribute these
10 applications across the cluster and not necessarily require them
all to execute on a single node.
Is there a good architecture for this using SLURM? If so, please
kindly point me in the right direction.
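One Slurm feature that maps closely onto this, though it was not raised in
the thread, is a heterogeneous job: each component declares its own CPU/RAM
shape, the components can land on different nodes, and the scheduler starts
them all together as one unit. A minimal sketch, with invented program names
and resource numbers (the directive is spelled "packjob"/"--pack-group" on
older Slurm releases):

    #!/bin/bash
    #SBATCH --ntasks=1 --cpus-per-task=4 --mem=8G   # component 0
    #SBATCH hetjob
    #SBATCH --ntasks=1 --cpus-per-task=2 --mem=4G   # component 1
    # ... a further "#SBATCH hetjob" separator per remaining application ...

    # Each component runs its own executable; all of them start once the
    # whole heterogeneous job has been allocated.
    srun --het-group=0 ./app1 &
    srun --het-group=1 ./app2 &
    wait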
--
Thanks,
Daniel Healy
--
Lloyd Brown
HPC Systems Administrator
Office of Research Computing
Brigham Young University
http://rc.byu.edu