On Tuesday, 28 August 2018 10:21:45 AM AEST Chris Samuel wrote:

> That won't happen on a well configured Slurm system as it is Slurm's role
> to clear up any processes from that job left around once that job exits.
Sorry Reid, for some reason I misunderstood your email and missed the fact that you were talking about job steps! :-(

One other option in this case is to add, say, 2 cores per node for the daemons to the overall job request, and then in your job do:

srun --ntasks-per-node=1 -c 2 ./foo.py &

and ensure that foo.py doesn't exit after the daemons launch (if you are using cgroups then those daemons should be contained within the job step's cgroup, so you should be able to spot their PIDs easily enough).

That then gives you the rest of the cores to play with, so you would launch future job steps on n-2 cores per node (you could use the environment variables SLURM_CPUS_PER_TASK & SLURM_NTASKS_PER_NODE to avoid having to hard code these, for instance).

Of course, at the end your batch script would need to kill off that first job step.

Would that help?

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
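
For what it's worth, a rough sketch of how that batch script might hang together (the 2-node / 8-cores-per-node request and the ./real_work program are placeholders, not from the mail above; ./foo.py is the daemon launcher from the example):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8      # per node: 2 cores for the daemons + 6 for the real work
#SBATCH --cpus-per-task=1

# First job step: one daemon launcher per node on 2 cores each.
# foo.py has to keep running so the step (and its cgroup) stays alive.
srun --ntasks-per-node=1 -c 2 ./foo.py &
daemon_pid=$!

# Launch the real work on the remaining n-2 cores per node, using the
# environment variable rather than hard-coding the count.
srun --ntasks-per-node=$(( SLURM_NTASKS_PER_NODE - 2 )) ./real_work

# Finally kill off that first job step (step 0) before the script exits.
scancel ${SLURM_JOB_ID}.0
wait ${daemon_pid}

Killing the backgrounded srun directly with "kill ${daemon_pid}" would do much the same job; the point is just that the daemon step has to be cleaned up explicitly at the end of the batch script.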