On Tuesday, 28 August 2018 10:21:45 AM AEST Chris Samuel wrote:

> That won't happen on a well configured Slurm system as it is Slurm's role
> to clear up any processes from that job left around once that job exits.
Sorry Reid, for some reason I misunderstood your email and missed the fact that you were talking about job steps! :-(

One other option in this case is to add, say, 2 cores per node for the daemons to the overall job request, and then in your job do:

srun --ntasks-per-node=1 -c 2 ./foo.py &

and ensure that foo.py doesn't exit after the daemons launch (if you are using cgroups then those daemons should be contained within the job step's cgroup, so you should be able to spot their PIDs easily enough).

That then gives you the rest of the cores to play with, so you would launch future job steps on n-2 cores per node (you could use the environment variables SLURM_CPUS_PER_TASK & SLURM_NTASKS_PER_NODE to avoid having to hard code these, for instance).

Of course, at the end your batch script would need to kill off that first job step.

Would that help?

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
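
For what it's worth, a rough sketch of how that batch script might hang together (the 2-node / 8-cores-per-node request and the ./real_work program are placeholders, not from the mail above; ./foo.py is the daemon launcher from the example):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8      # per node: 2 cores for the daemons + 6 for the real work
#SBATCH --cpus-per-task=1

# First job step: one daemon launcher per node on 2 cores each.
# foo.py has to keep running so the step (and its cgroup) stays alive.
srun --ntasks-per-node=1 -c 2 ./foo.py &
daemon_pid=$!

# Launch the real work on the remaining n-2 cores per node, using the
# environment variable rather than hard-coding the count.
srun --ntasks-per-node=$(( SLURM_NTASKS_PER_NODE - 2 )) ./real_work

# Finally kill off that first job step (step 0) before the script exits.
scancel ${SLURM_JOB_ID}.0
wait ${daemon_pid}

Killing the backgrounded srun directly with "kill ${daemon_pid}" would do much the same job; the point is just that the daemon step has to be cleaned up explicitly at the end of the batch script.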