The deeper I dig into the select/cons_res plugin, the more of a mess it 
appears to be:  inconsistencies with the documentation, etc.


The primary issue seems to be with how select/cons_res selects nodes when 
"--ntasks-per-node" et al. are absent.  By default, the algorithm selects 
"--nodes=N" nodes, then packs the "--ntasks=n" tasks onto the nodes starting 
at the first one selected.  The algorithm ensures that at least 1 task lands 
on every node, and the packing is naturally bounded by how many cores are 
unused on each of the selected nodes.
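

For illustration, here's a rough Python model of that default packing as I 
read it from the source -- a sketch only, not Slurm's actual code, and the 
free-core counts are made up:

    # Sketch of the default cons_res packing (my reading of the source,
    # not the real implementation; assumes ntasks >= number of nodes):
    # give each selected node one task, then fill the remainder onto
    # nodes in order, bounded by each node's unused cores.
    def pack_tasks(ntasks, free_cores):
        counts = [1] * len(free_cores)        # at least 1 task per node
        remaining = ntasks - len(free_cores)
        for i, cores in enumerate(free_cores):
            take = min(remaining, cores - 1)  # node i already holds 1 task
            counts[i] += take
            remaining -= take
        return counts

    print(pack_tasks(8, [12, 12]))  # -> [7, 1], i.e. SLURM_TASKS_PER_NODE=7,1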


This leads to the "--distribution=plane=X" option being useless.  If I ask for:


--nodes=2 --ntasks=8 --distribution=plane=3


the resulting allocation is


SLURM_NNODES=2
SLURM_DIST_PLANESIZE=3
SLURM_NTASKS=8
SLURM_TASKS_PER_NODE=7,1 


which isn't remotely what the "plane" option claims to do.  Requesting 
"cyclic" or "block" yields exactly the same behavior, so the default 
algorithm leaves all the distribution choices (in terms of node and core 
selection) indistinguishable:


[frey@login ~]$ sbatch --nodes=2 --ntasks=8 --distribution=cyclic test.sh
Submitted batch job 558
[frey@login ~]$ sbatch --nodes=2 --ntasks=8 --distribution=block test.sh
Submitted batch job 559
[frey@login ~]$ sbatch --nodes=2 --ntasks=8 --distribution=plane=3 test.sh
Submitted batch job 560

[frey@login ~]$ grep SLURM_TASKS_PER_NODE slurm-5* 
slurm-558.out:SLURM_TASKS_PER_NODE=7,1
slurm-559.out:SLURM_TASKS_PER_NODE=7,1
slurm-560.out:SLURM_TASKS_PER_NODE=7,1


Poking through the source code, though, I found that the "SPREAD_JOB" option 
triggers an alternate algorithm more in line with your expectations and mine.  
The sbatch man page isn't 100% clear on what "--spread-job" will do (it 
sounds like it will spread the job across the whole partition of nodes), but 
it turns out to honor the "--nodes=N" that was specified.  So submitting with


--nodes=2 --ntasks=8 --distribution=plane=3 --spread-job


yields the asymmetric task distribution that the "plane=3" option _should_ 
create for the task and node counts in question:


SLURM_NNODES=2
SLURM_JOBID=557
SLURM_DIST_PLANESIZE=3
SLURM_NTASKS=8
SLURM_TASKS_PER_NODE=5,3


Likewise, for the "cyclic" and "block" distribution options, including the 
"--spread-job" option yields a SLURM_TASKS_PER_NODE of 4(x2).



> On Oct 17, 2017, at 02:49 , sysadmin.caos <sysadmin.c...@uab.cat> wrote:
> 
> If I run with "--ntasks-per-node=6", result is:
> Process 0 on clus01.hpc.local out of 12
> Process 1 on clus02.hpc.local out of 12
> Process 2 on clus01.hpc.local out of 12
> Process 3 on clus02.hpc.local out of 12
> Process 4 on clus01.hpc.local out of 12
> Process 5 on clus02.hpc.local out of 12
> Process 6 on clus01.hpc.local out of 12
> Process 7 on clus02.hpc.local out of 12
> Process 8 on clus01.hpc.local out of 12
> Process 9 on clus02.hpc.local out of 12
> Process 10 on clus01.hpc.local out of 12
> Process 11 on clus02.hpc.local out of 12
> so it's correct... but suppose you don't know how many cores each node 
> has; maybe the cluster nodes have 24 cores.  Must you then explicitly 
> divide the number of tasks by the number of nodes inside your script to 
> assign the correct value to "--ntasks-per-node" in the "srun" command? 
> Isn't there an automatic way to allocate in a cyclic distribution? 
> 
> Thanks.
> 
> On 16/10/2017 at 16:11, Jeffrey T Frey wrote:
>>> If, now, I submit with "sbatch --distribution=cyclic -N 2 -n 12 
>>> ./test-new.sh", what I get is:
>>> Process 0 on clus01.hpc.local out of 12
>>> Process 1 on clus02.hpc.local out of 12
>>> Process 2 on clus01.hpc.local out of 12
>>> Process 3 on clus01.hpc.local out of 12
>>> Process 4 on clus01.hpc.local out of 12
>>> Process 5 on clus01.hpc.local out of 12
>>> Process 6 on clus01.hpc.local out of 12
>>> Process 7 on clus01.hpc.local out of 12
>>> Process 8 on clus01.hpc.local out of 12
>>> Process 9 on clus01.hpc.local out of 12
>>> Process 10 on clus01.hpc.local out of 12
>>> Process 11 on clus01.hpc.local out of 12
>>> ...but I was expecting another result... Something like this:
>>> Process 0 on clus01.hpc.local out of 12
>>> Process 1 on clus02.hpc.local out of 12
>>> Process 2 on clus01.hpc.local out of 12
>>> Process 3 on clus02.hpc.local out of 12
>>> Process 4 on clus01.hpc.local out of 12
>>> Process 5 on clus02.hpc.local out of 12
>>> Process 6 on clus01.hpc.local out of 12
>>> Process 7 on clus02.hpc.local out of 12
>>> Process 8 on clus01.hpc.local out of 12
>>> Process 9 on clus02.hpc.local out of 12
>>> Process 10 on clus01.hpc.local out of 12
>>> Process 11 on clus02.hpc.local out of 12
>>> because I'm forcing a cyclic distribution. Where is the problem?
>>> 
>> 
>> Have you tried this same thing but with tasks-per-node specified -- be as 
>> explicit as possible about how many you want placed on each node?  E.g.
>> 
>> 
>>      sbatch -N 2 -n 12 --ntasks-per-node=6 --distribution=cyclic 
>> ./test-new.sh
>> 
> 
