Hi Marcus,

More ideas:
A CPU does not always count as a core; it can also mean a single hardware 
thread, which makes a difference here. 
Maybe the behavior of CR_ONE_TASK is still not solid nor properly documented, 
and --ntasks and --ntasks-per-node are honored differently internally. If so, 
solely using --ntasks can mean that Slurm counts all threads, even if the 
resulting binding looks correct. 
In any case, your results show that Slurm handles the two options differently. 
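
If that is what happens, it might help to make the core/thread intent explicit 
in the batch script. A minimal sketch (the exact flag choice is my assumption, 
not verified against your config; counts assume a 48-core/96-thread node): 

  #SBATCH --nodes=1
  #SBATCH --ntasks=48
  # one task per physical core, so both hardware threads of a core stay with it
  #SBATCH --ntasks-per-core=1

Checking scontrol show job afterwards would show whether NumCPUs then counts 
cores (48) or threads (96). 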

Have you tried configuring the node with cpus=96? What output do you get from 
slurmd -C? 
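
For reference, the node definition in slurm.conf could then look roughly like 
this (the Sockets/Cores/Threads and memory values are placeholders; take the 
real ones from the slurmd -C output on the node): 

  NodeName=ncm0728 CPUs=96 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=182400

slurmd -C prints exactly such a line, so comparing it with slurm.conf shows 
whether Slurm's view of the node matches the hardware. 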
Is this a newer architecture like Skylake? In the case of sub-NUMA layouts, 
Slurm cannot handle it without hwloc 2. 
Have you tried using srun -v (or -vvv) instead of sbatch? That might give you 
a glimpse of what Slurm actually does with your options.
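
Something along these lines (the script name is just a placeholder, the 
options are taken from your mail): 

  srun -vvv --nodes=1 --ntasks=48 --ntasks-per-node=48 ./testscript.sh

The verbose output shows how srun translates the request into node/CPU counts 
and a task layout; adding --cpu-bind=verbose on top would also report the 
actual binding masks. 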

Best,
Andreas 


> On 14.02.2019 at 08:34, Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:
> 
> Hi Chris,
> 
> 
> these are 96-thread nodes with 48 cores. You are right that if we set it to 
> 24, the job will get scheduled. But then only half of the node is used. On 
> the other hand, if I only use --ntasks=48, Slurm schedules all tasks onto the 
> same node. The hyperthread of each core is included in the cgroup, and the 
> task_affinity plugin also correctly binds the hyperthread together with the 
> core (output of a small, ugly test script of ours; the last two numbers are 
> the core and its hyperthread):
> 
> ncm0728.hpc.itc.rwth-aachen.de <0> OMP_STACKSIZE: <#> unlimited+p2 +pemap 0,48
> ncm0728.hpc.itc.rwth-aachen.de <10> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 26,74
> ncm0728.hpc.itc.rwth-aachen.de <11> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 29,77
> ncm0728.hpc.itc.rwth-aachen.de <12> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 6,54
> ncm0728.hpc.itc.rwth-aachen.de <13> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 9,57
> ncm0728.hpc.itc.rwth-aachen.de <14> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 30,78
> ncm0728.hpc.itc.rwth-aachen.de <15> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 33,81
> ncm0728.hpc.itc.rwth-aachen.de <16> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 7,55
> ncm0728.hpc.itc.rwth-aachen.de <17> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 10,58
> ncm0728.hpc.itc.rwth-aachen.de <18> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 31,79
> ncm0728.hpc.itc.rwth-aachen.de <19> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 34,82
> ncm0728.hpc.itc.rwth-aachen.de <1> OMP_STACKSIZE: <#> unlimited+p2 +pemap 3,51
> ncm0728.hpc.itc.rwth-aachen.de <20> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 8,56
> ncm0728.hpc.itc.rwth-aachen.de <21> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 11,59
> ncm0728.hpc.itc.rwth-aachen.de <22> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 32,80
> ncm0728.hpc.itc.rwth-aachen.de <23> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 35,83
> ncm0728.hpc.itc.rwth-aachen.de <24> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 12,60
> ncm0728.hpc.itc.rwth-aachen.de <25> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 15,63
> ncm0728.hpc.itc.rwth-aachen.de <26> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 36,84
> ncm0728.hpc.itc.rwth-aachen.de <27> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 39,87
> ncm0728.hpc.itc.rwth-aachen.de <28> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 13,61
> ncm0728.hpc.itc.rwth-aachen.de <29> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 16,64
> ncm0728.hpc.itc.rwth-aachen.de <2> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 24,72
> ncm0728.hpc.itc.rwth-aachen.de <30> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 37,85
> ncm0728.hpc.itc.rwth-aachen.de <31> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 40,88
> ncm0728.hpc.itc.rwth-aachen.de <32> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 14,62
> ncm0728.hpc.itc.rwth-aachen.de <33> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 17,65
> ncm0728.hpc.itc.rwth-aachen.de <34> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 38,86
> ncm0728.hpc.itc.rwth-aachen.de <35> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 41,89
> ncm0728.hpc.itc.rwth-aachen.de <36> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 18,66
> ncm0728.hpc.itc.rwth-aachen.de <37> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 21,69
> ncm0728.hpc.itc.rwth-aachen.de <38> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 42,90
> ncm0728.hpc.itc.rwth-aachen.de <39> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 45,93
> ncm0728.hpc.itc.rwth-aachen.de <3> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 27,75
> ncm0728.hpc.itc.rwth-aachen.de <40> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 19,67
> ncm0728.hpc.itc.rwth-aachen.de <41> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 22,70
> ncm0728.hpc.itc.rwth-aachen.de <42> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 43,91
> ncm0728.hpc.itc.rwth-aachen.de <43> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 46,94
> ncm0728.hpc.itc.rwth-aachen.de <44> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 20,68
> ncm0728.hpc.itc.rwth-aachen.de <45> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 23,71
> ncm0728.hpc.itc.rwth-aachen.de <46> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 44,92
> ncm0728.hpc.itc.rwth-aachen.de <47> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 47,95
> ncm0728.hpc.itc.rwth-aachen.de <4> OMP_STACKSIZE: <#> unlimited+p2 +pemap 1,49
> ncm0728.hpc.itc.rwth-aachen.de <5> OMP_STACKSIZE: <#> unlimited+p2 +pemap 4,52
> ncm0728.hpc.itc.rwth-aachen.de <6> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 25,73
> ncm0728.hpc.itc.rwth-aachen.de <7> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
> 28,76
> ncm0728.hpc.itc.rwth-aachen.de <8> OMP_STACKSIZE: <#> unlimited+p2 +pemap 2,50
> ncm0728.hpc.itc.rwth-aachen.de <9> OMP_STACKSIZE: <#> unlimited+p2 +pemap 5,53
> 
> 
> --ntasks=48:
> 
>    NodeList=ncm0728
>    BatchHost=ncm0728
>    NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=48,mem=182400M,node=1,billing=48
> 
> 
> --ntasks=48
> --ntasks-per-node=24:
> 
>    NodeList=ncm[0438-0439]
>    BatchHost=ncm0438
>    NumNodes=2 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=48,mem=182400M,node=2,billing=48
> 
> 
> --ntasks=48
> --ntasks-per-node=48:
> 
> sbatch: error: CPU count per node can not be satisfied
> sbatch: error: Batch job submission failed: Requested node configuration is 
> not available
> 
> 
> Isn't the first essentially the same as the last, with the difference that I 
> want to force Slurm to put all tasks onto one node?
> 
> 
> 
> Best
> Marcus
> 
> 
>> On 2/14/19 7:15 AM, Chris Samuel wrote:
>>> On Wednesday, 13 February 2019 4:48:05 AM PST Marcus Wagner wrote:
>>> 
>>> #SBATCH --ntasks-per-node=48
>> I wouldn't mind betting that if you set that to 24 it will work, and each
>> thread will be assigned a single core with the 2 thread units on it.
>> 
>> All the best,
>> Chris
> 
> -- 
> Marcus Wagner, Dipl.-Inf.
> 
> IT Center
> Abteilung: Systeme und Betrieb
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-24383
> Fax: +49 241 80-624383
> wag...@itc.rwth-aachen.de
> www.itc.rwth-aachen.de
> 
> 
