Documentation for CR_CPU:
CR_CPU
CPUs are consumable resources. Configure the number of CPUs on each node, which 
may be equal to the count of cores or hyper-threads on the node depending upon 
the desired minimum resource allocation. The node's Boards, Sockets, 
CoresPerSocket and ThreadsPerCore may optionally be configured and result in job 
allocations which have improved locality; however doing so will prevent more 
than one job from being allocated on each core.
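

In practice this means that, with CR_CPU, the simplest configuration is to 
advertise only a CPU count equal to the number of hardware threads and omit 
the topology entirely.  A minimal sketch (node name and counts are 
hypothetical):


SelectType=select/cons_res
SelectTypeParameters=CR_CPU
# 2 sockets x 8 cores x 2 threads, exposed as 32 individually schedulable CPUs
NodeName=n001 CPUs=32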


So once you've configured node(s) with ThreadsPerCore=N, the cons_res plugin 
still forces a job's allocation to span all threads on a core.  Elsewhere in the
documentation it is stated:


Note that Slurm can allocate resources to jobs down to the resolution of a 
core.


So you MUST treat a thread as a core if you want to schedule individual 
threads.  I can confirm this using the config:


SelectTypeParameters=CR_CPU_Memory
NodeName=n[003,008] CPUs=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2


Submitting a 1-cpu job and checking the cpuset assigned to it on n003 shows:


$ cat /sys/fs/cgroup/cpuset/slurm/{uid}/{job}/cpuset.cpus
4,12
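

For reference, a submission along these lines reproduces that allocation (the 
wrapped command is illustrative; any 1-CPU job will do):


$ sbatch --ntasks=1 --cpus-per-task=1 --nodelist=n003 --wrap='sleep 600'
$ scontrol show job <jobid> | grep NumCPUs


NumCPUs should come back as 2 even though only 1 was requested, matching the 
whole-core rounding visible in the cpuset above.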


If I instead configure as:


SelectTypeParameters=CR_Core_Memory
NodeName=n[003,008] CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1


Slurm will schedule "cores" 0-15 to jobs, which the cpuset cgroup happily 
accepts.  A 1-cpu job then shows:


$ cat /sys/fs/cgroup/cpuset/slurm/{uid}/{job}/cpuset.cpus
2


and a 2-cpu job shows:


$ cat /sys/fs/cgroup/cpuset/slurm/{uid}/{job}/cpuset.cpus
4,12
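

If you'd rather not dig through the cgroup filesystem, the binding can also be 
checked from srun, assuming task/affinity or task/cgroup is configured as the 
TaskPlugin (exact output wording varies by Slurm version):


$ srun -n1 -c1 --cpu-bind=verbose true
$ srun -n1 -c2 --cpu-bind=verbose true


The verbose binding line printed for each task should show a one-CPU and a 
two-CPU mask respectively, mirroring the cpuset values above.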

> On Feb 8, 2019, at 5:09 AM, Antony Cleave <antony.cle...@gmail.com> wrote:
> 
> If you want Slurm to just ignore the difference between physical and logical 
> cores, then you can change 
> SelectTypeParameters=CR_Core
> to
> SelectTypeParameters=CR_CPU
> 
> and it will treat threads as CPUs, letting you start the number of tasks 
> you expect.
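> 
> The switch itself is a one-line edit in slurm.conf (which has to stay 
> identical on the controller and all compute nodes), roughly:
> 
> -SelectTypeParameters=CR_Core
> +SelectTypeParameters=CR_CPU
> 
> and typically needs slurmctld and the slurmds restarted so the new selector 
> parameters take effect.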
> 
> Antony
> 
> On Thu, 7 Feb 2019 at 18:04, Jeffrey Frey <f...@udel.edu> wrote:
> Your nodes are hyperthreaded (ThreadsPerCore=2).  Slurm always allocates _all 
> threads_ associated with a selected core to jobs.  So you're being assigned 
> both threads on core N.
> 
> 
> On our development-partition nodes we configure the threads as cores, e.g.
> 
> 
> NodeName=moria CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 
> ThreadsPerCore=1
> 
> 
> to force Slurm to schedule the threads separately.
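> 
> A quick way to compare such an overridden definition with what the hardware 
> actually reports is to run, on the node itself:
> 
> $ slurmd -C
> 
> which prints the node definition slurmd detects from the hardware (CPU count, 
> socket/core/thread topology, real memory), making the deliberate 
> ThreadsPerCore=1 override easy to spot.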
> 
> 
> 
>> On Feb 7, 2019, at 12:10 PM, Xiang Gao <qasdfgtyu...@gmail.com> wrote:
>> 
>> Hi All,
>> 
>> We configured Slurm on a server with 8 GPUs and 16 CPUs and want to use 
>> Slurm to schedule both CPU and GPU jobs. We observed unexpected behavior: 
>> although there are 16 CPUs, Slurm only schedules 8 jobs to run, even when 
>> some of the jobs are not asking for any GPU. If I inspect the details with 
>> `scontrol show job`, I see something strange on a job that asks for just 
>> 1 CPU:
>> 
>> NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1
>> 
>> If I understand these concepts correctly, since the number of nodes is 1, 
>> the number of tasks is 1, and the number of CPUs per task is 1, there is in 
>> principle no way the final number of CPUs should be 2. I'm not sure whether 
>> I misunderstand the concepts, have configured Slurm incorrectly, or this is 
>> a bug, so I'm asking for help.
>> 
>> Some related config are:
>> 
>> # COMPUTE NODES  
>> NodeName=moria CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 
>> ThreadsPerCore=2 RealMemory=120000 
>> Gres=gpu:gtx1080ti:2,gpu:titanv:3,gpu:v100:1,gpu:gp100:2
>> State=UNKNOWN  
>> PartitionName=queue Nodes=moria Default=YES MaxTime=INFINITE State=UP
>> 
>> # SCHEDULING  
>> FastSchedule=1 
>> SchedulerType=sched/backfill 
>> GresTypes=gpu 
>> SelectType=select/cons_res 
>> SelectTypeParameters=CR_Core
>> 
>> Best,
>> Xiang Gao
> 
> 
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
> Jeffrey T. Frey, Ph.D.
> Systems Programmer V / HPC Management
> Network & Systems Services / College of Engineering
> University of Delaware, Newark DE  19716
> Office: (302) 831-6034  Mobile: (302) 419-4976
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
> 


::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::



