Hi, my Slurm cluster has a dozen machines configured as follows:
NodeName=foobar01 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=257243 State=UNKNOWN

and the scheduling section is:

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core

My problem is that only half of the logical cores are used when I run a computation. Let me explain: I use R and the 'batchtools' package to create jobs, and under the hood all the jobs are submitted with sbatch. If I log in to the machines in my cluster and run 'htop', I can see that only half of the logical cores are in use. Other ways of measuring the load on each machine confirmed this "visual" clue.

My jobs ask Slurm for only one CPU per task. I tried to enforce that with -c 1 (--cpus-per-task=1), but it didn't make any difference. Then I noticed something strange: when I run scontrol show job <jobid>, I see the following output:

NumNodes=1 NumCPUs=2 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=2,node=1,billing=2
Socks/Node=* NtasksPerN:B:S:C=0:0:*:2 CoreSpec=*

That is, each job uses NumCPUs=2 instead of 1. I am also not sure why TRES=cpu=2.

Any idea how to solve this problem and have 100% of the logical cores allocated?

Best regards,
David
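
P.S. In case it helps, this is roughly how the jobs are submitted from R. It is a simplified sketch: the registry directory, the template file name, and the resource names (ncpus, walltime, memory) are placeholders that depend on the site's slurm.tmpl, not necessarily what I use verbatim.

library(batchtools)

# Throw-away registry; the file.dir here is just a placeholder
reg <- makeRegistry(file.dir = "registry", seed = 1)

# Slurm cluster functions; "slurm.tmpl" stands in for whatever
# template the cluster actually uses
reg$cluster.functions <- makeClusterFunctionsSlurm(template = "slurm.tmpl")

# One toy job per input value
batchMap(fun = function(x) x^2, x = 1:100, reg = reg)

# Submit, asking for a single CPU per task; the resource names must
# match the placeholders referenced in the template
submitJobs(reg = reg,
           resources = list(ncpus = 1, walltime = 3600, memory = 2048))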