Dear SLURM experts,

I'm having trouble understanding an issue we have with slurm 17.11.10.

In one partition "all", we have some nodes with hyperthreading and
some without, leading to 56 and 28 "cores", respectively.

In the same partition, we have some nodes with 256GB and some with
128GB RAM.  All hyperthreading nodes have 256GB, and some
non-hyperthreading nodes also have 256GB; all 128GB nodes have no
hyperthreading.

Now, when I submit a job array with --ntasks=1 --mem=200G, all of the
array's jobs have MinCPUsNode set to 46, which is roughly
200/256 * 56.  This effectively limits the array to the part of the
partition with hyperthreading, which is obviously not what I want.  I
don't want MinCPUsNode to be set at all; after all, I'm specifying
--ntasks=1.
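
To make the arithmetic concrete, here is a minimal sketch in Python. It only restates the numbers from this message (56 and 28 CPUs per node, 256GB, the 200G request, and the reported MinCPUsNode=46); the variable names are made up for illustration, and the proportional formula is my guess at where the value comes from, not something I have confirmed in the Slurm source or configuration.

```python
# CPU counts per node type in partition "all", as described in this message.
node_cpus = {
    "hyperthreading": 56,      # all of these nodes have 256GB
    "non_hyperthreading": 28,  # these nodes have 128GB or 256GB
}

requested_mem_gb = 200   # from --mem=200G
node_mem_gb = 256        # RAM of the large-memory nodes
min_cpus_node = 46       # MinCPUsNode as reported by scontrol

# Guessed relation: CPUs scaled by the fraction of node memory requested.
estimate = requested_mem_gb / node_mem_gb * node_cpus["hyperthreading"]
print(f"200/256 * 56 = {estimate:.2f}")

# A node can only satisfy the job if it has at least MinCPUsNode CPUs.
eligible = {name: cpus >= min_cpus_node for name, cpus in node_cpus.items()}
for name, ok in eligible.items():
    print(f"{name}: {'eligible' if ok else 'excluded'}")
```

The estimate is close to, but not exactly, the reported 46, so the real rule may differ. Either way, any MinCPUsNode above 28 excludes every non-hyperthreading node, including those with 256GB that could otherwise hold the job.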

Is this a bug?  Or am I doing something utterly wrong here?

Cheers,
Andreas




JobId=270402 ArrayJobId=270402 ArrayTaskId=18 JobName=calc_vcd_ts.py
 UserId=hilboll(1059) GroupId=hilboll(1059) MCS_label=N/A
 Priority=2909 Nice=0 Account=root QOS=normal
 JobState=PENDING Reason=Resources Dependency=(null)
 Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
 RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
 SubmitTime=2019-03-14T10:49:11 EligibleTime=2019-03-14T10:49:12
 StartTime=2019-03-15T10:49:12 EndTime=2019-03-16T10:49:12 Deadline=N/A
 PreemptTime=None SuspendTime=None SecsPreSuspend=0
 LastSchedEval=2019-03-14T11:04:18
 Partition=all AllocNode:Sid=login1:20705
 ReqNodeList=(null) ExcNodeList=(null)
 NodeList=(null) SchedNodeList=node07
 NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
 TRES=cpu=1,mem=200G,node=1
 Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
 MinCPUsNode=46 MinMemoryNode=200G MinTmpDiskNode=0
 Features=(null) DelayBoot=00:00:00
 Gres=(null) Reservation=(null)
 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
 Command=/home/hilboll/prj/2018_chochocanada/calc_vcd_ts.py
 WorkDir=/home/hilboll/prj/2018_chochocanada
 StdErr=/home/hilboll/prj/2018_chochocanada/slurm-270402_4294967294.out
 StdIn=/dev/null
 StdOut=/home/hilboll/prj/2018_chochocanada/slurm-270402_4294967294.out
 Power=
