Hello,

Previously we were running Slurm 22.05.10 and could submit a "multinode" job by
specifying only the total number of cores to run, not the number of nodes.
For example, in a cluster containing only 40-core nodes (no hyperthreading), 
Slurm would determine two nodes were needed with only:
sbatch -p multinode -n 80 --wrap="...."
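(With each task taking one core by default, 80 tasks on 40-core nodes works out
to 80 / 40 = 2 whole nodes, and Slurm did that arithmetic itself.)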

Now in 23.02.1 this is no longer the case - we get:
sbatch: error: Batch job submission failed: Node count specification invalid

At least -N 2 must be used (-n 80 can be added):
sbatch -p multinode -N 2 -n 80 --wrap="...."
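For completeness, the fully explicit form of the same request would presumably be
something like
sbatch -p multinode -N 2 --ntasks-per-node=40 --wrap="...."
but we'd rather users didn't have to work the node count out themselves.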

The partition config was, and is, as follows (MinNodes=2 is set to reject small
jobs submitted to this partition - we want at least two nodes requested):
PartitionName=multinode State=UP Nodes=node[081-245] DefaultTime=168:00:00 
MaxTime=168:00:00 PreemptMode=OFF PriorityTier=1 DefMemPerCPU=4096 MinNodes=2 
QOS=multinode Oversubscribe=EXCLUSIVE Default=NO
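(If it helps, the live values as seen by slurmctld can be checked with something
like
scontrol show partition multinode
which should show the same MinNodes=2.)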

All nodes are of the form
NodeName=node245 NodeAddr=node245 State=UNKNOWN Procs=40 Sockets=2 
CoresPerSocket=20 ThreadsPerCore=1 RealMemory=187000
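(The hardware as detected on the nodes themselves can be cross-checked with
something like
slurmd -C
run on a compute node, which should print a matching NodeName line.)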

slurm.conf has
EnforcePartLimits       = ANY
SelectType              = select/cons_tres
TaskPlugin              = task/cgroup,task/affinity
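Since EnforcePartLimits=ANY rejects jobs at submission time rather than leaving
them pending, the failure should be reproducible without queueing anything real,
e.g. something like:
sbatch --test-only -p multinode -n 80 --wrap="...."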

A few fields from: sacctmgr show qos multinode
Name|Flags|MaxTRES
multinode|DenyOnLimit|node=5
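(Columns above were trimmed; something like
sacctmgr -P show qos multinode format=Name,Flags,MaxTRES
should reproduce them.)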

The sbatch/srun man page states:
-n, --ntasks   .... If -N is not specified, the default  behavior is to 
allocate enough nodes to satisfy the requested resources as expressed by 
per-job specification options, e.g. -n, -c and --gpus.

I've had a look through the release notes back to 22.05.10 but can't see anything
obvious (to me).

Has this behaviour changed? Or, more likely, what have I missed? ;-)

Many thanks,
George

--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch
