Hi George,

George Leaver via slurm-users <slurm-users@lists.schedmd.com> writes:

> Hello,
>
> Previously we were running 22.05.10 and could submit a "multinode" job
> using only the total number of cores to run, not the number of nodes.
> For example, in a cluster containing only 40-core nodes (no
> hyperthreading), Slurm would determine two nodes were needed with
> only:
> sbatch -p multinode -n 80 --wrap="...."
>
> Now in 23.02.1 this is no longer the case - we get:
> sbatch: error: Batch job submission failed: Node count specification invalid
>
> At least -N 2 must be used (-n 80 can be added):
> sbatch -p multinode -N 2 -n 80 --wrap="...."
>
> The partition config was, and is, as follows (MinNodes=2 to reject
> small jobs submitted to this partition - we want at least two nodes
> requested)
> PartitionName=multinode State=UP Nodes=node[081-245]
> DefaultTime=168:00:00 MaxTime=168:00:00 PreemptMode=OFF PriorityTier=1
> DefMemPerCPU=4096 MinNodes=2 QOS=multinode Oversubscribe=EXCLUSIVE
> Default=NO

But do you really want to force a job to use two nodes if it could in
fact run on one?
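
(If you do want that behaviour, one thing which might be worth trying,
though I have not tested it on 23.02, is to give Slurm the per-node
task count as well, so that it can derive the node count without an
explicit -N:

  sbatch -p multinode -n 80 --ntasks-per-node=40 --wrap="...."

Whether that satisfies the new submission-time check, I can't say.)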

What is the use case for having separate 'uninode' and 'multinode'
partitions?  We have a university cluster with a very wide range of
jobs and essentially a single partition.  Allowing all job types into
one partition means that the different resource requirements tend to
complement each other to some degree.  Doesn't splitting your jobs over
two partitions mean that one partition can fill up while the other
still has idle nodes?
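
You can see whether that is happening with something like

  sinfo -p uninode,multinode -o "%P %a %D %t"

which prints, per partition, how many nodes are in each state.  Idle
nodes in one partition while jobs queue in the other would be exactly
the capacity loss I mean.  (The partition names here are just my guess
at yours.)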

Cheers,

Loris

> All nodes are of the form
> NodeName=node245 NodeAddr=node245 State=UNKNOWN Procs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=187000
>
> slurm.conf has
> EnforcePartLimits       = ANY
> SelectType              = select/cons_tres
> TaskPlugin              = task/cgroup,task/affinity
>
> A few fields from: sacctmgr show qos multinode
> Name|Flags|MaxTRES
> multinode|DenyOnLimit|node=5
>
> The sbatch/srun man page states:
> -n, --ntasks .... If -N is not specified, the default behavior is to
> allocate enough nodes to satisfy the requested resources as expressed
> by per-job specification options, e.g. -n, -c and --gpus.
>
> I've had a look through release notes back to 22.05.10 but can't see anything 
> obvious (to me).
>
> Has this behaviour changed? Or, more likely, what have I missed ;-) ?
>
> Many thanks,
> George
>
> --
> George Leaver
> Research Infrastructure, IT Services, University of Manchester
> http://ri.itservices.manchester.ac.uk | @UoM_eResearch
-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
