I'm sorry, but I still don't get it.

Doesn't --nodes=2,4 tell Slurm to allocate either 2 or 4 nodes and nothing else?


So, if:

--nodes=2 allocates exactly two nodes
--nodes=4 allocates exactly four nodes
--nodes=1-2 allocates at least one and at most two nodes
--nodes=1-4 allocates at least one and at most four nodes

what is the allocation rule for --nodes=2,4, the so-called size_string
allocation?


man sbatch says:


Node count can also be specified as size_string. The size_string
specification identifies what nodes values should be used. Multiple values
may be specified using a comma separated list or with a step function by
suffix containing a colon and number values with a "-" separator.
For example, "--nodes=1-15:4" is equivalent to "--nodes=1,5,9,13".

...

The job will be allocated as many nodes as possible within the range
specified and without delaying the initiation of the job.
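
To make the quoted step-function rule concrete, here is a minimal sketch that expands a size_string into the explicit list of node counts it stands for. The helper name expand_size_string is hypothetical (it is not a Slurm command), and the parsing only follows the man page wording quoted above; it says nothing about which of the listed counts the scheduler will actually pick.

```shell
# expand_size_string SPEC: print the node counts named by SPEC
# (e.g. "1-15:4" or "2,4") as a comma-separated list.
# Hypothetical helper for illustration; not part of Slurm.
expand_size_string() (
    # The ( ... ) body runs in a subshell, so IFS and variables stay local.
    out=""
    IFS=','
    set -f                                 # no filename globbing on $1
    for item in $1; do
        range=${item%%:*}                  # part before an optional ":step"
        step=1
        case $item in *:*) step=${item##*:} ;; esac
        [ "$step" -gt 0 ] || exit 1        # refuse a non-positive step
        lo=${range%%-*}                    # lower bound
        hi=${range##*-}                    # upper bound (equals lo without "-")
        n=$lo
        while [ "$n" -le "$hi" ]; do
            out="${out:+$out,}$n"
            n=$((n + step))
        done
    done
    printf '%s\n' "$out"
)

expand_size_string "1-15:4"   # prints 1,5,9,13
expand_size_string "2,4"      # prints 2,4
```

Read this way, "--nodes=2,4" names the counts 2 and 4 only, which is why an allocation of 3 nodes looks surprising.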

________________________________
From: Brian Andrus via slurm-users <slurm-users@lists.schedmd.com>
Sent: Thursday, August 29, 2024 7:27:44 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: playing with --nodes=<size_string>


It looks to me like you requested 3 tasks spread across 2 to 4 nodes. Note
that --nodes does not target the nodes named 2 and 4; it is a count of how
many nodes to use. You only needed 3 tasks (CPUs), so that is what you were
allocated, and with 1 CPU per node you get 3 (of up to 4) nodes. Slurm does
not give you 4 nodes because you only asked for 3 tasks.

You see the result in your variables:

SLURM_NNODES=3
SLURM_JOB_CPUS_PER_NODE=1(x3)
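
For readers unfamiliar with the compressed notation in these variables: a value such as 1(x3) means "1 on each of 3 nodes", and several groups can be joined with commas. The sketch below totals such a value; sum_per_node is a hypothetical helper name, not a Slurm tool.

```shell
# sum_per_node LIST: total a Slurm compressed per-node list,
# e.g. "2(x3),1" -> 2+2+2+1 = 7. Hypothetical helper for illustration.
sum_per_node() (
    # Subshell body keeps IFS and variable changes local.
    total=0
    IFS=','
    set -f                                  # no filename globbing on $1
    for item in $1; do
        count=${item%%\(*}                  # value before an optional "(xN)"
        reps=1
        case $item in
            *\(x*\)) reps=${item##*\(x}     # keep what follows "(x"
                     reps=${reps%\)} ;;     # drop the closing ")"
        esac
        total=$((total + count * reps))
    done
    printf '%s\n' "$total"
)

sum_per_node "1(x3)"     # prints 3
sum_per_node "2(x3),1"   # prints 7
```

Applied to SLURM_JOB_CPUS_PER_NODE=1(x3), this gives 3 CPUs in total, one on each of the 3 allocated nodes.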



If you only want 2 nodes, make --nodes=2.

Brian Andrus

On 8/29/24 08:00, Matteo Guglielmi via slurm-users wrote:

Hi,


On sbatch's manpage there is this example for <size_string>:


--nodes=1,5,9,13


so either one specifies <minnodes>[-maxnodes] OR <size_string>.


I checked the logs, and there are no reported errors about wrong or ignored 
options.


MG

________________________________
From: Brian Andrus via slurm-users <slurm-users@lists.schedmd.com>
Sent: Thursday, August 29, 2024 4:11:25 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: playing with --nodes=<size_string>


Your --nodes line is incorrect:

-N, --nodes=<minnodes>[-maxnodes]|<size_string>
Request that a minimum of minnodes nodes be allocated to this job. A maximum 
node count may also be specified with maxnodes.

Looks like it ignored that and used ntasks with ntasks-per-node as 1, giving
you 3 nodes. Check your logs and your conf to see what your defaults are.

Brian Andrus


On 8/29/2024 5:04 AM, Matteo Guglielmi via slurm-users wrote:

Hello,

I have a cluster with four Intel nodes (node[01-04], Feature=intel) and four
AMD nodes (node[05-08], Feature=amd).

# job file

#SBATCH --ntasks=3
#SBATCH --nodes=2,4
#SBATCH --constraint="[intel|amd]"


env | grep SLURM
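
If the job must only run on one of the listed node counts, one option is to check the allocation from inside the job script. The following is a minimal sketch, not a fix for the scheduler behaviour: node_count_allowed is a hypothetical helper, the list "2,4" is copied by hand from the --nodes option above, and SLURM_JOB_NUM_NODES is the variable Slurm sets in the job environment (0 is substituted here when it is unset).

```shell
# node_count_allowed COUNT LIST: succeed if COUNT appears in the
# comma-separated LIST (e.g. "2,4"). Hypothetical helper, not a Slurm tool.
node_count_allowed() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Guard the payload on the node count Slurm actually granted.
if node_count_allowed "${SLURM_JOB_NUM_NODES:-0}" "2,4"; then
    echo "node count accepted"
else
    echo "unexpected node count: ${SLURM_JOB_NUM_NODES:-unset}" >&2
fi
```

With the allocation shown in the log below (SLURM_JOB_NUM_NODES=3), this check would take the "unexpected node count" branch.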


# slurm.conf


PartitionName=DEFAULT  MinNodes=1 MaxNodes=UNLIMITED


# log


SLURM_JOB_USER=software
SLURM_TASKS_PER_NODE=1(x3)
SLURM_JOB_UID=1002
SLURM_TASK_PID=49987
SLURM_LOCALID=0
SLURM_SUBMIT_DIR=/home/software
SLURMD_NODENAME=node01
SLURM_JOB_START_TIME=1724932865
SLURM_CLUSTER_NAME=cluster
SLURM_JOB_END_TIME=1724933465
SLURM_CPUS_ON_NODE=1
SLURM_JOB_CPUS_PER_NODE=1(x3)
SLURM_GTIDS=0
SLURM_JOB_PARTITION=nodes
SLURM_JOB_NUM_NODES=3
SLURM_JOBID=26
SLURM_JOB_QOS=lprio
SLURM_PROCID=0
SLURM_NTASKS=3
SLURM_TOPOLOGY_ADDR=node01
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_MEM_PER_CPU=0
SLURM_NODELIST=node[01-03]
SLURM_JOB_ACCOUNT=dalco
SLURM_PRIO_PROCESS=0
SLURM_NPROCS=3
SLURM_NNODES=3
SLURM_SUBMIT_HOST=master
SLURM_JOB_ID=26
SLURM_NODEID=0
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_JOB_NAME=mpijob
SLURM_JOB_GID=1002

SLURM_JOB_NODELIST=node[01-03]   <<<=== why three nodes? Shouldn't this still be two nodes?

Thank you.







