Hi Andreas,
Many thanks for your reply and for the link to the definitions page of the
Slurm documentation!
Regarding my example, I still have a question: why do you assume that
tasks = CPUs?
From the definition of CPU in the documentation, I understand that a CPU
might refer to a thread, since I defined threads in the configuration file
of my cluster, but I do not really understand why CPUs and tasks are
equated. Can you help me with this?
Many thanks in advance!
Best,
Miriam
On 04/12/24 18:28, Henkel, Andreas via slurm-users wrote:
Hi Miriam,
The definition of CPU is “fluid”: it depends on hardware and
configuration. If threads are defined, a CPU may correspond to one
thread, whereas on hardware configurations without threads it
refers to a physical core. https://slurm.schedmd.com/mc_support.html#defs
Didn’t you set MinTRES to cpu=33? A job asking for 12
tasks (= 12 CPUs) therefore has to be rejected, doesn’t it?
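A minimal sketch of the arithmetic behind "12 tasks (= 12 CPUs)", assuming the job does not set --cpus-per-task, in which case sbatch allocates one CPU per task by default. The helper name requested_cpus is hypothetical, and the model deliberately ignores any rounding up to whole cores that Slurm may apply on nodes where threads are counted as CPUs.

```python
def requested_cpus(nodes: int, ntasks_per_node: int, cpus_per_task: int = 1) -> int:
    """Simplified model of the total CPUs a job asks for.

    Hypothetical helper. cpus_per_task defaults to 1, mirroring sbatch's
    default of one CPU per task when --cpus-per-task is not given.
    Whole-core rounding with ThreadsPerCore > 1 is NOT modelled.
    """
    if min(nodes, ntasks_per_node, cpus_per_task) < 1:
        raise ValueError("all counts must be positive")
    return nodes * ntasks_per_node * cpus_per_task


# The job from the original message: --nodes=12 --ntasks-per-node=1
cpus = requested_cpus(nodes=12, ntasks_per_node=1)
print(cpus)        # 12
print(cpus >= 33)  # False: below the MinTRES cpu=33 of bprod_part
```

Under the same assumption, 12 nodes with --ntasks-per-node=3, or with --cpus-per-task=3, would ask for 36 CPUs, which falls inside the 33-64 window of bprod_part.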
Best,
Andreas
On 04.12.2024 at 11:18, Miriam Olmi via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Hi all,
I cannot understand the exact definitions of, and differences between,
"core", "task" and "cpu" in the limits associated with a partition
through the TRES settings of a QOS.
More precisely, I have two partitions defined as follows:
PartitionName=lprod
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=lprod_part
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
   Nodes=r037c01s[01-12]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=432 TotalNodes=12 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
   TRES=cpu=432,mem=12M,node=12,billing=432
PartitionName=bprod
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=bprod_part
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
   Nodes=r037c01s[01-12]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=432 TotalNodes=12 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
   TRES=cpu=432,mem=12M,node=12,billing=432
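The partition totals also bound what a single job could request. A quick check of the arithmetic, using only the TotalCPUs and TotalNodes values shown above; the function name is hypothetical, and it assumes the nodes are identical, which an even split is consistent with but does not prove.

```python
def cpus_per_node(total_cpus: int, total_nodes: int) -> int:
    """Average CPUs per node from scontrol's TotalCPUs and TotalNodes."""
    if total_cpus % total_nodes:
        # An uneven split means the nodes cannot all have the same CPU count.
        raise ValueError("CPU total does not divide evenly across nodes")
    return total_cpus // total_nodes


per_node = cpus_per_node(total_cpus=432, total_nodes=12)
print(per_node)  # 36
```

If the nodes are indeed identical, each offers 36 CPUs, so with one CPU per task up to 36 tasks fit on a node and a 12-node job could in principle ask for all 432 CPUs. The binding limits are therefore the QOS MaxTRES values (cpu=32 and cpu=64), not the partition size.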
with the two QOS defined as:
Name        Priority  GraceTime  PreemptMode  UsageFactor  MaxJobsPU  MaxSubmitPU  MaxTRES          MinTRES  Flags
----------  --------  ---------  -----------  -----------  ---------  -----------  ---------------  -------  -----------------------------------------------
lprod_part  0         00:00:00   cluster      1.000000     100        120          cpu=32,mem=366G  cpu=1    DenyOnLimit,PartitionMaxNodes,PartitionMinNodes
bprod_part  0         00:00:00   cluster      1.000000     100        120          cpu=64,mem=366G  cpu=33   DenyOnLimit,OverPartQOS
If I submit a job to the lprod partition with the directives:
#SBATCH --nodes=12
#SBATCH --ntasks-per-node=1
the job runs correctly, whereas submitting it to the bprod partition
fails with the error:
sbatch: error: QOSMinCpuNotSatisfied
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
I understand that this is related to the CPU limits attached to each
partition through its QOS (lprod: 1-32 CPUs, bprod: 33-64 CPUs), but I
would like a more precise explanation, since the options I am using
refer to "tasks", not to "cpu", and I could not find a proper definition
of "cpu" in the context of TRES.
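The CPU windows described above can be made concrete with a small sketch that parses the MinTRES/MaxTRES strings from the sacctmgr table and checks a CPU count against them. The function names and result labels are hypothetical, only the cpu= entry is evaluated (mem is ignored), and it assumes the requested CPU count is already known, for example one CPU per task.

```python
def parse_tres(spec: str) -> dict[str, str]:
    """Split a TRES string such as 'cpu=32,mem=366G' into a dict."""
    result = {}
    for item in spec.split(","):
        key, _, value = item.partition("=")
        result[key] = value
    return result


# Limits copied from the sacctmgr table; only "cpu" is checked below.
QOS = {
    "lprod_part": {"min": parse_tres("cpu=1"), "max": parse_tres("cpu=32,mem=366G")},
    "bprod_part": {"min": parse_tres("cpu=33"), "max": parse_tres("cpu=64,mem=366G")},
}


def cpu_check(qos_name: str, cpus: int) -> str:
    """Return 'ok' or a (hypothetical) label for the violated CPU limit."""
    limits = QOS[qos_name]
    if cpus < int(limits["min"]["cpu"]):
        return "below MinTRES cpu"
    if cpus > int(limits["max"]["cpu"]):
        return "above MaxTRES cpu"
    return "ok"


for name in QOS:
    print(name, cpu_check(name, 12), cpu_check(name, 36))
# lprod_part ok above MaxTRES cpu
# bprod_part below MinTRES cpu ok
```

The 12-CPU result for bprod_part corresponds to the QOSMinCpuNotSatisfied error above, while a 36-CPU request would fit bprod_part but exceed the cap of lprod_part.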
Many thanks in advance.
Best,
Miriam
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
--
*******************************************************
Miriam Olmi
Computing & Network Service
Laboratori Nazionali del Gran Sasso - INFN
Via G. Acitelli, 22
67100 Assergi (AQ) Italy
https://www.lngs.infn.it
✉ email:miriam.o...@lngs.infn.it
☎ office: +39 0862 437222
*******************************************************