For those who are interested:
I have found the problem and will submit a patch. If we find a partition
where a job can run but all nodes are busy, we save this state and return
it once all partitions have been checked and the job cannot run in any of them.
I do not know if this is the right approach.
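The idea behind the patch can be sketched roughly as follows. This is only a minimal Python model of the logic described above, not Slurm's actual C code; the names `Verdict` and `evaluate` are hypothetical and exist only for illustration.

```python
from enum import Enum


class Verdict(Enum):
    # Hypothetical outcomes of testing a job against one partition.
    CAN_RUN_NOW = 1     # free nodes satisfy the request
    NODES_BUSY = 2      # the partition could run the job, but its nodes are in use
    NOT_AVAILABLE = 3   # the partition can never satisfy the request


def evaluate(partition_verdicts):
    """Combine per-partition verdicts into one result for the job.

    A "busy" verdict is remembered while the remaining partitions are
    checked, so a later "not available" verdict cannot overwrite it.
    """
    saw_busy = False
    for verdict in partition_verdicts:
        if verdict is Verdict.CAN_RUN_NOW:
            return Verdict.CAN_RUN_NOW
        if verdict is Verdict.NODES_BUSY:
            saw_busy = True
    # Only report "not available" if no partition could ever run the job.
    return Verdict.NODES_BUSY if saw_busy else Verdict.NOT_AVAILABLE
```

With this logic, a job whose first partition is busy and whose last partition cannot satisfy the request would be queued instead of rejected, regardless of the order of the partitions.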
Hi Prentice,
Answers inline.
Rather than specifying the processor types as GRES, I would
recommend defining them as features of the nodes and letting the users
specify the features as constraints to their jobs. Since the newer
processors are backwards compatible with the older processors, list
the older processors as features of the newer nodes, too.
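To illustrate the suggestion, here is a small Python model of how a single-feature constraint would match nodes. It is only a sketch: the feature name `ivybridge` and the helper `nodes_matching` are assumptions for illustration and do not come from the configuration in this thread.

```python
# Hypothetical feature lists: the newer node also advertises the older
# processor generation, so jobs asking for the older one can run on both.
NODE_FEATURES = {
    "r16n18": {"sandybridge", "sse4", "avx"},
    "r16n21": {"sandybridge", "ivybridge", "sse4", "avx"},
}


def nodes_matching(constraint):
    """Return the nodes whose feature set contains the requested feature."""
    return sorted(
        node for node, features in NODE_FEATURES.items()
        if constraint in features
    )
```

In this model a job constrained to `sandybridge` may land on either node, while a job constrained to `ivybridge` is limited to the newer node.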
We already do this with features on our other cluster. We assign nodes
different features and users select them. I can add a new feature for
the CPU type. Sometimes you want avx512 and a specific processor.
On the other cluster we have 5 different GPUs and a lot of partitions. I
want to make it simple for our users, so we have a 'job_submit.lua'
script that submits to multiple partitions, and if the user specifies the
GRES type then Slurm selects the right partition(s).
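The behaviour of such a submit script can be modelled roughly like this. The real script is written in Lua and is not shown in this thread, so this Python sketch and the names `ALL_PARTITIONS` and `assign_partitions` are purely illustrative assumptions.

```python
# Partition names as used on the CPU test cluster in this thread.
ALL_PARTITIONS = ["cpu_e5_2650_v1", "cpu_e5_2650_v2"]


def assign_partitions(requested_partition=None):
    """Return the partition string a job would be submitted with.

    If the user named a partition, keep it. Otherwise fill in every
    partition as a comma-separated list and leave the choice to the
    scheduler, which is expected to pick one that offers the requested GRES.
    """
    if requested_partition:
        return requested_partition
    return ",".join(ALL_PARTITIONS)
```

The sketch deliberately does not filter partitions by GRES type, because the setup described here relies on Slurm to make that selection.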
On this cluster we do not have GPUs, but I can test with another GRES type,
'cpu_type'. I think the last partition in the list determines the
behavior: if I use a GRES that is supported by the last partition,
the job gets queued:
* srun -N1 --gres=cpu_type:e5_2650_v2 --pty /bin/bash
* srun --exclusive --gres=cpu_type:e5_2650_v2 --pty /bin/bash
srun: job 1865 queued and waiting for resources
So to me it seems that one of the partitions is BUSY but could run the
job. I will test it on our GPU cluster but expect the same behaviour.
If you want to continue down the road you've already started on, can
you provide more information, like the partition definitions and the
gres definitions? In general, Slurm should support submitting to
multiple partitions.
```
# slurm.conf
PartitionName=cpu_e5_2650_v1 DefMemPerCPU=11000 Default=No DefaultTime=5 DisableRootJobs=YES MaxNodes=2 MaxTime=5-00 Nodes=r16n[18-20] OverSubscribe=EXCLUSIVE QOS=normal State=UP
PartitionName=cpu_e5_2650_v2 DefMemPerCPU=11000 Default=No DefaultTime=5 DisableRootJobs=YES MaxNodes=2 MaxTime=5-00 Nodes=r16n[21-22] OverSubscribe=EXCLUSIVE QOS=normal State=UP
NodeName=r16n18 CoresPerSocket=8 Features=sandybridge,sse4,avx Gres=cpu_type:e5_2650_v1:no_consume:4T MemSpecLimit=1024 RealMemory=188000 Sockets=2 State=UNKNOWN ThreadsPerCore=1 Weight=10
NodeName=r16n21 CoresPerSocket=8 Features=sandybridge,sse4,avx Gres=cpu_type:e5_2650_v2:no_consume:4T MemSpecLimit=1024 RealMemory=188000 Sockets=2 State=UNKNOWN ThreadsPerCore=1 Weight=10

# gres.conf
NodeName=r16n[18-20] Count=4T Flags=CountOnly Name=cpu_type Type=e5_2650_v1
NodeName=r16n[21-22] Count=4T Flags=CountOnly Name=cpu_type Type=e5_2650_v2
```
On this cluster I have version 20.02.6 installed. We have different
partitions for CPU types and GPU types. We want to make it easy for
users who do not care where their job runs, while experienced
users can specify the GRES type: cpu_type or gpu.
I have defined 2 cpu partitions:
* cpu_e5_2650_v1
* cpu_e5_2650_v2
and 2 types of the GRES cpu_type:
* e5_2650_v1
* e5_2650_v2
When no partition is specified, the job is submitted to both partitions:
* srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash -->
r16n18, which defines this GRES and is in partition cpu_e5_2650_v1
Now I submit another job at the same time:
* srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash
This fails with: `srun: error: Unable to allocate resources:
Requested node configuration is not available`
I would expect it to be queued in the partition `cpu_e5_2650_v1`.
When I specify the partition on the command line:
* srun --exclusive -p cpu_e5_2650_v1_shared --gres=cpu_type:e5_2650_v1 --pty /bin/bash
srun: job 1856 queued and waiting for resources
So the question is: can Slurm handle submitting to multiple
partitions when we specify GRES attributes?
