Yes, I think so, but that should be no problem. I believe that requires that
your Slurm was built with the --enable-multiple-slurmd configure option, so
you might need to rebuild Slurm if you didn't use that option in the first
place.
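
In case it helps, here is an untested sketch of what that might look like on
one of those nodes. The NodeHostname value, the real host name, and the Port
numbers below are my guesses, not something I have verified against your setup:

```shell
# slurm.conf: both logical nodes map to the same physical host via
# NodeHostname; each one needs its own Port so the two slurmd daemons
# can listen side by side (requires a --enable-multiple-slurmd build).
#   NodeName=gpu01       NodeHostname=node01 Port=6819 ...
#   NodeName=cpusingpu01 NodeHostname=node01 Port=6820 ...

# Then start one slurmd per logical node name on that host:
slurmd -N gpu01
slurmd -N cpusingpu01
```

I have only seen this pattern used in test clusters, so treat it as a starting
point rather than a recipe.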

On Mon, Mar 31, 2025 at 7:32 AM Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> Hi Davide
> Thanks for your feedback
>
> If gpu01 and cpusingpu01 are physically the same node, doesn't this mean
> that I have to start 2 slurmd daemons on that node (one with "slurmd -N gpu01"
> and one with "slurmd -N cpusingpu01")?
>
>
> Thanks, Massimo
>
>
> On Mon, Mar 31, 2025 at 3:22 PM Davide DelVento <davide.quan...@gmail.com>
> wrote:
>
>> Ciao Massimo,
>> How about creating another queue, cpus_in_the_gpu_nodes (or something less
>> silly), which targets the GPU nodes but does not allow allocation of the
>> GPUs via gres and allocates 96-8 (or whatever other number you deem
>> appropriate) of the CPUs (and similarly for memory)? Actually it could
>> even be the same "onlycpus" queue, just on different nodes.
>>
>> In fact, in Slurm you declare the cores (and sockets) on a per-node, not
>> per-queue, basis. But you can set up an alias for those nodes with a second
>> name and use that second name in the way described above. I am not aware
>> (and have not searched for) any way for Slurm to understand such a
>> situation on its own, so you will have to manually avoid "double booking".
>> One way of doing that could be to configure the nodes under their first
>> name so that Slurm thinks they have fewer resources. So for example in
>> slurm.conf
>>
>> NodeName=gpu[01-06] CoresPerSocket=4 RealMemory=whatever1 Sockets=2
>> ThreadsPerCore=1 Weight=10000 State=UNKNOWN Gres=gpu:h100:4
>> NodeName=cpusingpu[01-06] CoresPerSocket=44 RealMemory=whatever2
>> Sockets=2 ThreadsPerCore=1 Weight=10000 State=UNKNOWN
>>
>> where gpuNN and cpusingpuNN are physically the same node and whatever1 +
>> whatever2 is the actual maximum amount of memory you want Slurm to
>> allocate. You will also want to make sure the Weights are set such that the
>> non-GPU nodes get used first.
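>>
>> The matching partition lines could then look roughly like this (the
>> CPU-only node names and the elided options are just placeholders, and I
>> have not tested this):
>>
>> PartitionName=gpus Nodes=gpu[01-06] ...
>> PartitionName=onlycpus Nodes=cpunode[01-06],cpusingpu[01-06] ...
>>
>> i.e. the cpusingpu aliases live in the CPU-only partition alongside the
>> real CPU-only nodes, while only the gpu aliases carry the Gres.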
>>
>> Disclaimer: I'm thinking out loud, I have not tested this in practice,
>> there may be something I overlooked.
>>
>>
>> On Mon, Mar 31, 2025 at 5:12 AM Massimo Sgaravatto via slurm-users <
>> slurm-users@lists.schedmd.com> wrote:
>>
>>> Dear all
>>>
>>>
>>>
>>> We have just installed a small SLURM cluster composed of 12 nodes:
>>>
>>> - 6 CPU-only nodes: Sockets=2, CoresPerSocket=96, ThreadsPerCore=2,
>>> 1.5 TB of RAM
>>> - 6 nodes with GPUs as well: same configuration as the CPU-only nodes,
>>> plus 4 H100 per node
>>>
>>>
>>> We started with a setup with 2 partitions:
>>>
>>> - a 'onlycpus' partition which sees all the cpu-only nodes
>>> - a 'gpus' partition which sees the nodes with gpus
>>>
>>> and asked users to use the 'gpus' partition only for jobs that need GPUs
>>> (for the time being we are not technically enforcing that).
>>>
>>>
>>> The problem is that a job requiring a GPU usually needs only a few cores
>>> and a few GB of RAM, which means wasting a lot of CPU cores. And having
>>> all nodes in the same partition would mean there is the risk that a job
>>> requiring a GPU can't start because all CPU cores and/or all memory are
>>> used by CPU-only jobs.
>>>
>>>
>>> I went through the mailing list archive and I think that "splitting" a
>>> GPU node into two logical nodes (one to be used in the 'gpus' partition and
>>> one to be used in the 'onlycpus' partition) as discussed in [*] would help.
>>>
>>>
>>> Since that proposed solution is considered a "bit of a kludge" by its
>>> author, and since I read that splitting a node into multiple logical nodes
>>> is in general a bad idea, I'd like to understand whether you could suggest
>>> other/better options.
>>>
>>>
>>> I also found this [**] thread, but I don't much like that approach
>>> (i.e. relying on MaxCPUsPerNode) because it would mean having 3 partitions
>>> (if I have got it right): two partitions for CPU-only jobs and one
>>> partition for GPU jobs.
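>>>
>>> If I have understood that approach correctly, it would look roughly like
>>> this (the partition names and the CPU cap are just examples of mine):
>>>
>>> PartitionName=onlycpus Nodes=cpu[01-06]
>>> PartitionName=cpusongpu Nodes=gpu[01-06] MaxCPUsPerNode=88
>>> PartitionName=gpus Nodes=gpu[01-06]
>>>
>>> i.e. MaxCPUsPerNode is a partition-wide setting, so the CPU-only nodes
>>> (which need no cap) and the GPU nodes (which do) end up in two separate
>>> CPU partitions.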
>>>
>>>
>>> Many thanks, Massimo
>>>
>>>
>>> [*] https://groups.google.com/g/slurm-users/c/IUd7jLKME3M
>>> [**] https://groups.google.com/g/slurm-users/c/o7AiYAQ1YJ0
>>>
>>> --
>>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>>
>>