Yes, I think so, but that should be no problem. I believe it requires that your Slurm was built with the --enable-multiple-slurmd configure option, so you might need to rebuild Slurm if you didn't use that option in the first place.
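
If it helps, here is a rough, untested sketch of what I'd expect that setup to look like (the hostname "node01" and the second port are placeholders I made up; 6818 is just the default SlurmdPort, and the memory values are the "whatever" ones from before). Each logical node gets its own Port so the two daemons on the same host don't collide, and %n keeps their spool and log paths separate:

# slurm.conf
SlurmdSpoolDir=/var/spool/slurmd.%n
SlurmdLogFile=/var/log/slurm/slurmd.%n.log
NodeName=gpu01 NodeHostname=node01 Port=6818 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=whatever1 Gres=gpu:h100:4
NodeName=cpusingpu01 NodeHostname=node01 Port=6819 Sockets=2 CoresPerSocket=44 ThreadsPerCore=1 RealMemory=whatever2

# on the physical host node01, start one daemon per logical node:
slurmd -N gpu01
slurmd -N cpusingpu01

Again, thinking out loud here, so double-check against the Slurm FAQ before relying on it.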
On Mon, Mar 31, 2025 at 7:32 AM Massimo Sgaravatto <massimo.sgarava...@gmail.com> wrote:

> Hi Davide
> Thanks for your feedback
>
> If gpu01 and cpusingpu01 are physically the same node, doesn't this mean
> that I have to start 2 slurmd on that node (one with "slurmd -N gpu01" and
> one with "slurmd -N cpusingpu01")?
>
> Thanks, Massimo
>
> On Mon, Mar 31, 2025 at 3:22 PM Davide DelVento <davide.quan...@gmail.com> wrote:
>
>> Ciao Massimo,
>> How about creating another queue cpus_in_the_gpu_nodes (or something less
>> silly) which targets the GPU nodes but does not allow the allocation of the
>> GPUs with gres, and allocates 96-8 (or whatever other number you deem
>> appropriate) of the CPUs (and similarly with memory)? Actually it could
>> even be the same "onlycpus" queue, just on different nodes.
>>
>> In fact, in Slurm you declare the cores (and sockets) in a node-based,
>> not queue-based, fashion. But you can set up an alias for those nodes with
>> a second name and use that second name in the way described above. I am
>> not aware of Slurm being able to understand such a situation on its own
>> (and I have not searched for it), so you will have to manually avoid
>> "double booking". One way of doing that could be to configure the nodes
>> under their first name in a way that makes Slurm think they have fewer
>> resources. So for example in slurm.conf:
>>
>> NodeName=gpu[01-06] CoresPerSocket=4 RealMemory=whatever1 Sockets=2
>> ThreadsPerCore=1 Weight=10000 State=UNKNOWN Gres=gpu:h100:4
>> NodeName=cpusingpu[01-06] CoresPerSocket=44 RealMemory=whatever2
>> Sockets=2 ThreadsPerCore=1 Weight=10000 State=UNKNOWN
>>
>> where gpuNN and cpusingpuNN are physically the same node, and whatever1 +
>> whatever2 is the actual maximum amount of memory you want Slurm to
>> allocate. You will also want to make sure the Weight values are such that
>> the non-GPU nodes get used first.
>>
>> Disclaimer: I'm thinking out loud; I have not tested this in practice, and
>> there may be something I overlooked.
>>
>> On Mon, Mar 31, 2025 at 5:12 AM Massimo Sgaravatto via slurm-users
>> <slurm-users@lists.schedmd.com> wrote:
>>
>>> Dear all
>>>
>>> We have just installed a small Slurm cluster composed of 12 nodes:
>>>
>>> - 6 CPU-only nodes: Sockets=2, CoresPerSocket=96, ThreadsPerCore=2,
>>> 1.5 TB of RAM
>>> - 6 nodes that also have GPUs: same config as the CPU-only nodes, plus
>>> 4 H100s per node
>>>
>>> We started with a setup with 2 partitions:
>>>
>>> - an 'onlycpus' partition which sees all the CPU-only nodes
>>> - a 'gpus' partition which sees the nodes with GPUs
>>>
>>> and asked users to use the 'gpus' partition only for jobs that need GPUs
>>> (for the time being we are not technically enforcing that).
>>>
>>> The problem is that a job requiring a GPU usually needs only a few cores
>>> and only a few GB of RAM, which means wasting a lot of CPU cores.
>>> And having all nodes in the same partition would mean that there is the
>>> risk that a job requiring a GPU can't start if all CPU cores and/or all
>>> memory are used by CPU-only jobs.
>>>
>>> I went through the mailing list archive and I think that "splitting" a
>>> GPU node into two logical nodes (one to be used in the 'gpus' partition and
>>> one to be used in the 'onlycpus' partition) as discussed in [*] would help.
>>>
>>> Since that proposed solution is considered by its author a "bit of a
>>> kludge", and since I read that splitting a node into multiple logical nodes
>>> is in general a bad idea, I'd like to understand whether you could suggest
>>> other/better options.
>>>
>>> I also found this [**] thread, but I don't like that approach too much
>>> (i.e. relying on MaxCPUsPerNode) because it would mean having 3 partitions
>>> (if I have got it right): two partitions for CPU-only jobs and one
>>> partition for GPU jobs.
>>>
>>> Many thanks, Massimo
>>>
>>> [*] https://groups.google.com/g/slurm-users/c/IUd7jLKME3M
>>> [**] https://groups.google.com/g/slurm-users/c/o7AiYAQ1YJ0
>>>
>>> --
>>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com