Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)

2024-01-18 Thread Matthias Loose
Hi Hafedh, I'm no expert on the GPU side of Slurm, but looking at your current configuration, to me it is working as intended at the moment. You have defined 4 GPUs and start multiple jobs that each consume all 4 GPUs, so the jobs wait for the resource to be free again. I think what you need to l…
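[A minimal sketch of the likely fix, assuming the goal is to run several jobs concurrently on one 4-GPU node: have each job request only the GPU(s) it actually uses instead of all four. The script name and the CPU/memory sizes below are illustrative assumptions, not the poster's values.

    #!/bin/bash
    #SBATCH --job-name=gpu-task
    #SBATCH --gres=gpu:1         # request 1 of the node's 4 GPUs, not all 4
    #SBATCH --cpus-per-task=8    # assumed value; size to your node
    #SBATCH --mem=32G            # assumed value

    # Slurm confines the job to its allocated GPU (via CUDA_VISIBLE_DEVICES),
    # so four such one-GPU jobs can run on the node side by side.
    srun ./my_gpu_program        # hypothetical executable

Submitted four times, jobs like this share the node concurrently, one GPU each, instead of queueing behind one another.]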

Re: [slurm-users] Custom Gres for SSD

2023-07-24 Thread Matthias Loose
On 2023-07-24 09:50, Matthias Loose wrote: Hi Shunran, just read your question again. If you don't want users to share the SSD at all, even if both have requested it, you can basically skip the quota part of my answer. If you really only want one user per SSD per node you should set the…
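[The advice is cut off above; one plausible completion, offered only as a sketch: count the gres in whole SSDs rather than in capacity, so a single allocation claims the entire drive and no second job can land on it. Node names and the count are assumptions.

    # gres.conf sketch (not the author's exact config)
    NodeName=hpc-node[01-10] Name=local Count=1   # one unit = the whole SSD

    # A job that wants the SSD then requests the whole unit:
    #   #SBATCH --gres=local:1

With Count=1 per node, Slurm's normal gres accounting enforces the one-user-per-SSD rule; no quota machinery is needed.]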

Re: [slurm-users] Custom Gres for SSD

2023-07-24 Thread Matthias Loose
Hi Shunran, we do something very similar. I have nodes with 2 SSDs in a RAID 1 mounted on /local. We defined a gres resource just like you and called it local. We define the resource in the gres.conf like this: # LOCAL NodeName=hpc-node[01-10] Name=local and add the resource in counts…
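[The preview quotes the gres.conf line but truncates the counts. A sketch of how the pieces might fit together; the GB figure and everything beyond the quoted line are assumptions.

    # gres.conf
    # LOCAL
    NodeName=hpc-node[01-10] Name=local Count=850   # assumed usable GB on the RAID1

    # slurm.conf: register the gres type and advertise it on the nodes
    GresTypes=local
    NodeName=hpc-node[01-10] Gres=local:850 ...

Counting the gres in GB lets each job reserve just the scratch space it asked for, with Slurm deducting it from the node's advertised total.]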

[slurm-users] Get Information from a Node to the MailProg Command / Add arbitrary information to a job

2021-06-15 Thread Matthias Loose
Hi Slurm users, first time posting. I have a new Slurm setup where users can specify an amount of local node disk space they wish to use. This is a "gres" resource named "local", measured in GB. Once a user has scheduled a job and it gets executed, I create a folder for this job on…
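[A sketch of a Prolog that creates such a per-job folder, assuming the "local" gres grants space under /local; the path and cleanup policy are assumptions. Slurm exports SLURM_JOB_ID and SLURM_JOB_USER to the Prolog, which is all this needs.

    #!/bin/bash
    # Prolog sketch: make a per-job scratch directory on the node-local disk.
    JOBDIR="/local/${SLURM_JOB_ID}"          # folder named after the job ID
    mkdir -p "$JOBDIR"
    chown "${SLURM_JOB_USER}" "$JOBDIR"      # hand it to the job owner
    chmod 700 "$JOBDIR"                      # keep other users out

    # A matching Epilog would remove the directory when the job ends:
    #   rm -rf "/local/${SLURM_JOB_ID}"]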