Re: [slurm-users] Managing shared memory (/dev/shm) usage per job?

2022-04-05 Thread Greg Wickham
Hi John, Mark, We use a spank plugin: https://gitlab.com/greg.wickham/slurm-spank-private-tmpdir (this was derived from other authors but modified for functionality required on site). It can bind tmpfs mount points into the user's cgroup allocation; additionally, bind options can be provided (i.e. …
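A SPANK plugin like this one is normally enabled through plugstack.conf; a minimal sketch of what that could look like follows, assuming the plugin builds to private-tmpdir.so (the base= and mount= arguments are illustrative placeholders, not the plugin's documented options):

    # /etc/slurm/plugstack.conf
    # load the plugin for every job; the actual arguments are in the plugin's README
    required /usr/lib64/slurm/private-tmpdir.so base=/tmp mount=/dev/shm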

[slurm-users] Missing slurmdbd systemd unit after installing Slurm from source (Ubuntu 20.04)

2022-04-05 Thread Benjamin Arntzen
Hi there, I'm having a curious issue. I'm installing Slurm using the instructions at https://slurm.schedmd.com/quickstart_admin.html (basic configure, make, make install) and it works fine, but I'm noticing something odd: while the systemd units for slurmd and slurmctld get installed, the systemd unit for slurmdbd is missing.
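One possible explanation (an assumption on my part, not confirmed in the thread) is that configure only produces slurmdbd and its unit when it finds the MySQL/MariaDB client development files. If the unit was generated in the build tree, it can also be installed by hand:

    # check whether configure produced the unit in the build tree
    ls etc/slurmdbd.service
    # if present, install and enable it manually
    sudo cp etc/slurmdbd.service /etc/systemd/system/
    sudo systemctl daemon-reload
    sudo systemctl enable --now slurmdbd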

Re: [slurm-users] Node is not allocating all CPUs

2022-04-05 Thread Brian Andrus
You want to see what is output on the node itself when you run: slurmd -C Brian Andrus On 4/5/2022 2:11 PM, Guertin, David S. wrote: We've added a new GPU node to our cluster with 32 cores. It contains two 16-core sockets, and hyperthreading is turned off, so the total is 32 cores. But jobs are only being allowed to use 16 cores.
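For the node described in the thread, healthy output would look roughly like this (values inferred from the thread, not actual output):

    $ slurmd -C
    NodeName=node020 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257600
    UpTime=...

If slurmd -C reports 32 CPUs but Slurm only schedules 16, the discrepancy is almost always in the slurm.conf NodeName definition.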

[slurm-users] Node is not allocating all CPUs

2022-04-05 Thread Guertin, David S.
We've added a new GPU node to our cluster with 32 cores. It contains two 16-core sockets, and hyperthreading is turned off, so the total is 32 cores. But jobs are only being allowed to use 16 cores. Here's the relevant line from slurm.conf: NodeName=node020 CoresPerSocket=16 RealMemory=257600 ThreadsPerCore=1
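A plausible cause, assuming nothing else in slurm.conf overrides it: when Sockets is omitted from a NodeName line, Slurm defaults it to 1, so the CPU count is computed as Boards x Sockets x CoresPerSocket x ThreadsPerCore = 1 x 1 x 16 x 1 = 16, exactly the limit being observed. A hypothetical corrected line would declare the full topology:

    NodeName=node020 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257600

Restarting slurmctld (and slurmd on the node) is typically needed for node definition changes to take effect.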

Re: [slurm-users] Managing shared memory (/dev/shm) usage per job?

2022-04-05 Thread John Hanks
I've thought-experimented with this in the past, wanting to do the same thing, but haven't found any way to get /dev/shm or a tmpfs into a job's cgroup to be accounted against the job's allocation. The best I have come up with is creating a per-job tmpfs from a prolog, removing it in the epilog, and setting …
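A minimal sketch of that prolog/epilog approach, assuming Prolog and Epilog run as root on the compute node; the mount point and the 4G size cap are illustrative, not values from the thread:

    #!/bin/bash
    # prolog: create a size-capped per-job tmpfs
    JOBDIR=/dev/shm/job_${SLURM_JOB_ID}
    mkdir -p "$JOBDIR"
    mount -t tmpfs -o size=4G,mode=700 tmpfs "$JOBDIR"
    chown "$SLURM_JOB_USER" "$JOBDIR"

    #!/bin/bash
    # epilog: tear the per-job tmpfs down again
    JOBDIR=/dev/shm/job_${SLURM_JOB_ID}
    umount "$JOBDIR" && rmdir "$JOBDIR"

The size= option caps how much the job can write, but as noted above the pages still are not charged to the job's cgroup.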

[slurm-users] Lua to reject if maintenance window

2022-04-05 Thread Brian Andrus
All, Not sure if this is already out there, but it would be nice to be able to immediately reject interactive jobs that are going to be held due to an upcoming maintenance window. Does anyone already have this? If not, I suspect I will work on it as a lua function for the job_submit.lua. Brian Andrus
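Not Brian's implementation, just a rough sketch of what such a job_submit.lua check could look like, assuming the maintenance start time is known to the script (hard-coded below, which is an assumption; a real version would have to discover the reservation some other way):

    -- reject interactive jobs that would still be running at maintenance start
    function slurm_job_submit(job_desc, part_list, submit_uid)
       -- hypothetical maintenance start; could instead be read from a file
       local maint_start = os.time({year=2022, month=4, day=9, hour=6, min=0})
       local NO_VAL = 4294967294
       -- interactive jobs (srun/salloc) arrive without a batch script
       if job_desc.script == nil or job_desc.script == '' then
          local limit_min = job_desc.time_limit
          -- no usable time limit: assume the job could collide with the window
          if limit_min == nil or limit_min >= NO_VAL or
             os.time() + limit_min * 60 > maint_start then
             slurm.log_user("interactive job would overlap the maintenance window; rejected")
             return slurm.ERROR
          end
       end
       return slurm.SUCCESS
    end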

Re: [slurm-users] Sharing a GPU

2022-04-05 Thread Kamil Wilczek
Thank you all for the help! The plugin seems to be the thing I'm looking for. I'll try to test it with a spare server/GPUs. Thanks again! -- Kamil Wilczek On 04.04.2022 at 09:20, Bas van der Vlies wrote: We have the exact same request for our GPUs that are not A100 and we have developed a lua plugin