[slurm-users] Re: [EXTERNAL] avoid using same GPU by the interactive job

2025-02-13 Thread Michael Gutteridge via slurm-users
Well, that's kind of the core issue: without cgroups, _any_ process in the job will have access to all of the GPUs on the system, and there's not much more that Slurm can do about it at that point. I would have a look at the environment variable CUDA_VISIBLE_DEVICES
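
For illustration, here is a minimal Python sketch of how a job could inspect CUDA_VISIBLE_DEVICES before touching a GPU. The helper name `visible_gpus` is hypothetical and is not part of Slurm or CUDA. The variable is only advisory: without cgroups, a process can unset or overwrite it and still reach every GPU on the node.

```python
import os


def visible_gpus(env=None):
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (CUDA applications can then
    see every GPU on the node) and an empty list when it is set but empty
    (no GPU is visible to CUDA applications).
    """
    # Hypothetical helper: accept a mapping for easy testing, else use os.environ.
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries are comma-separated; they may be indices or device UUIDs.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]
```

Calling `visible_gpus()` at the start of an interactive session shows which devices that session was told to use. It does not enforce anything.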

[slurm-users] Re: [EXTERNAL] avoid using same GPU by the interactive job

2025-02-12 Thread navin srivastava via slurm-users
Thank you, Jesse. I am using Enterprise SLES15SP6 as the OS. I have not introduced cgroup functionality in my environment. I can consider it and will see whether this solution works out. But is there any other way to achieve the same result without cgroups? Batch job requests are fine 2 jobs wit

[slurm-users] Re: [EXTERNAL] avoid using same GPU by the interactive job

2025-02-12 Thread Chintanadilok, Jesse via slurm-users
Navin, you can isolate GPUs per job if you have cgroups set up properly. What OS are you using? Newer OSes support cgroups v2 out of the box, but if necessary you can continue using v1; this workflow should be applicable to both. Add ConstrainDevices=yes to your cgroup.conf. This is what t
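
As a rough way to confirm the setting mentioned above, the following Python sketch scans the text of a cgroup.conf for ConstrainDevices=yes. The function name `constrain_devices_enabled` is hypothetical. The sketch assumes simple `Key=Value` lines with `#` comments, compares keys and values case-insensitively, and takes the first match. It does not check that the cgroup plugins are actually enabled in slurm.conf.

```python
def constrain_devices_enabled(conf_text):
    """Return True if the given cgroup.conf text sets ConstrainDevices=yes."""
    for line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: compare case-insensitively and stop at the first match.
        if key.strip().lower() == "constraindevices":
            return value.strip().lower() == "yes"
    return False
```

Pass it the contents of your site's cgroup.conf, for example `constrain_devices_enabled(open(path).read())`, where `path` is wherever that file lives on your system.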