Well, that's kind of the core issue: without cgroups, _any_ process in the job will have access to all of the GPUs on the system, and there's not much more that Slurm can do about it at that point.
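As a concrete illustration (a sketch only, assuming the two-GPU A100 nodes shown in the quoted config further down; the --gres syntax and nvidia-smi are standard, nothing here is specific to Navin's site): without device confinement, a job that requested a single GPU can still enumerate and use both.

    # One-GPU job on an unconstrained node: Slurm narrows CUDA_VISIBLE_DEVICES,
    # but the device files themselves remain accessible to the job's processes.
    srun --gres=gpu:1 bash -c 'echo CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES; nvidia-smi -L'
    # With cgroup ConstrainDevices=yes, nvidia-smi -L inside the job would list
    # only the allocated device; without it, both GPUs show up.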
I would have a look at the environment variable CUDA_VISIBLE_DEVICES
<https://slurm.schedmd.com/gres.html#GPU_Management>. It is set by Slurm and
should hold an index (0, 1, 2, etc.) directing applications to an appropriate
GPU. I think it's more a case that the batch processes are honoring that
variable and the interactive job is not (a quick way to check this is
sketched after the quoted thread below).

- Michael

On Wed, Feb 12, 2025 at 9:00 PM navin srivastava via slurm-users
<slurm-users@lists.schedmd.com> wrote:

> Thank you Jesse.
>
> I am using Enterprise SLES15SP6 as the OS. I have not introduced the
> cgroup functionality in my environment. I can think about it and will see
> if this solution works out, but is there any other way to achieve the same
> without cgroups? Batch jobs are fine: two jobs, each requesting one GPU,
> work correctly. It is the mixed case (one batch job plus one interactive
> job) that creates the problem.
>
> Is there a way I can run a job and apply exclusivity only to the GPU
> resources?
>
> Regards
> Navin.
>
> On Wed, Feb 12, 2025 at 11:24 PM Chintanadilok, Jesse <jc...@ti.com> wrote:
>
>> Navin,
>>
>> You can isolate GPUs per job if you have cgroups set up properly. What OS
>> are you using? Newer OSes support cgroup v2 out of the box, but if
>> necessary you can continue using v1; this workflow should be applicable
>> to both.
>>
>> Add ConstrainDevices=yes to your cgroup.conf.
>>
>> This is what the file looks like at my site:
>>
>> /etc/slurm/cgroup.conf
>> CgroupMountpoint="/sys/fs/cgroup"
>> ConstrainCores=yes
>> ConstrainRAMSpace=yes
>> ConstrainSwapSpace=no
>> ConstrainDevices=yes
>>
>> You can find the documentation here:
>> https://slurm.schedmd.com/cgroup.conf.html
>>
>> If you want to share GPUs you can use CUDA MPS, or MIG if your GPU
>> supports it.
>>
>> Regards,
>> Jesse Chintanadilok
>>
>> From: navin srivastava via slurm-users <slurm-users@lists.schedmd.com>
>> Sent: Wednesday, February 12, 2025 10:30
>> To: Slurm User Community List <slurm-users@lists.schedmd.com>
>> Subject: [EXTERNAL] [slurm-users] avoid using same GPU by the interactive job
>>
>> hi,
>>
>> facing an issue in my environment where the batch job and the interactive
>> job use the same GPU.
>>
>> Each server has 2 GPUs. When 2 batch jobs are running, it works fine and
>> they use the 2 different GPUs, but if one batch job is running and another
>> job is submitted interactively, then it uses the same GPU. Is there a way
>> to avoid this?
>>
>> GresTypes=gpu
>>
>> NodeName=node[01-02] NodeAddr=node[01-02] CPUs=48 Boards=1
>> SocketsPerBoard=2 CoresPerSocket=24 ThreadsPerCore=1 TmpDisk=6000000
>> RealMemory=515634 Feature=A100 Gres=gpu:2
>>
>> PartitionName=onprem Nodes=node[01-10] Default=YES MaxTime=21-00:00:00
>> DefaultTime=3-00:00:00 State=UP Shared=YES OverSubscribe=NO
>>
>> gres.conf:
>> Name=gpu File=/dev/nvidia0
>> Name=gpu File=/dev/nvidia1
>>
>> Any suggestions on this?
>>
>> Regards
>> Navin
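Coming back to the CUDA_VISIBLE_DEVICES check mentioned above: one thing worth confirming is that the interactive session actually requests a GPU at all. This is only a sketch (the sbatch/srun invocations and my_gpu_app are illustrative, not taken from Navin's actual submissions):

    # Batch side: request one GPU; Slurm exports the index of the device it picked.
    sbatch --gres=gpu:1 --wrap 'echo "batch job sees CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; ./my_gpu_app'

    # Interactive side: the GRES request matters here too. If it is omitted,
    # CUDA_VISIBLE_DEVICES is never set and most CUDA applications simply open
    # device 0, which may be the GPU the batch job is already using.
    srun --gres=gpu:1 --pty bash
    echo $CUDA_VISIBLE_DEVICES   # inside the session; should differ from the batch job's index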
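Following up on Jesse's ConstrainDevices suggestion: if you do go the cgroup route, the rough rollout sequence looks like the following. The hostnames match the node[01-02] definition above, but the systemd restart and the TaskPlugin=task/cgroup line in slurm.conf are assumptions about a typical setup, so adjust for your site.

    # slurm.conf: the cgroup.conf constraints are enforced by the task/cgroup plugin
    TaskPlugin=task/cgroup

    # cgroup.conf (on every compute node), as in Jesse's example:
    ConstrainDevices=yes

    # Push the files out and restart slurmd so the device constraint takes effect:
    for n in node01 node02; do
        scp /etc/slurm/cgroup.conf /etc/slurm/slurm.conf "$n:/etc/slurm/"
        ssh "$n" systemctl restart slurmd
    done

    # Afterwards, a one-GPU job should only be able to see its own device:
    srun --gres=gpu:1 nvidia-smi -L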