Re: [slurm-users] Strange memory limit behavior with --mem-per-gpu

Paul Raines Fri, 08 Apr 2022 05:50:34 -0700


Sorry, I should have stated that before.  I am running Slurm 20.11.3,
which I compiled myself back in June 2021, on CentOS 8 Stream.


I will try to arrange an upgrade in the next few weeks.

-- Paul Raines (http://help.nmr.mgh.harvard.edu)



On Fri, 8 Apr 2022 4:02am, Bjørn-Helge Mevik wrote:

Paul Raines <rai...@nmr.mgh.harvard.edu> writes:

Basically, it appears using --mem-per-gpu instead of just --mem gives
you unlimited memory for your job.

$ srun --account=sysadm -p rtx8000 -N 1 --time=1-10:00:00 \
    --ntasks-per-node=1 --cpus-per-task=1 --gpus=1 --mem-per-gpu=8G \
    --mail-type=FAIL --pty /bin/bash
rtx-07[0]:~$ find /sys/fs/cgroup/memory/ -name job_$SLURM_JOBID
/sys/fs/cgroup/memory/slurm/uid_5829/job_1134067
rtx-07[0]:~$ cat /sys/fs/cgroup/memory/slurm/uid_5829/job_1134067/memory.limit_in_bytes
1621419360256

That is a limit of about 1.5TB, which is all the memory on rtx-07, not
the 8G I effectively asked for with 1 GPU at 8G per GPU.
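
For anyone who wants to run the same check on their own nodes, here is a
minimal Python sketch of the arithmetic. The function names are made up
for illustration, and it assumes that Slurm's "G" suffix means 1024-based
units and that the cgroup limit should not exceed the requested memory
(cgroup.conf settings such as AllowedRAMSpace could change that).

```python
GIB = 1024 ** 3  # assumption: the "G" suffix is 1024-based


def expected_limit_bytes(gpus: int, mem_per_gpu_gib: int) -> int:
    """Memory the job asked for: GPU count times --mem-per-gpu, in bytes."""
    return gpus * mem_per_gpu_gib * GIB


def limit_is_enforced(limit_bytes: int, gpus: int, mem_per_gpu_gib: int) -> bool:
    """True if the cgroup limit is no larger than the request.

    Assumption: an enforced limit never exceeds the requested memory.
    """
    return limit_bytes <= expected_limit_bytes(gpus, mem_per_gpu_gib)


# Values from the session above: 1 GPU, --mem-per-gpu=8G
observed = 1621419360256  # contents of memory.limit_in_bytes

print(expected_limit_bytes(1, 8))         # 8589934592
print(round(observed / GIB, 1))           # 1510.1 (GiB)
print(limit_is_enforced(observed, 1, 8))  # False
```

With these numbers the observed limit is roughly 1510 GiB against a
request of 8 GiB, so the check reports that the limit is not enforced.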


Which version of Slurm is this?  We noticed a behaviour similar to this
on Slurm 20.11.8, but when we tested it on 21.08.1, we couldn't
reproduce it.  (We also noticed an issue with --gpus-per-task that
appears to have been fixed in 21.08.)

--
B/H

