Hi, would you mind checking your job scheduling settings in slurm.conf?
Namely SelectTypeParameters=CR_CPU_Memory or the like. Also, you may want to use systemd-cgtop to at least confirm jobs are indeed running in cgroups.

Sincerely,
S. Zhang

On Fri, Jun 23, 2023, 12:07 Boris Yazlovitsky <boris...@gmail.com> wrote:

> it's still not constraining memory...
>
> a memhog job continues to memhog:
>
> boris@rod:~/scripts$ sacct --starttime=2023-05-01 --format=jobid,user,start,elapsed,reqmem,maxrss,maxvmsize,nodelist,state,exit -j 199
> JobID             User               Start    Elapsed     ReqMem     MaxRSS  MaxVMSize        NodeList      State ExitCode
> ------------ --------- ------------------- ---------- ---------- ---------- ---------- --------------- ---------- --------
> 199              boris 2023-06-23T02:42:30   00:01:21         1M                              milhouse  COMPLETED      0:0
> 199.batch              2023-06-23T02:42:30   00:01:21            104857988K 104858064K        milhouse  COMPLETED      0:0
>
> One thing I noticed is that the machines I'm working on do not have
> libcgroup and libcgroup-dev installed - but slurm does have its own
> cgroup implementation? The slurmd processes do utilize
> /usr/lib/slurm/*cgroup.so objects. I will try to recompile slurm with
> those libcgroup packages present.
>
> On Thu, Jun 22, 2023 at 6:04 PM Ozeryan, Vladimir <vladimir.ozer...@jhuapl.edu> wrote:
>
>> No worries,
>>
>> No, we don't have any OS-level settings, only "allowed_devices.conf",
>> which just has /dev/random, /dev/tty and stuff like that.
>>
>> But I think this could be the culprit; check the man page for
>> cgroup.conf: AllowedRAMSpace=100
>>
>> I would just leave these four:
>>
>> CgroupAutomount=yes
>> ConstrainCores=yes
>> ConstrainDevices=yes
>> ConstrainRAMSpace=yes
>>
>> Vlad.
>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>> Sent: Thursday, June 22, 2023 5:40 PM
>> To: Slurm User Community List <slurm-users@lists.schedmd.com>
>> Subject: Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>>
>> APL external email warning: Verify sender slurm-users-boun...@lists.schedmd.com before clicking links or attachments
>>
>> thank you Vlad - looks like we have the same yes's
>>
>> Do you remember if you had to make any settings on the OS level or in
>> the kernel to make it work?
>>
>> -b
>>
>> On Thu, Jun 22, 2023 at 5:31 PM Ozeryan, Vladimir <vladimir.ozer...@jhuapl.edu> wrote:
>>
>> Hello,
>>
>> We have the following configured and it seems to be working ok.
>>
>> CgroupAutomount=yes
>> ConstrainCores=yes
>> ConstrainDevices=yes
>> ConstrainRAMSpace=yes
>>
>> Vlad.
>>
>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>> Sent: Thursday, June 22, 2023 4:50 PM
>> To: Slurm User Community List <slurm-users@lists.schedmd.com>
>> Subject: Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>>
>> Hello Vladimir, thank you for your response.
>> this is the cgroup.conf file:
>>
>> CgroupAutomount=yes
>> ConstrainCores=yes
>> ConstrainDevices=yes
>> ConstrainRAMSpace=yes
>> ConstrainSwapSpace=yes
>> MaxRAMPercent=90
>> AllowedSwapSpace=0
>> AllowedRAMSpace=100
>> MemorySwappiness=0
>> MaxSwapPercent=0
>>
>> /etc/default/grub:
>>
>> GRUB_DEFAULT=0
>> GRUB_TIMEOUT_STYLE=hidden
>> GRUB_TIMEOUT=0
>> GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
>> GRUB_CMDLINE_LINUX_DEFAULT=""
>> GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 cgroup_enable=memory swapaccount=1"
>>
>> what other cgroup settings need to be set?
>>
>> && thank you!
>> -b
>>
>> On Thu, Jun 22, 2023 at 4:02 PM Ozeryan, Vladimir <vladimir.ozer...@jhuapl.edu> wrote:
>>
>> --mem=5G should allocate 5G of memory per node.
>>
>> Are your cgroups configured?
>>
>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>> Sent: Thursday, June 22, 2023 3:28 PM
>> To: slurm-users@lists.schedmd.com
>> Subject: [EXT] [slurm-users] --mem is not limiting the job's memory
>>
>> Running slurm 22.03.02 on Ubuntu 22.04 server.
>>
>> Jobs submitted with --mem=5g are able to allocate an unlimited amount
>> of memory.
>>
>> How can I limit, at the job-submission level, how much memory a job
>> can grab?
>>
>> thanks, and best regards!
>> Boris
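For reference, the slurm.conf combination the SelectTypeParameters suggestion points at looks roughly like this. A minimal sketch only: the option names are standard, but the node values are illustrative, and memory limits can only be enforced once memory is a consumable resource:

```
# slurm.conf (sketch -- values illustrative)
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory       # schedule CPUs *and* memory
TaskPlugin=task/cgroup                   # enforce limits through cgroups
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/cgroup

# Nodes must advertise RealMemory, or there is nothing to schedule against:
NodeName=milhouse CPUs=16 RealMemory=64000   # illustrative values
```

With CR_CPU (no `_Memory`) or select/linear defaults, --mem is recorded but never treated as a limit, which matches the symptom in this thread.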
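To test enforcement end to end, a small allocation probe can stand in for memhog. This is my own sketch (script name and targets are made up, not from the thread); submitted with a deliberately low --mem, it should be killed long before reaching a multi-GB target if ConstrainRAMSpace is working:

```python
# memprobe.py -- keeps allocating until it reaches a target. Under a working
# ConstrainRAMSpace cgroup, a run with --mem=1G and a multi-GB target should
# be OOM-killed instead of finishing COMPLETED like job 199 above.

def hog(target_mb, chunk_mb=100):
    """Allocate memory in chunk_mb pieces, holding references, until target_mb."""
    chunks = []
    allocated = 0
    while allocated < target_mb:
        # bytearray zero-fills, so the pages are actually touched, not just reserved
        chunks.append(bytearray(chunk_mb * 1024 * 1024))
        allocated += chunk_mb
        print(f"allocated {allocated} MB", flush=True)
    return allocated

if __name__ == "__main__":
    hog(200)  # raise well above the --mem request when submitting for real
```

For example, `sbatch --mem=1G --wrap 'python3 memprobe.py'` with the target raised to several GB: if sacct then reports the job OUT_OF_MEMORY or FAILED rather than COMPLETED with a huge MaxRSS, the cgroup limit is being applied.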