Sergey Koposov <skopo...@cmu.edu> writes:

> The trick is that my code uses memory mapping (i.e. mmap) of one
> single large file (~12 Gb) in each thread on each node.
> With this technique in the past, despite the fact the file is
> (read-only) mmaped in say 16 threads, the actual memory footprint was
> still ~12 Gb.
> However, when I now do this in slurm, it thinks that each thread (or
> process) takes 12 Gb and kills my processes.
We've seen this too (at least with older versions of Slurm; I haven't checked lately).

Our way around it was to set JobAcctGatherParams=NoOverMemoryKill and use the cgroup task plugin (TaskPlugin=task/cgroup). The cgroup plugin will kill jobs if they exceed their limits (provided you have set up cgroup.conf to do it), but it does not have the same problem of counting shared memory segments and mmap'ed files once for each thread or process. NoOverMemoryKill tells Slurm itself not to kill the job, but to leave that to the TaskPlugin.
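For reference, a minimal sketch of the relevant settings. Parameter names are as documented in the slurm.conf and cgroup.conf man pages; exact behaviour and defaults vary between Slurm versions, so check the documentation for your release:

  # slurm.conf (fragment)
  JobAcctGatherParams=NoOverMemoryKill   # accounting plugin samples usage but does not kill jobs
  TaskPlugin=task/cgroup                 # enforce memory limits through cgroups instead

  # cgroup.conf (fragment)
  ConstrainRAMSpace=yes                  # have the cgroup enforce the job's memory limit

With this setup, the kernel's cgroup accounting decides when a job is over its limit, and the pages of a shared read-only mapping should only be charged once to the job's cgroup, rather than once per process as in the polled per-process accounting numbers.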
-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo