Hi, 

Thanks again for all the suggestions. 
It turns out that on our cluster we can't use cgroups because of the old
kernel, but setting
    JobAcctGatherParams=UsePSS
resolved the problem.
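
For reference, the relevant accounting lines in slurm.conf now look roughly
like this (UsePSS is an option of the jobacct_gather/linux plugin, so that
plugin line is shown just for context; adjust to your site):

    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherParams=UsePSS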

Regards,
         Sergey

On Fri, 2019-01-11 at 10:37 +0200, Janne Blomqvist wrote:
> On 11/01/2019 08.29, Sergey Koposov wrote:
> > Hi,
> > 
> > I've recently migrated to slurm from pbs on our cluster. Because of that,
> > job memory limits are now strictly enforced, and that causes my code to get
> > killed.
> > The trick is that my code memory-maps (mmap) one single large file (~12 GB)
> > in each thread on each node.
> > With this technique, in the past, even though the file is mmap'ed
> > (read-only) in, say, 16 threads, the actual memory footprint was still ~12 GB.
> > However, when I now do this under slurm, it thinks that each thread (or
> > process) takes 12 GB and kills my processes.
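> > 
> > For concreteness, the access pattern is roughly the sketch below (the file
> > name and worker count are just placeholders):
> > 
> > import mmap
> > import multiprocessing as mp
> > 
> > FNAME = "large_file.bin"  # stands in for the real ~12 GB file
> > 
> > def worker(idx):
> >     # Every process maps the same file read-only; the pages are shared via
> >     # the page cache, so the total footprint stays ~12 GB, not 12 GB/worker.
> >     with open(FNAME, "rb") as f, \
> >          mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
> >         # ... the real work on mm goes here ...
> >         return mm[0]  # touch a byte just as a placeholder
> > 
> > if __name__ == "__main__":
> >     with mp.Pool(16) as pool:
> >         print(pool.map(worker, range(16)))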
> > 
> > Does anyone have a way around this problem, other than no longer using
> > memory as a consumable resource, or faking that each node has more memory?
> > 
> > Here is an example slurm script that I'm running:
> > #!/bin/bash
> > #SBATCH -N 1 # number of nodes
> > #SBATCH --cpus-per-task=10 # number of cores
> > #SBATCH --ntasks-per-node=1
> > #SBATCH --mem=125GB
> > #SBATCH --array=0-4
> > 
> > sh script1.sh $SLURM_ARRAY_TASK_ID 5
> > 
> > script1.sh essentially starts python, which in turn creates 10
> > multiprocessing processes, each of which mmaps the large file.
> > ------
> > In this case I'm forced to limit myself to only 10 threads instead of 16
> > (our machines have 16 cores) to avoid being killed by slurm.
> > ---
> > Thanks in advance for any suggestions.
> >          
> >             Sergey
> > 
> 
> What is your memory limit configuration in slurm? Anyway, a few things to 
> check:
> 
> - Make sure you're not limiting RLIMIT_AS in any way (e.g. run "ulimit -v" in
> your batch script and check that it's unlimited). In the slurm config, make
> sure VSizeFactor=0.
> - Are you using task/cgroup for limiting memory? In that case the problem
> might be that the cgroup memory limits work with RSS, and since you're running
> multiple processes, the shared mmap'ed file will be counted multiple times.
> There's no really good way around this, but with, say, something like the
> following in cgroup.conf
> 
> ConstrainRAMSpace=no
> ConstrainSwapSpace=yes
> AllowedRAMSpace=100
> AllowedSwapSpace=1600
> 
> you'll get a setup where the cgroup soft limit is set to the amount your job
> allocates, but the hard limit (where the job will be killed) is set to 1600%
> of that.
> - If you're using cgroups for memory limits, you should also set
> JobAcctGatherParams=NoOverMemoryKill.
> - If you're NOT using cgroups for memory limits, try setting
> JobAcctGatherParams=UsePSS, which should avoid counting the shared mappings
> multiple times; see the sketch below for how PSS differs from RSS.
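> 
> For illustration, here's a rough sketch that sums the Rss and Pss fields from
> /proc/<pid>/smaps. PSS charges each shared page 1/N-th to each of the N
> processes mapping it, so a read-only mmap'ed file is only counted once across
> the job (generic Linux, nothing slurm-specific):
> 
> import re
> import sys
> 
> def smaps_totals(pid="self"):
>     """Sum the Rss and Pss fields (in kB) over all mappings of a process."""
>     rss = pss = 0
>     with open("/proc/%s/smaps" % pid) as f:
>         for line in f:
>             m = re.match(r"(Rss|Pss):\s+(\d+) kB", line)
>             if m:
>                 if m.group(1) == "Rss":
>                     rss += int(m.group(2))
>                 else:
>                     pss += int(m.group(2))
>     return rss, pss
> 
> if __name__ == "__main__":
>     pid = sys.argv[1] if len(sys.argv) > 1 else "self"
>     rss, pss = smaps_totals(pid)
>     print("RSS: %d kB, PSS: %d kB" % (rss, pss))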
> 
