Hi Jürgen,

I would take a look at the various *KmemSpace options in cgroup.conf;
they can certainly help with this.
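As a rough sketch, that could look like the following in cgroup.conf
(the values are illustrative only, not a recommendation; please check
the cgroup.conf man page of your Slurm version for the exact semantics
and units of these parameters):

--- snip ---
###
# cgroup.conf sketch -- illustrative values only
###
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
# Additionally constrain the kernel memory of the job's cgroup:
ConstrainKmemSpace=yes
# Upper bound on kernel memory as a percentage of the allocated RAM:
MaxKmemPercent=100
# Lower bound (in MB) on the kernel memory limit:
MinKmemSpace=30
--- snip ---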
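You can also verify what is actually charged against the job's limit
from within a running job. A minimal sketch, assuming the usual
cgroup v1 memory hierarchy that Slurm sets up under
/sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid> (the exact path may
differ on your system):

--- snip ---
#!/bin/bash
# Sketch: inspect the memory cgroup of the current job (cgroup v1).
cg=/sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}

# The hard limit enforced by ConstrainRAMSpace (in bytes):
cat $cg/memory.limit_in_bytes

# Current page cache and RSS charged to the job (in bytes):
grep -E '^(cache|rss) ' $cg/memory.stat
--- snip ---

With ConstrainRAMSpace=yes, cache plus rss cannot grow beyond that
limit, which is exactly the capping you observed in your 2 GB run.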
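As for how others cope with this: apart from having users request
extra memory to leave room for the cache (as in your 18 GB run),
applications whose access patterns do not benefit from caching can
bypass the page cache with direct I/O, so it is not charged to the job
at all. With your dd example that would look as follows (note that
O_DIRECT support depends on the file system):

--- snip ---
# Bypass the page cache entirely with direct I/O; requires a file
# system that supports O_DIRECT (local ext4/XFS scratch usually does).
dd if=/dev/zero of=$SCRATCH/testfile count=16 bs=1024M oflag=direct
dd if=$SCRATCH/testfile of=/dev/null count=16 bs=1024M iflag=direct
--- snip ---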
Cheers,
--
Kilian

On Thu, Jun 13, 2019 at 2:41 PM Juergen Salk <juergen.s...@uni-ulm.de> wrote:
>
> Dear all,
>
> I'm just starting to get used to Slurm and am playing around with it
> in a small test environment on our old cluster.
>
> For our next system we will probably have to abandon our current
> exclusive user node access policy in favor of a shared user policy,
> i.e. jobs from different users will then run side by side on the same
> node at the same time. In order to prevent the jobs from interfering
> with each other, I have set both ConstrainCores=yes and
> ConstrainRAMSpace=yes in cgroup.conf, which works as expected for
> limiting the memory of the processes to the value requested at job
> submission (e.g. by the --mem=... option).
>
> However, I've noticed that ConstrainRAMSpace=yes also caps the
> available page cache, for which the Linux kernel normally exploits
> any unused areas of memory in a flexible way. This may result in a
> significant performance impact, as we have quite a number of
> I/O-demanding applications (dominated by read operations) that are
> known to benefit a lot from page caching.
>
> Here is a small example to illustrate the issue. The job writes a
> 16 GB file to a local scratch file system, measures the amount of
> data cached in memory and then reads back the file previously
> written.
>
> $ cat job.slurm
> #!/bin/bash
> #SBATCH --partition=standard
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=1
> #SBATCH --time=00:10:00
>
> # Get amount of data cached in memory before writing the file
> # (/proc/meminfo reports values in kB)
> cached1=`awk '$1=="Cached:" {print $2}' /proc/meminfo`
>
> # Write 16 GB file to local scratch SSD
> dd if=/dev/zero of=$SCRATCH/testfile count=16 bs=1024M
>
> # Get amount of data cached in memory after writing the file
> cached2=`awk '$1=="Cached:" {print $2}' /proc/meminfo`
>
> # Print difference of data cached in memory
> echo -e "\nIncreased cached data by $(((cached2-cached1)/1000000)) GB\n"
>
> # Read the file previously written
> dd if=$SCRATCH/testfile of=/dev/null count=16 bs=1024M
> $
>
> For reference, this is the result *without* ConstrainRAMSpace=yes set
> in cgroup.conf, submitted with
> `sbatch --mem=2G --gres=scratch:16 job.slurm`:
>
> --- snip ---
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 10.9839 s, 1.6 GB/s
>
> Increased cached data by 16 GB
>
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 5.03225 s, 3.4 GB/s
> --- snip ---
>
> Note that 16 GB of data have been cached and the read performance is
> 3.4 GB/s, as the data is actually read from the page cache.
>
> And this is the result *with* ConstrainRAMSpace=yes set in
> cgroup.conf and submitted with the very same command:
>
> --- snip ---
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 13.3163 s, 1.3 GB/s
>
> Increased cached data by 1 GB
>
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 11.1098 s, 1.5 GB/s
> --- snip ---
>
> Now only 1 GB of data has been cached (which is roughly the 2 GB
> requested for the job minus the 1 GB allocated for the dd buffer),
> degrading read performance to 1.5 GB/s (compared to 3.4 GB/s above).
>
> Finally, this is the result *with* ConstrainRAMSpace=yes set in
> cgroup.conf and the job submitted with
> `sbatch --mem=18G --gres=scratch:16 job.slurm`:
>
> --- snip ---
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 11.0601 s, 1.6 GB/s
>
> Increased cached data by 16 GB
>
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 5.01643 s, 3.4 GB/s
> --- snip ---
>
> This is almost the same result as in the unconstrained case (i.e.
> without ConstrainRAMSpace=yes set in cgroup.conf), as the amount of
> memory requested for the job (18 GB) is large enough to allow the
> file to be fully cached in memory.
>
> I do not think this is an issue with Slurm itself but rather how
> cgroups are supposed to work. However, I wonder how others cope with
> this.
>
> Maybe we have to teach our users to also take the page cache into
> account when requesting a certain amount of memory for their jobs?
>
> Any comment or idea would be highly appreciated.
>
> Thank you in advance.
>
> Best regards
> Jürgen
>
> --
> Jürgen Salk
> Scientific Software & Compute Services (SSCS)
> Kommunikations- und Informationszentrum (kiz)
> Universität Ulm
> Telefon: +49 (0)731 50-22478
> Telefax: +49 (0)731 50-22471