"Ohlerich, Martin" <martin.ohler...@lrz.de> writes:

> Hello Björn-Helge.
>
> Sigh ...
>
> First of all, of course, many thanks! This indeed helped a lot!

Good!

> b) This only works if I have to specify --mem for a task. Although
> manageable, I wonder why one needs to be that restrictive. In
> principle, in the use case outlined, one task could use a bit less
> memory, and the other may require a bit more than half of the node's
> available memory. (So clearly this isn't always predictable.) I only
> hope that in such cases the second task does not die from OOM ... (I
> will know soon, I guess.)

As I understand it, Slurm (at least with cgroups) will only kill a step
if it uses more memory *in total* on a node than the job was allocated
on that node. So if a job has 10 GiB allocated on a node, and a step
runs two tasks there, one task could use 9 GiB and the other 1 GiB
without the step being killed.

You can inspect the memory limits that are in effect in cgroups (v1) in
/sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid> (the usual location,
at least).

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
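
To illustrate the two points above, here is a minimal Python sketch. It
assumes cgroups v1 and the usual path layout mentioned in the message;
the helper names (job_cgroup_dir, read_cgroup_value, step_exceeds_limit)
are hypothetical and not part of Slurm. memory.limit_in_bytes and
memory.max_usage_in_bytes are standard cgroup v1 memory controller
files, and SLURM_JOB_ID is set by Slurm inside a job. The check in
step_exceeds_limit only models the behaviour as described above (the
limit applies to the total on the node), not Slurm's actual
implementation.

```python
import os
from pathlib import Path


def job_cgroup_dir(uid, job_id, root="/sys/fs/cgroup/memory/slurm"):
    """Build the usual cgroup v1 directory for a job (location may differ per site)."""
    return Path(root) / f"uid_{uid}" / f"job_{job_id}"


def read_cgroup_value(path):
    """Return the integer stored in a cgroup file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def step_exceeds_limit(task_usage_gib, job_limit_gib):
    """Model of the rule described above: only the total on the node counts."""
    return sum(task_usage_gib) > job_limit_gib


if __name__ == "__main__":
    # The 10 GiB example from the message: 9 GiB + 1 GiB stays within the limit.
    print("9 + 1 GiB over a 10 GiB limit:", step_exceeds_limit([9, 1], 10))

    job_id = os.environ.get("SLURM_JOB_ID")
    if job_id is None:
        print("Not inside a Slurm job; nothing to inspect.")
    else:
        cgroup = job_cgroup_dir(os.getuid(), job_id)
        for name in ("memory.limit_in_bytes", "memory.max_usage_in_bytes"):
            print(name, "=", read_cgroup_value(cgroup / name))
```

Run inside a job allocation (for example from a batch script), this
prints the limit and the peak usage recorded for the job on that node,
or None where the files are not present at this location.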