Re: [slurm-users] srun jobfarming hassle question

Bjørn-Helge Mevik Wed, 18 Jan 2023 23:25:06 -0800

"Ohlerich, Martin" <martin.ohler...@lrz.de> writes:

> Hello Björn-Helge.
>
>
> Sigh ...
>
> First of all, of course, many thanks! This indeed helped a lot!


Good!

> b) This only works if I have to specify --mem for a task. Although
> manageable, I wonder why one needs to be that restrictive. In
> principle, in the use case outlined, one task could use a bit less
> memory, and the other may require a bit more than half of the node's
> available memory. (So clearly this isn't always predictable.) I only
> hope that in such cases the second task does not die from OOM ... (I
> will know soon, I guess.)

As I understand it, Slurm (at least with cgroups) will only kill a step
if it uses more memory *in total* on a node than the job was allocated on
that node.  So if a job has 10 GiB allocated on a node, and a step runs
two tasks there, one task could use 9 GiB and the other 1 GiB without the
step being killed.
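
To make the arithmetic concrete, here is a toy sketch in Python.  It only
models the rule as I described it above (the sum over the tasks is compared
against the job's allocation on the node); it is not how Slurm or the
kernel implements the check, and the function name step_survives is made
up for illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def step_survives(task_usage_bytes, job_alloc_bytes):
    """Toy model: the step is fine while the *summed* usage of its tasks
    on a node stays within what the job was allocated on that node."""
    return sum(task_usage_bytes) <= job_alloc_bytes


# Uneven split, 9 GiB + 1 GiB, inside a 10 GiB allocation.
print(step_survives([9 * GIB, 1 * GIB], 10 * GIB))

# Same allocation, but the tasks together need 11 GiB.
print(step_survives([9 * GIB, 2 * GIB], 10 * GIB))
```

In this model the uneven 9 GiB / 1 GiB split is fine, while 9 GiB + 2 GiB
goes over the allocation.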

You can inspect the memory limits that are in effect in cgroups (v1) in
/sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid> (usual location, at
least).
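
If you want to check this from inside a job, a small script along these
lines might do.  It is only a sketch and assumes cgroups v1 with the
directory layout mentioned above; the helper names (job_cgroup_dir,
read_memory_files) are made up for the example.  The files
memory.limit_in_bytes and memory.max_usage_in_bytes are standard cgroup v1
memory controller files, but whether they sit in exactly this place
depends on how your site has set things up.

```python
import os
from pathlib import Path


def job_cgroup_dir(uid, jobid, root="/sys/fs/cgroup/memory/slurm"):
    """Build the (assumed) cgroup v1 memory directory for a job."""
    return Path(root) / f"uid_{uid}" / f"job_{jobid}"


def read_memory_files(job_dir,
                      names=("memory.limit_in_bytes",
                             "memory.max_usage_in_bytes")):
    """Read integer values from cgroup v1 memory files.

    Returns a dict mapping file name to its value in bytes, or None if
    the file is missing, unreadable, or not an integer.
    """
    values = {}
    for name in names:
        try:
            values[name] = int((Path(job_dir) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    # SLURM_JOB_ID is set by Slurm inside a job; outside one, do nothing.
    jobid = os.environ.get("SLURM_JOB_ID")
    if jobid:
        job_dir = job_cgroup_dir(os.getuid(), jobid)
        for name, value in read_memory_files(job_dir).items():
            print(f"{job_dir / name}: {value}")
```

Run inside an allocation (for instance via srun on the node in question),
it prints the limit and the peak usage recorded for the job's cgroup, or
None where a file could not be read.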

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
