We run our cluster with SelectTypeParameters=CR_Core_Memory and always require users to specify the memory needed when submitting a job, to keep our nodes from swapping themselves into uselessness. However, since slurmd is pretty vigilant about killing jobs that exceed their request, users end up requesting more memory than they actually need, which leaves our nodes' CPUs underutilized.
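For reference, the relevant slurm.conf lines for a setup like ours look roughly like this (just a sketch; the exact enforcement behavior depends on Slurm version and cgroup configuration):

    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory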
What would be really nice is a percent memory slack, so a job wouldn't be killed until it exceeded its requested memory by the given percent: essentially an admin-controlled (or perhaps user-controlled) amount of memory overcommit.

For example, a node with 256GB of RAM can currently run 8 jobs requesting 32GB each, even if they only average 28GB apiece. With 15% slack, it could instead run 9 jobs requesting 28GB each (9 x 28 = 252GB requested), without having to worry about the occasional job that runs a bit over its request being killed. Anyone have thoughts/ideas about this? The check itself seems relatively straightforward to implement (see the sketch below), though of course using it effectively will require some tuning.
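To make the idea concrete, here's a minimal sketch in C of what the enforcement check might look like. Everything here is hypothetical: slack_percent is an imagined knob, not an existing Slurm parameter, and this isn't slurmd's actual code path.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical slack check: flag a job for killing only once its
     * usage exceeds its request plus slack_percent.  slack_percent is
     * an imagined admin (or per-user) knob, not a real Slurm option. */
    static int over_threshold(uint64_t used_mb, uint64_t requested_mb,
                              unsigned slack_percent)
    {
        uint64_t threshold_mb =
            requested_mb + requested_mb * slack_percent / 100;
        return used_mb > threshold_mb;
    }

    int main(void)
    {
        /* 28GB request with 15% slack: kill threshold is ~32.2GB */
        printf("%d\n", over_threshold(30 * 1024, 28 * 1024, 15)); /* 0: survives */
        printf("%d\n", over_threshold(33 * 1024, 28 * 1024, 15)); /* 1: killed  */
        return 0;
    }

The scheduler would still pack nodes based on the requested amounts, so the slack only turns into real overcommit when several jobs on a node spike at the same time.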