We run our cluster with SelectTypeParameters=CR_Core_Memory and always require users to specify the memory needed when submitting a job, to keep our nodes from swapping themselves into uselessness. However, since slurmd is pretty vigilant about killing jobs that exceed their request, users end up requesting more memory than they actually need, which leaves our nodes' CPUs underutilized.
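For reference, the relevant slurm.conf lines for a setup like ours look roughly like this (just a sketch; the exact enforcement behavior depends on Slurm version and cgroup configuration):

    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory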
What would be really nice is a percent memory slack, so a job wouldn't be killed until it exceeded its requested memory by the given percent: essentially an admin-controlled (or perhaps user-controlled) amount of memory overcommit.

For example, a node with 256GB of RAM can currently run 8 jobs requesting 32GB each, even if they only average 28GB apiece. With 15% slack, it could instead run 9 jobs requesting 28GB each (9 x 28 = 252GB requested), without having to worry about the occasional job that runs a bit over its request being killed. Anyone have thoughts/ideas about this? The check itself seems relatively straightforward to implement (see the sketch below), though of course using it effectively will require some tuning.
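To make the idea concrete, here's a minimal sketch in C of what the enforcement check might look like. Everything here is hypothetical: slack_percent is an imagined knob, not an existing Slurm parameter, and this isn't slurmd's actual code path.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical slack check: flag a job for killing only once its
     * usage exceeds its request plus slack_percent.  slack_percent is
     * an imagined admin (or per-user) knob, not a real Slurm option. */
    static int over_threshold(uint64_t used_mb, uint64_t requested_mb,
                              unsigned slack_percent)
    {
        uint64_t threshold_mb =
            requested_mb + requested_mb * slack_percent / 100;
        return used_mb > threshold_mb;
    }

    int main(void)
    {
        /* 28GB request with 15% slack: kill threshold is ~32.2GB */
        printf("%d\n", over_threshold(30 * 1024, 28 * 1024, 15)); /* 0: survives */
        printf("%d\n", over_threshold(33 * 1024, 28 * 1024, 15)); /* 1: killed  */
        return 0;
    }

The scheduler would still pack nodes based on the requested amounts, so the slack only turns into real overcommit when several jobs on a node spike at the same time.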