Eli V <eliven...@gmail.com> writes:

> We run our cluster with the select parameter CR_Core_Memory and always
> require users to specify the memory needed when submitting a job, to
> avoid swapping our nodes into uselessness. However, since slurmd is
> quite vigilant about killing jobs that exceed their request, we end up
> with jobs requesting more memory than needed, leaving our nodes' CPUs
> underutilized.
>
> What would be really nice would be a configurable percentage of memory
> slack, so that a job would not be killed until it exceeded its
> requested memory by that percentage, essentially allowing an admin- or
> perhaps user-controlled amount of memory overcommit.
>
> For example, a node with 256GB of RAM can currently run 8 jobs
> requesting 32GB each, even if they only average 28GB per job. With 15%
> overcommit it could instead run 9 jobs requesting 28GB each, without
> having to worry about the occasional higher-memory job being killed.
>
> Does anyone have thoughts or ideas about this? It seems like it should
> be relatively straightforward to implement, though of course using it
> effectively would require some tuning.
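To make the proposed arithmetic concrete, here is a minimal sketch of
the slack check being suggested. It is purely illustrative: the
slack_percent parameter and both function names are hypothetical and
not part of Slurm; scheduling is assumed to stay based on the plain
request, with the slack applied only to the kill threshold.

    # Hypothetical sketch of the proposed memory-slack idea.
    # slack_percent and these helpers are illustrative, not Slurm APIs.

    NODE_MEM_GB = 256

    def kill_threshold_gb(requested_gb: float, slack_percent: float) -> float:
        """A job would only be killed once its usage exceeds its
        request plus the configured slack."""
        return requested_gb * (1 + slack_percent / 100)

    def max_jobs_per_node(requested_gb: float) -> int:
        """Allocation would still use the plain request, so the node's
        memory is not overcommitted at scheduling time."""
        return int(NODE_MEM_GB // requested_gb)

    # Today: jobs must request 32GB to be safe -> 8 jobs per node.
    print(max_jobs_per_node(32))                        # 8

    # With 15% slack: request the 28GB average -> 9 jobs per node,
    # and an occasional spike is tolerated up to ~32.2GB.
    print(max_jobs_per_node(28))                        # 9
    print(f"{kill_threshold_gb(28, 15):.1f}")           # 32.2

Under this sketch the node is only actually oversubscribed if several
jobs spike above their requests at the same time, which is the tuning
risk the original mail alludes to.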
It is not clear to me that this is a good idea. I think it is important
to inform users about the memory usage of their jobs, so that they can
estimate their requirements as accurately as possible. If, as a user, I
find that my job runs successfully even when I underestimate the memory
needed, there is no real incentive for me to be more accurate in
future. In fact, I may be rewarded for requesting too little RAM, since
jobs requesting fewer resources may tend to start earlier.

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de