Hi,

> So here's something funny. One user submitted a job that requested 60 CPUs
> and 400000M of memory. Our largest nodes in that partition have 72 CPUs and
> 256G of memory. So when a user requests 400G of RAM, what would be good
> behavior? I would like to see Slurm reject the job: "job is impossible to
> run." Instead, Slurm keeps slowly growing the priority of that job (because
> of fair-share), and the job effectively disables the nodes that are trying to
> free up memory for it (all the nodes that have enough CPUs, not just one
> node that has enough CPUs). This is a combination of *multiple* bad
> behaviors. Stopping a node that can never satisfy the request is bad...
> Stopping *all* nodes that have enough CPUs, even though none of them can
> ever satisfy the request, is extra bad.
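For anyone reading along, a request of that shape would look roughly like the
hypothetical batch script below (the partition name and application are made
up; the numbers mirror the report above):

    #!/bin/bash
    #SBATCH --partition=bigmem      # hypothetical partition name
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=60      # fits on a 72-CPU node
    #SBATCH --mem=400000M           # ~400G per node; the largest node has 256G
    srun ./my_app                   # placeholder application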
EnforcePartLimits=YES would be your friend :) We also use a submission filter that checks each request and adjusts some things if needed :) (Rough sketches of both are below.)

Regards,
--
Andy
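For the archives, here is a minimal, untested sketch of both pieces, using
made-up site values. First the slurm.conf option mentioned above (newer Slurm
releases spell the values ALL/ANY/NO; check the slurm.conf man page for your
version):

    # slurm.conf (excerpt)
    # Reject jobs at submission time when they exceed a partition's limits,
    # instead of leaving them pending forever.
    EnforcePartLimits=YES

And a submission filter in the same spirit, written as a job_submit/lua
plugin (job_submit.lua). The 256G ceiling is a hard-coded example value, and
the job_desc field names follow the Lua job-submit API as I know it, so
verify them against your Slurm release:

    -- job_submit.lua: reject per-node memory requests that no node can satisfy.
    -- MAX_MEM_MB is a made-up site value (largest node = 256G).
    local MAX_MEM_MB = 256 * 1024

    -- Requests made with --mem-per-cpu (or left unset) show up with the high
    -- bit set; this sketch only checks plain per-node requests (--mem).
    local MEM_PER_CPU_FLAG = 2^63

    function slurm_job_submit(job_desc, part_list, submit_uid)
        local mem = job_desc.pn_min_memory   -- requested memory per node, in MB
        if mem ~= nil and mem < MEM_PER_CPU_FLAG and mem > MAX_MEM_MB then
            slurm.log_user("Rejecting job: no node in this cluster has " ..
                           tostring(mem) .. "M of memory")
            return slurm.ERROR
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, submit_uid)
        return slurm.SUCCESS
    end

    return slurm.SUCCESS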