Hi,

> So here's something funny. One user submitted a job that requested 60 CPUs
> and 400000M of memory. Our largest nodes in that partition have 72 CPUs and
> 256G of memory. So when a user requests 400G of RAM, what would be good
> behavior? I would like to see Slurm reject the job: "job is impossible to
> run." Instead, Slurm keeps slowly growing the priority of that job (because
> of fairshare), and the job effectively disables the nodes that are trying to
> free up memory for it (all the nodes that have enough CPUs), not just one
> node that has enough CPUs. This is a combination of *multiple* bad
> behaviors. Stopping a node that can never satisfy the request is bad;
> stopping *all* nodes that have enough CPUs, even though none of them can
> ever satisfy the request, is extra bad.

EnforcePartLimits=YES

would be your friend :)
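
For reference, a minimal slurm.conf sketch of that idea (the partition name,
node list, and memory cap below are made up for illustration; only
EnforcePartLimits itself is the actual suggestion, and recent Slurm versions
also accept ALL/ANY in place of YES/NO):

    # slurm.conf (illustrative values)
    # Reject jobs at submit time if they exceed the limits of the
    # partition(s) they request, instead of letting them pend forever.
    EnforcePartLimits=ALL

    # Give the partition an explicit per-node memory limit so a 400000M
    # request has something to be checked against.
    PartitionName=compute Nodes=node[01-32] MaxMemPerNode=256000 State=UP
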

We also use a submission filter that checks the requests and adjusts some
things if needed :) A rough sketch of that idea is below.
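
This is not our actual filter, just an illustration written as an sbatch
wrapper; MAX_MEM_MB and the sbatch path are placeholders, and most sites
would do the same check in a job_submit plugin instead:

    #!/usr/bin/env python3
    """Illustrative sbatch wrapper (not a real site filter).

    Rejects jobs whose --mem request exceeds the largest node in the
    partition (MAX_MEM_MB is an assumed site constant), then hands the
    remaining arguments to the real sbatch.
    """
    import re
    import subprocess
    import sys

    MAX_MEM_MB = 256000  # biggest node's memory, in MB (assumed value)

    def mem_to_mb(value: str) -> int:
        """Convert an sbatch --mem value like '400000M' or '400G' to MB."""
        match = re.fullmatch(r"(\d+)([KMGT]?)", value.upper())
        if not match:
            raise ValueError(f"unrecognised --mem value: {value}")
        number, unit = int(match.group(1)), match.group(2)
        factor = {"K": 1 / 1024, "": 1, "M": 1, "G": 1024, "T": 1024 * 1024}
        return int(number * factor[unit])

    def requested_mem_mb(args):
        """Return the --mem request in MB, or None if none was given."""
        for i, arg in enumerate(args):
            if arg.startswith("--mem="):
                return mem_to_mb(arg.split("=", 1)[1])
            if arg == "--mem" and i + 1 < len(args):
                return mem_to_mb(args[i + 1])
        return None

    def main():
        mem = requested_mem_mb(sys.argv[1:])
        if mem is not None and mem > MAX_MEM_MB:
            sys.exit(f"error: --mem={mem}M exceeds the largest node "
                     f"({MAX_MEM_MB}M); this job could never run")
        # Forward everything else to the real sbatch binary.
        sys.exit(subprocess.call(["/usr/bin/sbatch"] + sys.argv[1:]))

    if __name__ == "__main__":
        main()
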

Regards,
-- Andy
