On Dec 8, 2022, at 03:57, Loris Bennett <loris.benn...@fu-berlin.de> wrote:

Loris Bennett <loris.benn...@fu-berlin.de> writes:

Moshe Mergy <moshe.me...@weizmann.ac.il> writes:

Hi Sandor

I personally block "--mem=0" requests in the job_submit.lua file (Slurm 20.02):

 if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
       slurm.log_info("%s: ERROR: unlimited memory requested", log_prefix)
       slurm.log_info("%s: ERROR: job %s from user %s rejected because of an " ..
                      "invalid (unlimited) memory request.",
                      log_prefix, job_desc.name, job_desc.user_name)
       slurm.log_user("Job rejected because of an invalid memory request.")
       return slurm.ERROR
 end

What happens if somebody explicitly requests all the memory, so in
Sandor's case --mem=500G ?

Maybe there is a better or nicer solution...

Can't you just use account and QOS limits:

 https://slurm.schedmd.com/resource_limits.html

?

And anyway, what is the use-case for preventing someone using all the
memory? In our case, if someone really needs all the memory, they should be able
to have it.

However, I do have a chronic problem with users requesting too much
memory. My approach has been to try to get people to use 'seff' to see
what resources their jobs in fact need.  In addition each month we
generate a graphical summary of 'seff' data for each user, like the one
shown here

 
https://www.fu-berlin.de/en/sites/high-performance-computing/Dokumentation/Statistik

and automatically send an email to those with a large percentage of
resource-inefficient jobs telling them to look at their graphs and
correct their resource requirements for future jobs.
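
As a rough illustration of that monthly check, here is a minimal Python sketch. It is not the script used at FU Berlin: the record layout (user, peak memory in MB, requested memory in MB), both thresholds and the function names are assumptions made for the example, and the data would have to be extracted from the accounting records by some other means.

```python
from collections import defaultdict

# Hypothetical thresholds; tune them for your own site.
MIN_MEM_EFFICIENCY = 0.5     # a job counts as inefficient below 50% memory use
MAX_INEFFICIENT_SHARE = 0.5  # notify users with more than half their jobs inefficient


def memory_efficiency(max_rss_mb, requested_mb):
    """Return the fraction of the requested memory that the job actually used."""
    if requested_mb <= 0:
        # No usable request value, so the job cannot be rated; treated as 0.0 here.
        return 0.0
    return max_rss_mb / requested_mb


def users_to_notify(jobs):
    """Map each user above the threshold to their share of inefficient jobs.

    jobs: iterable of (user, max_rss_mb, requested_mb) tuples (assumed layout).
    """
    total = defaultdict(int)
    inefficient = defaultdict(int)
    for user, max_rss_mb, requested_mb in jobs:
        total[user] += 1
        if memory_efficiency(max_rss_mb, requested_mb) < MIN_MEM_EFFICIENCY:
            inefficient[user] += 1
    shares = {user: inefficient[user] / total[user] for user in total}
    return {user: share for user, share in shares.items()
            if share > MAX_INEFFICIENT_SHARE}


if __name__ == "__main__":
    sample = [
        ("alice", 1000, 64000),
        ("alice", 2000, 64000),
        ("alice", 30000, 32000),
        ("bob", 15000, 16000),
    ]
    for user, share in users_to_notify(sample).items():
        print(f"{user}: {share:.0%} of jobs used less than half their requested memory")
```

The users returned by users_to_notify() would be the ones to receive the reminder email; CPU and walltime efficiency could be handled the same way.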

Cheers,

Loris

I may be wrong about this, but aren’t people penalized in their fair share 
score for using too much memory, and effectively penalized for wasting it by 
“paying” for it even if they don’t need it? They’re also penalized for it by 
likely having to wait longer to have their request satisfied if they specify 
more than they need. That’s generally what I used to tell people.

I also make quite a bit of use of Ole Holm Nielsen’s pestat, to catch jobs that 
are not running efficiently, but that’s not automated, just a way to review.

https://github.com/OleHolmNielsen/Slurm_tools/blob/master/pestat/pestat

--
#BlackLivesMatter
____
|| \\UTGERS,    |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'
