Hi Bjørn-Helge,
On 6/23/22 09:18, Bjørn-Helge Mevik wrote:
<gerard....@cines.fr> writes:
TRESRaw cpu is lower than before as I'm alone on the system and no other job was
submitted.
Any explanation for this?
I'd guess you have turned on FairShare priorities. Unfortunately, in
Slurm the same internal variables are used for fairshare calculations as
for GrpTRESMins (and similar), so when fair share priorities are in use,
slurm will reduce accumulated GrpTRESMins over time. This means that it
is impossible(*) to use GrpTRESMins limits and fairshare
priorities at the same time.
This is a surprising observation! We use a 14-day half-life in slurm.conf:
PriorityDecayHalfLife=14-0
Since our longest running jobs can run only 7 days, maybe our limits never
get reduced as you describe?
The slurm.conf man-page says that PriorityDecayHalfLife affects hard time
limits per association:
PriorityDecayHalfLife
This controls how long prior resource use is considered in
determining how over- or under-serviced an association is (user,
bank account and cluster) in determining job priority. The
record of usage will be decayed over time, with half of the
original value cleared at age PriorityDecayHalfLife. If set to
0 no decay will be applied. This is helpful if you want to
enforce hard time limits per association. If set to 0
PriorityUsageResetPeriod must be set to some interval. Applicable
only if PriorityType=priority/multifactor. The unit is a time
string (i.e. min, hr:min:00, days-hr:min:00, or days-hr). The
default value is 7-0 (7 days).
Is this what explains your statement?
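To make the man-page wording concrete, here is a minimal sketch of what "half of the original value cleared at age PriorityDecayHalfLife" would mean if the decay is a plain exponential. This is only my reading of the text, not Slurm's actual code, and decay_factor is a hypothetical helper name:

```python
def decay_factor(age_days: float, half_life_days: float) -> float:
    """Fraction of recorded usage remaining after age_days.

    Assumes a plain exponential half-life model, which is an
    interpretation of the man-page text, not Slurm's implementation.
    """
    if half_life_days == 0:
        # Per the man page, a half-life of 0 means no decay is applied.
        return 1.0
    return 0.5 ** (age_days / half_life_days)


half_life = 14.0  # corresponds to PriorityDecayHalfLife=14-0
for age in (7, 14, 28):
    remaining = decay_factor(age, half_life)
    print(f"{age:2d} days: {remaining:.3f} of usage remains")
```

Under this assumed model, roughly 71% of the recorded usage would remain after 7 days with a 14-day half-life, so usage would start shrinking well before a full half-life has passed.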
BTW, I've written a handy script for displaying user limits in a readable
format:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits
/Ole