I'm not aware of one. This may be worth a feature request to the devs
at bugs.schedmd.com
-Paul Edmon-
On 10/16/18 7:29 AM, Antony Cleave wrote:
Hi All
Yes, I realise this is almost certainly the intended outcome. I have
wondered this for a long time but only recently got round to testing
it on a safe system.
The process is simple:
1. run a lot of jobs
2. let decay take effect
3. change the decay setting
4. restart slurmdbd and slurmctld
5. run another job with debug2 on the slurmctld
6. read the log and see that the QoS still has the same accounting number
   (a rough sketch of the commands follows the log excerpt below):
[2018-10-15T13:18:16.404] debug2: acct_policy_job_begin: after adding
job 4304, qos normal grp_used_tres_run_secs(cpu) is 14400
[2018-10-15T13:47:45.789] debug2: acct_policy_job_begin: after adding
job 4304, qos normal grp_used_tres_run_secs(cpu) is 14400
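For anyone wanting to reproduce this, the middle steps were roughly the
following. This is a sketch only - it assumes a systemd-managed install,
that the decay setting being changed is PriorityDecayHalfLife in
slurm.conf, and a default log location, so adjust to taste:

# bump slurmctld logging so the acct_policy_job_begin lines appear
scontrol setdebug debug2
# edit PriorityDecayHalfLife in slurm.conf, then restart both daemons
systemctl restart slurmdbd
systemctl restart slurmctld
# submit one more job and pull the accounting line out of the ctld log
sbatch --qos=normal --wrap "sleep 600"
grep acct_policy_job_begin /var/log/slurmctld.log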
I wonder if there is a way to have Slurm recalculate the historical
usage of users/accounts/QoS that is used for the resource limit
calculations. It has all of the data it needs in the database. As an
experiment (after making a backup) I did try cleaning out all of the
cluster_usage_(month|day|hour) tables in the accounting db, but that
just resets the state for everyone, as expected.
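For anyone curious, that cleanout was roughly of this shape. Again a
sketch only, not the exact commands - it assumes the default
slurm_acct_db database name and uses "mycluster" as a stand-in for
whatever your ClusterName (and hence table prefix) is:

# take a backup first, then empty the rolled-up usage tables
mysqldump slurm_acct_db > slurm_acct_db.backup.sql
mysql slurm_acct_db -e "
  TRUNCATE mycluster_usage_hour_table;
  TRUNCATE mycluster_usage_day_table;
  TRUNCATE mycluster_usage_month_table;"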
For the record, the full undecayed usage is:
sacct -nP -X -D -q normal --format=CPUTimeRAW -S2018-01-01 | \
  awk -F"|" 'BEGIN { sum=0; } { sum += $1; } END { print int(sum/60); }'
214676
CPU minutes, showing that the data needed to recalculate the usages is
indeed still there if we wished to do it.
I know that redoing all of the hourly rollups would take quite a while,
but it would be useful to be able to rebalance the system once it is
realised that the decay has been set too fast, i.e. left at the default
of one week.
Also, is there a way for a normal user to see the decayed usage of an
account/user/QoS? The raw usage is there (as above), but this just
fuels resentment when your jobs are held back by a limit while someone
from an account with far more usage (which has decayed away to
nothing) and the same limit is allowed to run.
Antony