I'm not aware of one.  This may be worth a feature request to the devs at bugs.schedmd.com

-Paul Edmon-

On 10/16/18 7:29 AM, Antony Cleave wrote:
Hi All

Yes, I realise this is almost certainly the intended outcome. I have wondered this for a long time but only recently got round to testing it on a safe system.

The process is simple (a rough sketch of the commands follows the log excerpt below):
- run a lot of jobs
- let the decay take effect
- change the setting
- restart slurmdbd and slurmctld
- run another job with debug2 logging on the slurmctld
- read the log to see that the QoS still has the same accounting number

[2018-10-15T13:18:16.404] debug2: acct_policy_job_begin: after adding job 4304, qos normal grp_used_tres_run_secs(cpu) is 14400
[2018-10-15T13:47:45.789] debug2: acct_policy_job_begin: after adding job 4304, qos normal grp_used_tres_run_secs(cpu) is 14400
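
For anyone wanting to reproduce this, the steps translate to roughly the following (a sketch only: the config path, service names, QoS name and log location are just what we use here and may well differ on your site):

# steps 1-2: submit a batch of jobs and wait for the decay half-life to pass
# step 3: change the decay setting (PriorityDecayHalfLife) in slurm.conf
sudo vi /etc/slurm/slurm.conf
# step 4: restart the daemons
sudo systemctl restart slurmdbd slurmctld
# step 5: raise slurmctld logging to debug2 and submit a test job
scontrol setdebug debug2
sbatch --qos=normal --wrap="sleep 3600"
# step 6: look for the accounting number in the slurmctld log
grep grp_used_tres_run_secs /var/log/slurm/slurmctld.log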


I wonder if there is a way to have Slurm recalculate the historical usage of users/accounts/QoS used for resource-limit calculations. It has all of the data to do so in the database. As a bit of an experiment I did try cleaning out all of the cluster_usage_(month|day|hour)_tables in the accounting db after making a backup, but this just cleared the state for everyone, as expected.
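
For reference, that cleanup experiment was roughly along these lines (hypothetical names: "slurm_acct_db" for the accounting database and "mycluster" for the cluster prefix on the usage tables; check your own schema first, and keep the dump somewhere safe):

mysqldump slurm_acct_db > slurm_acct_db.backup.sql
mysql slurm_acct_db -e "TRUNCATE mycluster_usage_hour_table; \
                        TRUNCATE mycluster_usage_day_table; \
                        TRUNCATE mycluster_usage_month_table;"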

For the record, the full undecayed usage is:
sacct -nP -X -D -q normal --format=CPUTimeRAW -S2018-01-01 | awk -F"|" 'BEGIN { sum=0; } { sum += $1; } END { print int(sum/60); }'
214676
CPU minutes, showing that it does indeed still have the data required to recalculate the usage if we wished to do so.
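
(If you want the same undecayed number broken down per user, the same sacct data can just be grouped in awk, e.g.:

sacct -nP -X -D -q normal --format=User,CPUTimeRAW -S2018-01-01 | awk -F"|" '{ sum[$1] += $2 } END { for (u in sum) print u, int(sum[u]/60) }'

which should print per-user CPU minutes.)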

I know that redoing all the hourly rollups would take quite a while, but it would be useful to be able to rebalance the system once it was realised the decay had been set too fast, i.e. left at the default of one week.

Also, is there a way for a normal user to see the decayed usage of an account/user/QoS? The raw usage is there (as above), but that just fuels resentment when your jobs are held back by a limit while someone from an account with way more usage (which has decayed away to nothing) and the same limit is allowed to run.
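
(The closest I have found for the fair-share side is sshare, which as far as I can tell shows the decayed per-association usage, e.g.

sshare -l -A myaccount -u myuser

with "myaccount"/"myuser" substituted as appropriate, but I have not found an equivalent view of the QoS-level numbers.)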

Antony


