Just following up on my own message in case someone else is trying to figure out RawUsage and Fair Share.
I ran some additional tests, except that this time I ran jobs for 10 min instead of 1 min. The procedure was:

1. Set the accounting stats to update every minute in slurm.conf:

   PriorityCalcPeriod=1

2. Reset the RawUsage stat:

   sacctmgr modify account luchko_group set RawUsage=0

3. Check the RawUsage every second:

   while sleep 1; do date; sshare -ao Account,User,RawShares,NormShares,RawUsage; done > watch.out

4. Run a 10 min job. The billing per CPU is 1, so the total RawUsage should be 60,000 and the RawUsage should increase by 6,000 each minute:

   sbatch --account=luchko_group --wrap="sleep 600" -p cpu -n 100

Scanning the output file, I can see that the RawUsage does update once every minute. Below are the updates. (I've removed irrelevant output.)

Tue Sep 24 10:14:24 AM PDT 2024
Account              User        RawShares  NormShares    RawUsage
-------------------- ---------- ---------- ----------- -----------
luchko_group         tluchko          100     0.500000           0
Tue Sep 24 10:14:25 AM PDT 2024
luchko_group         tluchko          100     0.500000        4099
Tue Sep 24 10:15:24 AM PDT 2024
luchko_group         tluchko          100     0.500000       10099
Tue Sep 24 10:16:25 AM PDT 2024
luchko_group         tluchko          100     0.500000       16099
Tue Sep 24 10:17:24 AM PDT 2024
luchko_group         tluchko          100     0.500000       22098
Tue Sep 24 10:18:25 AM PDT 2024
luchko_group         tluchko          100     0.500000       28097
Tue Sep 24 10:19:24 AM PDT 2024
luchko_group         tluchko          100     0.500000       34096
Tue Sep 24 10:20:25 AM PDT 2024
luchko_group         tluchko          100     0.500000       40094
Tue Sep 24 10:21:24 AM PDT 2024
luchko_group         tluchko          100     0.500000       46093
Tue Sep 24 10:22:25 AM PDT 2024
luchko_group         tluchko          100     0.500000       52091
Tue Sep 24 10:23:24 AM PDT 2024
luchko_group         tluchko          100     0.500000       58089
Tue Sep 24 10:24:25 AM PDT 2024
luchko_group                        2000     0.133324       58087
Tue Sep 24 10:25:25 AM PDT 2024
luchko_group         tluchko          100     0.500000       58085

So, the RawUsage does increase by the expected amount each minute, and the RawUsage does decay (I have the half-life set to 14 days). However, the update for the last partial minute, which should be 1,901, is never recorded. I suspect this is because the job is no longer running when the accounting update occurs. For typical jobs that run for hours or days this is a negligible error, but it does explain the results I got when I ran a 1 min job.

TRESRunMins is still not updating, but that is only an inconvenience.
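For anyone repeating this, the expected numbers are easy to sanity-check from the shell. This is rough arithmetic only (it assumes, as I understand the docs, that usage is decayed every PriorityCalcPeriod, and uses my 14-day PriorityDecayHalfLife; any awk should do):

   # Expected total RawUsage: 100 CPUs x 600 s x billing weight 1.0 per CPU
   echo $((100 * 600))                                          # 60000

   # Expected decay per 1 min calc period at ~58,000 RawUsage;
   # usage is multiplied by 2^(-period/half_life) each period
   awk 'BEGIN { print 58087 * (1 - 2^(-60 / (14*24*3600))) }'   # ~2

That ~2 per minute matches the drops from 58,089 to 58,087 to 58,085 in the output above, after the job finished.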
Tyler

On Thursday, September 19th, 2024 at 8:47 PM, tluchko via slurm-users <slurm-users@lists.schedmd.com> wrote:

> Hello,
>
> I'm hoping someone can offer some suggestions.
>
> I went ahead and started the database from scratch and reinitialized it, to see if that would help and to try to understand how RawUsage is calculated. I ran two jobs of
>
> sbatch --account=luchko_group --wrap="sleep 60" -p cpu -n 100
>
> with the partition defined as
>
> PriorityFlags=MAX_TRES
> PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>
> I expected each job to contribute 6,000 to the RawUsage; however, one job contributed 3,100 and the other 2,800. And TRESRunMins stayed at 0 for all categories.
>
> I'm at a loss as to what is going on.
>
> Thank you,
>
> Tyler
>
> On Tuesday, September 10th, 2024 at 9:03 PM, tluchko <tluc...@protonmail.com> wrote:
>
>> Hello,
>>
>> We have a new cluster and I'm trying to set up fairshare accounting, tracking CPU, MEM, and GPU. It seems that billing for individual jobs is correct, but billing isn't being accumulated (TRESRunMins is always 0).
>>
>> In my slurm.conf, I think the relevant lines are
>>
>> AccountingStorageType=accounting_storage/slurmdbd
>> AccountingStorageTRES=gres/gpu
>> PriorityFlags=MAX_TRES
>>
>> PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>> PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>>
>> I currently have one recently finished job and one running job. sacct gives
>>
>> $ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
>> JobID        JobName    ReqTRES                                   AllocTRES                                 TRESUsageInAve                                     TRESUsageInMax
>> ------------ ---------- ----------------------------------------- ----------------------------------------- -------------------------------------------------- --------------------------------------------------
>> 154          interacti+ billing=9,cpu=1,gres/gpu=1,mem=1G,node=1  billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
>> 154.interac+ interacti+                                           cpu=2,gres/gpu=1,mem=2G,node=1            cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+ cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+
>> 155          interacti+ billing=9,cpu=1,gres/gpu=1,mem=1G,node=1  billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
>> 155.interac+ interacti+                                           cpu=2,gres/gpu=1,mem=2G,node=1
>>
>> billing=9 seems correct to me, since I have 1 GPU allocated, which has the largest score of 9.6. However, sshare doesn't show anything in TRESRunMins:
>>
>> sshare --format=Account,User,RawShares,FairShare,RawUsage,EffectvUsage,TRESRunMins%110
>> Account              User        RawShares  FairShare    RawUsage  EffectvUsage TRESRunMins
>> -------------------- ---------- ---------- ---------- ----------- ------------- -----------
>> root                                                     21589714      1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
>> abrol_group                          2000                       0      0.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
>> luchko_group                         2000                21589714      1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
>> luchko_group         tluchko             1   0.333333     21589714      1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
>>
>> Why is TRESRunMins all 0 for tluchko but RawUsage is not? I have checked, and slurmdbd is running.
>>
>> Thank you,
>>
>> Tyler
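P.S. On the billing=9 in the sacct output quoted above: with PriorityFlags=MAX_TRES, the billable TRES is (as far as I understand) the largest single weighted TRES on the node. A rough check of AllocTRES cpu=2,mem=2G,gres/gpu=1 against the TRESBillingWeights above (awk is used only for the arithmetic):

   awk 'BEGIN {
       cpu = 2 * 1.0          # CPU weight 1.0, 2 CPUs
       mem = 2 * 0.125        # MEM weight 0.125 per GB, 2 GB
       gpu = 1 * 9.6          # GRES/gpu weight 9.6, 1 GPU
       b = cpu
       if (mem > b) b = mem
       if (gpu > b) b = gpu
       printf "billing = %d (from %.1f)\n", b, b   # billing = 9 (from 9.6)
   }'

The GPU term dominates, and the fractional 9.6 shows up as the integer billing=9.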