Hello,

I'm hoping someone can offer some suggestions.

I went ahead and started the database from scratch and reinitialized it, both 
to see if that would help and to try to understand how RawUsage is calculated. 
I then ran two jobs with

sbatch --account=luchko_group --wrap="sleep 60" -p cpu -n 100

With the partition defined as

PriorityFlags=MAX_TRES
PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP 
TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"

I expected each job to contribute 6000 to RawUsage (100 CPUs for 60 s at a 
CPU weight of 1.0); however, one job contributed 3100 and the other 2800, and 
TRESRunMins stayed at 0 for all categories.
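
For reference, the arithmetic behind my expectation can be sketched in Python. This is only my reading of how MAX_TRES billing works (the largest single weighted TRES on the node, ignoring any global TRES such as licenses) and it assumes RawUsage is billing times elapsed seconds with no decay applied. The function names are made up for illustration, and the memory value for the CPU job is a placeholder since I did not request memory explicitly.

```python
import math

# Weights copied from TRESBillingWeights in the partition definition.
WEIGHTS = {"cpu": 1.0, "mem_gb": 0.125, "gpu": 9.6}


def max_tres_billing(cpus, mem_gb, gpus=0):
    """Largest single weighted TRES (my understanding of MAX_TRES)."""
    return max(
        cpus * WEIGHTS["cpu"],
        mem_gb * WEIGHTS["mem_gb"],
        gpus * WEIGHTS["gpu"],
    )


def expected_raw_usage(billing, elapsed_seconds):
    """Undecayed estimate: billing units times elapsed seconds (assumption)."""
    return billing * elapsed_seconds


# The "sleep 60" job with -n 100. Memory is a placeholder: any request
# under 800G leaves the CPU term as the maximum.
cpu_job = max_tres_billing(cpus=100, mem_gb=0.0)
print(cpu_job, expected_raw_usage(cpu_job, 60))

# The GPU job from my earlier message: AllocTRES cpu=2,mem=2G,gres/gpu=1.
# sacct reported billing=9, which matches 9.6 rounded down.
gpu_job = max_tres_billing(cpus=2, mem_gb=2.0, gpus=1)
print(gpu_job, math.floor(gpu_job))
```

Under these assumptions each CPU job should add 6000, which is roughly twice what either job actually contributed.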

I'm at a loss as to what is going on.

Thank you,

Tyler

Sent with [Proton Mail](https://proton.me/mail/home) secure email.

On Tuesday, September 10th, 2024 at 9:03 PM, tluchko <tluc...@protonmail.com> 
wrote:

> Hello,
>
> We have a new cluster and I'm trying to set up fairshare accounting. I'm 
> trying to track CPU, MEM and GPU. It seems that billing for individual jobs 
> is correct, but billing isn't being accumulated (TRESRunMins is always 0).
>
> In my slurm.conf, I think the relevant lines are
>
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageTRES=gres/gpu
> PriorityFlags=MAX_TRES
>
> PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 
> State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
> PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 
> State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>
> I currently have one recently finished job and one running job. sacct gives
>
> $ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
> JobID JobName ReqTRES AllocTRES TRESUsageInAve TRESUsageInMax
> ------------ ---------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------
> 154 interacti+ billing=9,cpu=1,gres/gpu=1,mem=1G,node=1 billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
> 154.interac+ interacti+ cpu=2,gres/gpu=1,mem=2G,node=1 cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+ cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+
> 155 interacti+ billing=9,cpu=1,gres/gpu=1,mem=1G,node=1 billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
> 155.interac+ interacti+ cpu=2,gres/gpu=1,mem=2G,node=1
>
> billing=9 seems correct to me, since I have 1 GPU allocated, which has the 
> largest score of 9.6. However, sshare doesn't show anything in TRESRunMins
>
> $ sshare --format=Account,User,RawShares,FairShare,RawUsage,EffectvUsage,TRESRunMins%110
> Account User RawShares FairShare RawUsage EffectvUsage TRESRunMins
> -------------------- ---------- ---------- ---------- ----------- ------------- --------------------------------------------------------------------------------------------------------------
> root 21589714 1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
> abrol_group 2000 0 0.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
> luchko_group 2000 21589714 1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
>  luchko_group tluchko 1 0.333333 21589714 1.000000 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
>
> Why is TRESRunMins all 0 while RawUsage is nonzero for tluchko? I have 
> checked and slurmdbd is running.
>
> Thank you,
>
> Tyler
>
> Sent with [Proton Mail](https://proton.me/) secure email.
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
