[slurm-dev] GPU accounting (WAS Re: sreport not reporting gpu info...)

Merlin Hartley Mon, 09 Oct 2017 04:02:30 -0700

That’s what I’ve been looking for too!

Though now I see that my configuration must be wrong. I am trying to make use 
of one GPU cost the same as use of 160 CPUs, so I have this config:


<snip>
PartitionName=DEFAULT  DefaultTime=24:0:0 MaxTime=14-0:0:0 MaxNodes=4 TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=160.0"
NodeName=pascal[01-03] Sockets=2 CoresPerSocket=8  ThreadsPerCore=2 RealMemory=232000 Gres=gpu:pascal:4
PartitionName=pascal   Default=NO  State=UP Nodes=pascal[01-03] MaxNodes=1
<snip>

But sreport tells me
   mbu    <user info>            cpu    124828 
   mbu    <user info>       gres/gpu     49369 

For a user who exclusively uses GPU machines (4 GPUs and 16 CPUs per machine).

Any idea what I’ve missed?
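
One way to read those numbers, offered as a hedged sketch rather than a confirmed answer: the sreport header quoted further down in this thread says "Use reported in TRES Minutes", so the two figures above may be raw per-resource minutes with no billing weight applied. Assuming that, the Python below applies the configured CPU and GRES/gpu weights by hand. The helper name `weighted_usage` is hypothetical, the Mem weight is left out because no mem row appears in the output, and the plain weighted sum is an illustrative assumption, not a statement of how Slurm computes billing internally.

```python
# Raw TRES minutes copied from the sreport output above.
raw_minutes = {"cpu": 124828, "gres/gpu": 49369}

# Weights taken from TRESBillingWeights in the partition config.
# Mem is omitted: sreport printed no mem row, so there is nothing to weight.
weights = {"cpu": 1.0, "gres/gpu": 160.0}


def weighted_usage(minutes, weights):
    """Hypothetical helper: multiply each TRES's raw minutes by its weight."""
    return {name: minutes[name] * weights.get(name, 0.0) for name in minutes}


weighted = weighted_usage(raw_minutes, weights)
total = sum(weighted.values())

# Ratio of raw CPU minutes to raw GPU minutes in the report.
cpus_per_gpu = raw_minutes["cpu"] / raw_minutes["gres/gpu"]

for name, value in weighted.items():
    print(f"{name:10s} {value:12.0f}")
print(f"{'total':10s} {total:12.0f}")
print(f"raw cpu minutes per gpu minute: {cpus_per_gpu:.2f}")
```

The raw ratio comes out near 2.5 CPU minutes per GPU minute, nowhere near 160, which is at least consistent with the report showing unweighted usage for each TRES.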

Thanks


Merlin

--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom

> On 6 Oct 2017, at 20:30, Tim Carlson <tim.s.carl...@gmail.com> wrote:
> 
> Perfect!  Thanks!
> 
> # sreport cluster AccountUtilizationByUser -T cpu,gres/gpu Start=2017-10-06 
> End=2017-10-08
> --------------------------------------------------------------------------------
> Cluster/Account/User Utilization 2017-10-06T00:00:00 - 2017-10-06T11:59:59 
> (43200 secs)
> Use reported in TRES Minutes
> --------------------------------------------------------------------------------
>   Cluster         Account     Login     Proper Name      TRES Name     Used
> --------- --------------- --------- --------------- -------------- --------
>  marianas            root                                      cpu     143
>  marianas            root                                 gres/gpu     143
>  
>  marianas             ops       tim     Tim Carlson            cpu      143
>  marianas             ops       tim     Tim Carlson       gres/gpu      143
> 
> 
> On Fri, Oct 6, 2017 at 12:23 PM, Daniel Barker <danba...@umich.edu> wrote:
> Tim,
> I believe you have to refer to the gpu as gres/gpu.
> 
> [root@slurm-login ~]# sreport -T CPU,mem,gres/gpu cluster 
> AccountUtilizationByUser Start=2017-01-01 End=2017-12-31
> --------------------------------------------------------------------------------
> Cluster/Account/User Utilization 2017-01-01T00:00:00 - 2017-10-06T14:59:59 
> (24069600 secs)
> Use reported in TRES Minutes
> --------------------------------------------------------------------------------
>   Cluster         Account     Login     Proper Name      TRES Name     Used 
> --------- --------------- --------- --------------- -------------- -------- 
>  deadpool            root                                      cpu      228 
>  deadpool            root                                      mem   192008 
>  deadpool            root                                 gres/gpu      259 
>  deadpool        hpcstaff                                      cpu      228 
>  deadpool        hpcstaff                                      mem   192008 
>  deadpool        hpcstaff                                 gres/gpu      259 
>  deadpool        hpcstaff  danbarke   Daniel Barker            cpu      198 
>  deadpool        hpcstaff  danbarke   Daniel Barker            mem   160947 
>  deadpool        hpcstaff  danbarke   Daniel Barker       gres/gpu      259 
> 
> -Dan
> 
> On Fri, Oct 6, 2017 at 3:12 PM, Tim Carlson <tim.s.carl...@gmail.com> wrote:
> Background: I recently installed a new cluster, which I started on 14.03 and 
> then upgraded to 17.02 to get better gres/tres information.
> 
> In my other clusters I use sreport heavily to do billing. This new cluster 
> is GPU based, and I want to bill on GPU time consumed.  My assumption was 
> that I could use something like
> 
>  sreport cluster AccountUtilizationByUser -T cpu,gpu Start=2017-10-06 
> End=2017-10-08 
> 
> I think I have slurm.conf configured correctly 
> 
> # grep -i gres /etc/slurm/slurm.conf
> AccountingStorageTRES=gres/gpu
> GresTypes=gpu
> NodeName=dl[01-25] Gres=gpu:2 Feature=ml01 Procs=16 State=UNKNOWN
> 
> And sacct seems to report the gres/tres utilization.
> 
> # sacct -X -u tim --format=jobid,elapsed,ReqTRES%30,ReqGRES 
> --starttime=2017-10-06 | tail
> 314            00:05:27        cpu=1,node=1,gres/gpu=1        gpu:1
> 315            00:05:27        cpu=1,node=1,gres/gpu=1        gpu:1
> 316            00:05:27        cpu=1,node=1,gres/gpu=1        gpu:1
> 317            00:05:27        cpu=1,node=1,gres/gpu=1        gpu:1
> 318            00:05:27        cpu=1,node=1,gres/gpu=1        gpu:1
> 319            00:05:27        cpu=1,node=1,gres/gpu=1        gpu:1
> 320            00:05:27        cpu=1,node=1,gres/gpu=1        gpu:1
> 321            00:05:24        cpu=1,node=1,gres/gpu=1        gpu:1
> 322            00:05:24        cpu=1,node=1,gres/gpu=1        gpu:1
> 323            00:05:24        cpu=1,node=1,gres/gpu=1        gpu:1
> 
> 
> # sreport cluster AccountUtilizationByUser -T gpu Start=2017-10-06 
> End=2017-10-08
> --------------------------------------------------------------------------------
> Cluster/Account/User Utilization 2017-10-06T00:00:00 - 2017-10-06T11:59:59 
> (43200 secs)
> Use reported in TRES Minutes
> --------------------------------------------------------------------------------
>   Cluster         Account     Login     Proper Name      TRES Name     Used
> --------- --------------- --------- --------------- -------------- --------
> 
> Yet if I ask for cpu time from the tres field I get what I want.
> 
> # sreport cluster AccountUtilizationByUser -T cpu,gpu Start=2017-10-06 
> End=2017-10-08
> --------------------------------------------------------------------------------
> Cluster/Account/User Utilization 2017-10-06T00:00:00 - 2017-10-06T11:59:59 
> (43200 secs)
> Use reported in TRES Minutes
> --------------------------------------------------------------------------------
>   Cluster         Account     Login     Proper Name      TRES Name     Used
> --------- --------------- --------- --------------- -------------- --------
>  marianas            root                                      cpu     143
>  marianas             ops                                      cpu      143
>  marianas             ops       tim     Tim Carlson            cpu      143
> 
> Bottom line being, what am I missing to get sreport to kick out gpu time?
> 
> 
> 
> 
> -- 
> Dan Barker
> ARC-TS
> 
>
