Hi All,

I've run into a strange problem with my Slurm configuration. I'm trying to set
up AccountingStorage properly so that I can use Open XDMoD to produce usage
reports, but the output I'm getting from sacct shows 0 for a large number of
fields, such as NCPUS and CPUTimeRAW, which are rather important for usage
reports.

Has anyone here run into something similar before? It would be great if someone
could point out what I've misconfigured. I've pasted the relevant parts of my
Slurm config and sacct output after my sig.

Thanks!


------------------------------------
Eric Coulter         jecou...@iu.edu
XSEDE Capabilities and Resource Integration Engineer
IU Campus Bridging & Research Infrastructure
RT/PTI/UITS
812-856-3250

[jecoulte@headnode ~]$ scontrol show config | grep Acc
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = none
AccountingStorageHost   = headnode
AccountingStorageLoc    = /var/log/slurmacct.log
AccountingStoragePort   = 0
AccountingStorageTRES   = cpu,mem,energy,node      # Added these in case the default wasn't being respected for some reason...
AccountingStorageType   = accounting_storage/filetxt
AccountingStorageUser   = root
AccountingStoreJobComment = Yes
AcctGatherEnergyType    = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq      = 0 sec
AcctGatherProfileType   = acct_gather_profile/none
JobAcctGatherFrequency  = 30
JobAcctGatherType       = jobacct_gather/linux
JobAcctGatherParams     = (null)

For a job running on 2 nodes with 1 CPU per node, sacct shows:
[jecoulte@headnode ~]$ sudo sacct -j 386 --format JobID,JobName,AllocNodes,TotalCPU,CPUTime,NCPUS,CPUTimeRaw,AllocCPUs
       JobID    JobName AllocNodes   TotalCPU    CPUTime      NCPUS CPUTimeRAW  AllocCPUS
------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
386          fact_job.+          2  00:49.345   00:00:00          0          0          0
386.0          hostname          2  00:00.006   00:00:00          0          0          0
386.1        fact-sum.g          2  00:49.338   00:00:00          0          0          0

For the same job, the record in AccountingStorageLoc is:
[jecoulte@headnode ~]$ grep ^386 /var/log/slurmacct.log
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006538 1000 1000 - - 1 0 1 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00 0 0 0.00 0 0 0.00 0 0 0.00 hostname compute-[0-1] 0 0 0 0 (null) 4294967295
386 low 1517006536 1517006538 1000 1000 - - 1 0 3 0 2 2 0 0 6466 0 5388 0 1078 0 0 0 0 0 0 0 0 0 0 0 0 0 0 269148 1 236380.00 620 1 618.00 0 1 0.00 0 1 0.00 hostname compute-[0-1] 1 1 1 1 (null) 4294967295
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006538 1000 1000 - - 1 1 1 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00 0 0 0.00 0 0 0.00 0 0 0.00 fact-sum.g compute-[0-1] 0 0 0 0 (null) 4294967295
386 low 1517006536 1517006565 1000 1000 - - 1 1 3 0 2 2 27 49 338902 48 94477 1 244425 0 0 0 0 0 0 0 0 0 0 0 0 0 0 269148 1 236380.00 620 1 618.00 0 1 0.00 0 1 0.00 fact-sum.g compute-[0-1] 1 1 1 1 (null) 4294967295
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006565 1000 1000 - - 3 28 3 4294967295 0
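
As a rough sanity check on what sacct ought to report for this job, the sketch
below estimates CPUTimeRAW from two of the records above. It relies on two
assumptions that are not confirmed anywhere in this post: that the fourth
whitespace-separated field of each record is the Unix time at which the record
was written, and that CPUTimeRAW is the elapsed time in seconds multiplied by
the number of allocated CPUs. The helper name record_timestamp is purely
illustrative.

```python
# Records copied from the slurmacct.log excerpt for job 386.
first_record = ("386 low 1517006536 1517006537 1000 1000 - - 0 "
                "fact_job.job 1 4294901759 2 compute-[0-1] (null)")
last_record = "386 low 1517006536 1517006565 1000 1000 - - 3 28 3 4294967295 0"


def record_timestamp(line):
    """Return the 4th field as an int (ASSUMED to be the record's epoch time)."""
    return int(line.split()[3])


# Elapsed wall-clock seconds between the first and last record for the job.
elapsed = record_timestamp(last_record) - record_timestamp(first_record)

# 2 nodes x 1 CPU per node, as described for this job.
allocated_cpus = 2

# ASSUMPTION: CPUTimeRAW = elapsed seconds * allocated CPUs.
expected_cpu_time_raw = elapsed * allocated_cpus

print("elapsed seconds:", elapsed)
print("expected CPUTimeRAW:", expected_cpu_time_raw)
```

Under those assumptions the job should show a non-zero NCPUS and a CPUTimeRAW
in the tens of seconds, rather than the zeros sacct prints.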
