Hi all, I've run into a strange problem with my Slurm configuration. I'm trying to set up AccountingStorage properly so that I can use Open XDMoD to produce usage reports, but the output I get from sacct has only 0s in a large number of fields, such as NCPUS and CPUTimeRaw (which are rather important for usage reports).
Has anyone here run into something similar before? It would be great if someone could point out what I've misconfigured. I've pasted the relevant bits of my Slurm config and the sacct output after my sig. Thanks!

------------------------------------
Eric Coulter
jecou...@iu.edu
XSEDE Capabilities and Resource Integration Engineer
IU Campus Bridging & Research Infrastructure
RT/PTI/UITS
812-856-3250

[jecoulte@headnode ~]$ scontrol show config | grep Acc
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = none
AccountingStorageHost = headnode
AccountingStorageLoc = /var/log/slurmacct.log
AccountingStoragePort = 0
AccountingStorageTRES = cpu,mem,energy,node  # added these in case the default wasn't being respected for some reason
AccountingStorageType = accounting_storage/filetxt
AccountingStorageUser = root
AccountingStoreJobComment = Yes
AcctGatherEnergyType = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
JobAcctGatherFrequency = 30
JobAcctGatherType = jobacct_gather/linux
JobAcctGatherParams = (null)
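For completeness, here are the slurm.conf accounting lines as I understand them. This is reconstructed from the scontrol output above, so the exact file contents may differ slightly:

```
# Accounting storage via the flat-file plugin
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/var/log/slurmacct.log
AccountingStorageHost=headnode
AccountingStorageTRES=cpu,mem,energy,node
AccountingStoreJobComment=YES

# Per-job accounting gathered on the compute nodes every 30 s
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
```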
For a job running on 2 nodes, 1 CPU per node, sacct shows:

[jecoulte@headnode ~]$ sudo sacct -j 386 --format JobID,JobName,AllocNodes,TotalCPU,CPUTime,NCPUS,CPUTimeRaw,AllocCPUs
       JobID    JobName AllocNodes   TotalCPU    CPUTime      NCPUS CPUTimeRAW  AllocCPUS
------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
386          fact_job.+          2  00:49.345   00:00:00          0          0          0
386.0          hostname          2  00:00.006   00:00:00          0          0          0
386.1        fact-sum.g          2  00:49.338   00:00:00          0          0          0

For the same job, the records in AccountingStorageLoc are:

[jecoulte@headnode ~]$ grep ^386 /var/log/slurmacct.log
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006538 1000 1000 - - 1 0 1 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00 0 0 0.00 0 0 0.00 0 0 0.00 hostname compute-[0-1] 0 0 0 0 (null) 4294967295
386 low 1517006536 1517006538 1000 1000 - - 1 0 3 0 2 2 0 0 6466 0 5388 0 1078 0 0 0 0 0 0 0 0 0 0 0 0 0 0 269148 1 236380.00 620 1 618.00 0 1 0.00 0 1 0.00 hostname compute-[0-1] 1 1 1 1 (null) 4294967295
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006538 1000 1000 - - 1 1 1 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00 0 0 0.00 0 0 0.00 0 0 0.00 fact-sum.g compute-[0-1] 0 0 0 0 (null) 4294967295
386 low 1517006536 1517006565 1000 1000 - - 1 1 3 0 2 2 27 49 338902 48 94477 1 244425 0 0 0 0 0 0 0 0 0 0 0 0 0 0 269148 1 236380.00 620 1 618.00 0 1 0.00 0 1 0.00 fact-sum.g compute-[0-1] 1 1 1 1 (null) 4294967295
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006565 1000 1000 - - 3 28 3 4294967295 0