Hi,

We recently set up fair-tree scheduling (we are running Slurm 19.05) and are
trying to use sshare to see usage information. Unfortunately, sshare reports
all zeros, even though there appears to be data in the backend database.
Here's an example of the output:


$ sshare -l
             Account       User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare    LevelFS                    GrpTRESMins                    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
root                                                             0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
 covid                                   1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
  covid-01                               1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
  covid-02                               1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
 group1                                  1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
  subgroup1                              1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups                        1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
  subgroups                              1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
  subgroups                              4                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
  subgroups                              1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
 SUBGROUP                                1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
 SUBGROUP                                1                       0                  0.000000              0.000000                                cpu=0,mem=0,energy=0,node=0,b+
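
For what it's worth, the usage tables in the accounting database don't look
empty. We checked with something like the query below (a sketch, assuming the
default MySQL/MariaDB backend, the default slurm_acct_db database name, and
the standard <cluster>_assoc_usage_day_table naming for our cluster trixie):

$ # Daily per-association usage, straight from slurmdbd's database.
$ # id_tres=1 should be CPU; alloc_secs is the accumulated allocation time.
$ mysql -u slurm slurm_acct_db -e "
    SELECT id, id_tres, time_start, alloc_secs
      FROM trixie_assoc_usage_day_table
     WHERE alloc_secs > 0
     ORDER BY time_start DESC
     LIMIT 10;"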



And here is our slurm.conf:


ClusterName=trixie
SlurmctldHost=trixie(10.10.0.11)
SlurmctldHost=hn2(10.10.0.12)
GresTypes=gpu
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/gpfs/share/slurm/
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/cgroup
ReturnToService=2
PrologFlags=x11
TaskPlugin=task/cgroup

# TIMERS
SlurmctldTimeout=60
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
FastSchedule=1

SchedulerParameters=bf_interval=60,bf_continue,bf_resolution=600,bf_window=2880,bf_max_job_test=5000,bf_max_job_part=1000,bf_max_job_user=10,bf_max_job_start=100

PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=100000
PriorityWeightAge=1000
PriorityWeightPartition=10000
PriorityWeightJobSize=1000
PriorityMaxAge=1-0

# LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none

# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=hn2
AccountingStorageTRES=gres/gpu

# COMPUTE NODES
NodeName=cn[101-136] Procs=32 Gres=gpu:4 RealMemory=192782

# Partitions
PartitionName=JobTesting Nodes=cn[135-136] MaxTime=02:00:00 DefaultTime=00:30:00 MaxMemPerNode=192782 AllowGroups=DT-AI4DCluster-All State=UP
PartitionName=TrixieMain Nodes=cn[106-134] MaxTime=48:00:00 DefaultTime=08:00:00 MaxMemPerNode=192782 AllowGroups=DT-AI4DCluster-All State=UP Default=YES
PartitionName=ItOpsTests Nodes=cn[102-105] MaxTime=INFINITE MaxMemPerNode=192782 AllowGroups=Admin-Access,Manager-Access State=UP
PartitionName=ItOpsImage Nodes=cn101 MaxTime=INFINITE MaxMemPerNode=192782 AllowGroups=Admin-Access State=UP
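
In case it helps, this is how we've been double-checking what the running
daemons actually have loaded (scontrol show config reports the live
configuration from slurmctld rather than the contents of the file on disk):

$ scontrol show config | egrep -i 'Priority|AccountingStorage'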

Is there anything here that would explain why sshare returns only zeros?


The only peculiarity I can think of is that I don't believe we restarted
slurmctld after making these changes; we only reconfigured.
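
Concretely, after editing slurm.conf we only ran something like:

$ scontrol reconfigure

on the controller, rather than doing a full restart on both head nodes, e.g.:

$ systemctl restart slurmctld

If the multifactor priority plugin is only loaded on a restart, that could be
our problem.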


Cheers,


Joey Dumont

Technical Advisor, Knowledge, Information, and Technology Services
National Research Council Canada / Government of Canada
joey.dum...@nrc-cnrc.gc.ca / Tel: 613-990-8152 / Cell: 438-340-7436
