Try setting RawShares to something greater than 1. I've seen it be the
case that when you set it to 1 it creates weirdness like this.
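For example (just a sketch; "covid" is one of the accounts from the sshare
output below, and 100 is an arbitrary value), the shares could be raised
with sacctmgr and then re-checked:

$ sacctmgr modify account where name=covid set fairshare=100
$ sshare -l -A covid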
-Paul Edmon-
On 7/9/2020 1:12 PM, Dumont, Joey wrote:
Hi,
We recently set up fair tree scheduling (we have 19.05 running), and
are trying to use sshare to see usage information. Unfortunately,
sshare reports all zeros, even though there seems to be data in the
backend DB. Here's an example output:
$ sshare -l
Account              User  RawShares  NormShares  RawUsage  NormUsage  EffectvUsage  FairShare  LevelFS  GrpTRESMins  TRESRunMins
----------------------------------------------------------------------------------------------------------------------------------
root               0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
covid           1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
covid-01        1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
covid-02        1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
group1          1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
subgroup1       1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
othersubgroups  1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
subgroups       1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
subgroups       4  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
subgroups       1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
SUBGROUP        1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
SUBGROUP        1  0  0.000000  0.000000  cpu=0,mem=0,energy=0,node=0,b+
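For what it's worth, the usage sitting in the backend DB can be
cross-checked with sreport (a sketch only; the dates below are just an
example window):

$ sreport cluster AccountUtilizationByUser cluster=trixie start=2020-06-01 end=2020-07-09 -t Hours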
And the slurm.conf config:
ClusterName=trixie
SlurmctldHost=trixie(10.10.0.11)
SlurmctldHost=hn2(10.10.0.12)
GresTypes=gpu
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/gpfs/share/slurm/
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/cgroup
ReturnToService=2
PrologFlags=x11
TaskPlugin=task/cgroup
# TIMERS
SlurmctldTimeout=60
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
FastSchedule=1
SchedulerParameters=bf_interval=60,bf_continue,bf_resolution=600,bf_window=2880,bf_max_job_test=5000,bf_max_job_part=1000,bf_max_job_user=10,bf_max_job_start=100
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=100000
PriorityWeightAge=1000
PriorityWeightPartition=10000
PriorityWeightJobSize=1000
PriorityMaxAge=1-0
# LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=hn2
AccountingStorageTRES=gres/gpu
# COMPUTE NODES
NodeName=cn[101-136] Procs=32 Gres=gpu:4 RealMemory=192782
# Partitions
PartitionName=JobTesting Nodes=cn[135-136] MaxTime=02:00:00 DefaultTime=00:30:00 MaxMemPerNode=192782 AllowGroups=DT-AI4DCluster-All State=UP
PartitionName=TrixieMain Nodes=cn[106-134] MaxTime=48:00:00 DefaultTime=08:00:00 MaxMemPerNode=192782 AllowGroups=DT-AI4DCluster-All State=UP Default=YES
PartitionName=ItOpsTests Nodes=cn[102-105] MaxTime=INFINITE MaxMemPerNode=192782 AllowGroups=Admin-Access,Manager-Access State=UP
PartitionName=ItOpsImage Nodes=cn101 MaxTime=INFINITE MaxMemPerNode=192782 AllowGroups=Admin-Access State=UP
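Since we use priority/multifactor, I assume the fairshare factor should
also show up in the per-job priority breakdown of pending jobs; a quick
sanity check (just a sketch) would be:

$ sprio -l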
Is there anything that would explain why sshare returns only zeros?
The only peculiarity I can think of is that I don't think we restarted
slurmctld; we only reconfigured it.
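If a full restart turns out to be needed, I assume it would be something
like the following (assuming a systemd-managed slurmctld; adjust to your
setup), followed by a check that the priority settings are actually active:

$ systemctl restart slurmctld
$ scontrol show config | grep -i priority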
Cheers,
Joey Dumont
Technical Advisor, Knowledge, Information, and Technology Services
National Research Council Canada / Government of Canada
joey.dum...@nrc-cnrc.gc.ca / Tel: 613-990-8152 / Cell: 438-340-7436