Hi Everyone,

I'm new to Slurm administration and looking for a bit of help!

I've just added accounting to an existing cluster, but job information is not 
being written to the accounting MariaDB. When I submit a test job it gets 
scheduled fine and is visible with squeue, but I get nothing back from sacct!
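
For reference, this is the kind of test I'm running (the job ID below is just a 
placeholder):

$ sbatch --wrap="sleep 60"
$ squeue -j <jobid>     # job shows up here as expected
$ sacct -j <jobid>      # returns nothing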

I have turned the logging up to debug5 on both slurmctld and slurmdbd and can't 
see any errors. I believe communication between slurmctld and slurmdbd is OK: 
when I run sacct I can see the database being queried, but it returns nothing, 
because nothing has been added to the tables. The cluster tables were created 
fine when I ran:

# sacctmgr add cluster ny5ktt
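
I assume the cluster registration can also be sanity-checked from the controller 
with something along these lines (output not shown here):

$ sacctmgr show cluster format=Cluster,ControlHost,ControlPort,RPC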

$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
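
In case the default time window is a factor (I understand sacct only looks back 
to midnight of the current day by default), I assume a broader query would look 
something like this:

$ sacct --allusers --starttime=2024-10-01 --format=JobID,JobName,Partition,State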

# tail -f slurmdbd.log
[2024-10-17T12:34:45.232] debug:  REQUEST_PERSIST_INIT: CLUSTER:ny5ktt VERSION:9216 UID:10001 IP:10.202.233.117 CONN:10
[2024-10-17T12:34:45.232] debug2: accounting_storage/as_mysql: acct_storage_p_get_connection: acct_storage_p_get_connection: request new connection 1
[2024-10-17T12:34:45.233] debug2: Attempting to connect to localhost:3306
[2024-10-17T12:34:45.274] debug2: DBD_GET_JOBS_COND: called
[2024-10-17T12:34:45.317] debug2: DBD_FINI: CLOSE:1 COMMIT:0
[2024-10-17T12:34:45.317] debug4: accounting_storage/as_mysql: acct_storage_p_commit: got 0 commits

MariaDB is running on its own node with slurmdbd and munged for authentication. 
I haven't set up any accounts, users, associations or enforcement yet; on my lab 
cluster, jobs were visible in the database without these being set up. I guess I 
must be missing something simple in the config that is stopping jobs from being 
reported to slurmdbd.
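
For what it's worth, I assume the accounting settings the controller is actually 
running with can be confirmed with something like:

$ scontrol show config | grep -i AccountingStorage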

Master Node packages
# rpm -qa |grep slurm
slurm-slurmdbd-20.11.9-1.el8.x86_64
slurm-libs-20.11.9-1.el8.x86_64
slurm-20.11.9-1.el8.x86_64
slurm-slurmd-20.11.9-1.el8.x86_64
slurm-perlapi-20.11.9-1.el8.x86_64
slurm-doc-20.11.9-1.el8.x86_64
slurm-contribs-20.11.9-1.el8.x86_64
slurm-slurmctld-20.11.9-1.el8.x86_64

Database Node packages
# rpm -qa |grep slurm
slurm-slurmdbd-20.11.9-1.el8.x86_64
slurm-20.11.9-1.el8.x86_64
slurm-libs-20.11.9-1.el8.x86_64
slurm-devel-20.11.9-1.el8.x86_64

slurm.conf
#
# See the slurm.conf man page for more information.
#
ClusterName=ny5ktt
ControlMachine=ny5-pr-kttslurm-01
ControlAddr=10.202.233.71
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
MailProg=/bin/true
MaxJobCount=200000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/d
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool/slurm/ctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
#MinJobAge=300
#MinJobAge=43200
# CHG0057915
MinJobAge=14400
# CHG0057915
#MaxJobCount=50000
#MaxJobCount=100000
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
DefMemPerCPU=3000
#FastSchedule=1
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
#SelectTypeParameters=CR_Core
#SelectTypeParameters=CR_CPU
SelectTypeParameters=CR_CPU_Memory
# ECR CHG0056915 10/14/2023
MaxArraySize=5001
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageEnforce=limits
AccountingStorageHost=ny5-pr-kttslurmdb-01.ktt.schonfeld.com
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
#AccountingStorageType=accounting_storage/none
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
AccountingStoreJobComment=YES
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=60
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
#SlurmdLogFile=
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
##using fqdn since the ctld domain is different. Can't use regex since it's not at the end
##save 17 and 18 as headnodes
#NodeName=ny5-dv-kttres-17 Sockets=1 CoresPerSocket=18 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
#NodeName=ny5-dv-kttres-18 Sockets=1 CoresPerSocket=14 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
NodeName=ny5-dv-kttres-19 Sockets=1 CoresPerSocket=12 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
NodeName=ny5-dv-kttres-[20-21] Sockets=1 CoresPerSocket=18 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
NodeName=ny5-dv-kttres-[01-16] Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 Feature=HyperThread RealMemory=233472
NodeName=ny5-dv-kttres-[22-35] Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 Feature=HyperThread RealMemory=346884
PartitionName=ktt_slurm_light_1 Nodes=ny5-dv-kttres-[19-21] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_1 Nodes=ny5-dv-kttres-[01-08] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_2 Nodes=ny5-dv-kttres-[09-16] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_3 Nodes=ny5-dv-kttres-[22-28] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_4 Nodes=ny5-dv-kttres-[29-35] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_large_1 Nodes=ny5-dv-kttres-[01-16] Default=YES MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_large_2 Nodes=ny5-dv-kttres-[22-35] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2

slurmdbd.conf
AuthType=auth/munge
DbdAddr=10.202.233.72
DbdHost=ny5-pr-kttslurmdb-01
DebugLevel=debug5
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/tmp/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
#StorageHost=10.234.132.57
StorageUser=slurm
SlurmUser=slurm
StoragePass=xxxxxxx
#StorageUser=slurm
#StorageLoc=slurm_acct_db

Database tables

MariaDB [slurm_acct_db]> show tables;
+--------------------------------+
| Tables_in_slurm_acct_db        |
+--------------------------------+
| acct_coord_table               |
| acct_table                     |
| clus_res_table                 |
| cluster_table                  |
| convert_version_table          |
| federation_table               |
| ny5ktt_assoc_table             |
| ny5ktt_assoc_usage_day_table   |
| ny5ktt_assoc_usage_hour_table  |
| ny5ktt_assoc_usage_month_table |
| ny5ktt_event_table             |
| ny5ktt_job_table               |
| ny5ktt_last_ran_table          |
| ny5ktt_resv_table              |
| ny5ktt_step_table              |
| ny5ktt_suspend_table           |
| ny5ktt_usage_day_table         |
| ny5ktt_usage_hour_table        |
| ny5ktt_usage_month_table       |
| ny5ktt_wckey_table             |
| ny5ktt_wckey_usage_day_table   |
| ny5ktt_wckey_usage_hour_table  |
| ny5ktt_wckey_usage_month_table |
| qos_table                      |
| res_table                      |
| table_defs_table               |
| tres_table                     |
| txn_table                      |
| user_table                     |
+--------------------------------+
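
The job table itself appears to be empty, consistent with the above; I assume it 
can be checked directly with a query along these lines (column names per the 
20.11 schema, as far as I can tell):

MariaDB [slurm_acct_db]> SELECT id_job, job_name, state FROM ny5ktt_job_table ORDER BY id_job DESC LIMIT 5;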

Many Thanks

Adrian


