Hi everyone, I'm new to Slurm administration and looking for a bit of help!
I've just added accounting to an existing cluster, but job information is not being added to the accounting MariaDB database. When I submit a test job it gets scheduled fine and it's visible with squeue, but sacct returns nothing!

I have turned up the logging to debug5 for both slurmctld and slurmdbd and can't see any errors in either log. I believe communication between slurmctld and slurmdbd is OK, because when I run sacct I can see the database being queried. It just returns nothing, because nothing has been added to the tables. The cluster tables were created fine when I ran:

# sacctmgr add cluster ny5ktt

$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------

# tail -f slurmdbd.log
[2024-10-17T12:34:45.232] debug: REQUEST_PERSIST_INIT: CLUSTER:ny5ktt VERSION:9216 UID:10001 IP: CONN:10
[2024-10-17T12:34:45.232] debug2: accounting_storage/as_mysql: acct_storage_p_get_connection: acct_storage_p_get_connection: request new connection 1
[2024-10-17T12:34:45.233] debug2: Attempting to connect to localhost:3306
[2024-10-17T12:34:45.274] debug2: DBD_GET_JOBS_COND: called
[2024-10-17T12:34:45.317] debug2: DBD_FINI: CLOSE:1 COMMIT:0
[2024-10-17T12:34:45.317] debug4: accounting_storage/as_mysql: acct_storage_p_commit: got 0 commits

MariaDB runs on its own node, together with slurmdbd and munged for authentication. I haven't set up any accounts, users, associations or enforcement yet. On my lab cluster, jobs were visible in the database without any of these being set up. So I guess I must be missing something simple in the config that is stopping jobs from being reported to slurmdbd.
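The slurmdbd log excerpt above can be summarised mechanically. Below is a minimal Python sketch that tallies the request types and reported commits. The helper name `summarise` and the regular expressions are hypothetical and were written for this post. The sketch assumes the `[timestamp] level: message` layout shown in the excerpt.

Run on those six lines, it finds only sacct's own session (REQUEST_PERSIST_INIT, DBD_GET_JOBS_COND, DBD_FINI) and zero commits, so there is no write traffic at all in that window. It is an assumption, not something the excerpt confirms, that job records from slurmctld would show up under other DBD_* names at these debug levels.

```python
import re
from collections import Counter

# Matches lines such as
# "[2024-10-17T12:34:45.274] debug2: DBD_GET_JOBS_COND: called"
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+):\s+(?P<msg>.*)$")
# Assumption: request names are upper-case DBD_*/REQUEST_* tokens that start
# the message and are followed by a colon, as in the excerpt.
REQUEST = re.compile(r"^(?P<name>(?:DBD|REQUEST)_[A-Z_]+):")
# Matches "acct_storage_p_commit: got 0 commits".
COMMITS = re.compile(r"got (?P<n>\d+) commits")


def summarise(lines):
    """Count request types and add up reported commits in slurmdbd log lines."""
    requests = Counter()
    commits = 0
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # not in the "[ts] level: msg" layout
        msg = m.group("msg")
        r = REQUEST.match(msg)
        if r:
            requests[r.group("name")] += 1
        c = COMMITS.search(msg)
        if c:
            commits += int(c.group("n"))
    return requests, commits


# The excerpt from the post, verbatim.
EXCERPT = """\
[2024-10-17T12:34:45.232] debug: REQUEST_PERSIST_INIT: CLUSTER:ny5ktt VERSION:9216 UID:10001 IP: CONN:10
[2024-10-17T12:34:45.232] debug2: accounting_storage/as_mysql: acct_storage_p_get_connection: acct_storage_p_get_connection: request new connection 1
[2024-10-17T12:34:45.233] debug2: Attempting to connect to localhost:3306
[2024-10-17T12:34:45.274] debug2: DBD_GET_JOBS_COND: called
[2024-10-17T12:34:45.317] debug2: DBD_FINI: CLOSE:1 COMMIT:0
[2024-10-17T12:34:45.317] debug4: accounting_storage/as_mysql: acct_storage_p_commit: got 0 commits
"""

if __name__ == "__main__":
    requests, commits = summarise(EXCERPT.splitlines())
    for name, count in sorted(requests.items()):
        print(f"{name}: {count}")
    print(f"total commits: {commits}")
    # prints:
    # DBD_FINI: 1
    # DBD_GET_JOBS_COND: 1
    # REQUEST_PERSIST_INIT: 1
    # total commits: 0
```

The same function can be pointed at the live log, for example with `summarise(open("/var/log/slurm/slurmdbd.log"))`, to check whether anything besides sacct sessions arrives after a test job is submitted.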
Master node packages:

# rpm -qa | grep slurm
slurm-slurmdbd-20.11.9-1.el8.x86_64
slurm-libs-20.11.9-1.el8.x86_64
slurm-20.11.9-1.el8.x86_64
slurm-slurmd-20.11.9-1.el8.x86_64
slurm-perlapi-20.11.9-1.el8.x86_64
slurm-doc-20.11.9-1.el8.x86_64
slurm-contribs-20.11.9-1.el8.x86_64
slurm-slurmctld-20.11.9-1.el8.x86_64

Database node packages:

# rpm -qa | grep slurm
slurm-slurmdbd-20.11.9-1.el8.x86_64
slurm-20.11.9-1.el8.x86_64
slurm-libs-20.11.9-1.el8.x86_64
slurm-devel-20.11.9-1.el8.x86_64

slurm.conf:

#
# See the slurm.conf man page for more information.
#
ClusterName=ny5ktt
ControlMachine=ny5-pr-kttslurm-01
ControlAddr=
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
MailProg=/bin/true
MaxJobCount=200000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/d
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool/slurm/ctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
#MinJobAge=300
#MinJobAge=43200 # CHG0057915
MinJobAge=14400 # CHG0057915
#MaxJobCount=50000
#MaxJobCount=100000
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
DefMemPerCPU=3000
#FastSchedule=1
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
#SelectTypeParameters=CR_Core
#SelectTypeParameters=CR_CPU
SelectTypeParameters=CR_CPU_Memory # ECR CHG0056915 10/14/2023
MaxArraySize=5001
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageEnforce=limits
AccountingStorageHost=ny5-pr-kttslurmdb-01.ktt.schonfeld.com
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
#AccountingStorageType=accounting_storage/none
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
AccountingStoreJobComment=YES
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=60
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
#SlurmdLogFile=
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
##using fqdn since the ctld domain is different. Can't use regex since it's not at the end
##save 17 and 18 as headnodes
#NodeName=ny5-dv-kttres-17 Sockets=1 CoresPerSocket=18 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
#NodeName=ny5-dv-kttres-18 Sockets=1 CoresPerSocket=14 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
NodeName=ny5-dv-kttres-19 Sockets=1 CoresPerSocket=12 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
NodeName=ny5-dv-kttres-[20-21] Sockets=1 CoresPerSocket=18 ThreadsPerCore=2 Feature=HyperThread RealMemory=102400
NodeName=ny5-dv-kttres-[01-16] Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 Feature=HyperThread RealMemory=233472
NodeName=ny5-dv-kttres-[22-35] Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 Feature=HyperThread RealMemory=346884
PartitionName=ktt_slurm_light_1 Nodes=ny5-dv-kttres-[19-21] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_1 Nodes=ny5-dv-kttres-[01-08] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_2 Nodes=ny5-dv-kttres-[09-16] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_3 Nodes=ny5-dv-kttres-[22-28] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_medium_4 Nodes=ny5-dv-kttres-[29-35] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_large_1 Nodes=ny5-dv-kttres-[01-16] Default=YES MaxTime=INFINITE State=UP OverSubscribe=FORCE:2
PartitionName=ktt_slurm_large_2 Nodes=ny5-dv-kttres-[22-35] Default=NO MaxTime=INFINITE State=UP OverSubscribe=FORCE:2

slurmdbd.conf:

AuthType=auth/munge
DbdAddr=
DbdHost=ny5-pr-kttslurmdb-01
DebugLevel=debug5
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/tmp/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
#StorageHost=
StorageUser=slurm
SlurmUser=slurm
StoragePass=xxxxxxx
#StorageUser=slurm
#StorageLoc=slurm_acct_db

Database tables:

MariaDB [slurm_acct_db]> show tables;
+--------------------------------+
| Tables_in_slurm_acct_db        |
+--------------------------------+
| acct_coord_table               |
| acct_table                     |
| clus_res_table                 |
| cluster_table                  |
| convert_version_table          |
| federation_table               |
| ny5ktt_assoc_table             |
| ny5ktt_assoc_usage_day_table   |
| ny5ktt_assoc_usage_hour_table  |
| ny5ktt_assoc_usage_month_table |
| ny5ktt_event_table             |
| ny5ktt_job_table               |
| ny5ktt_last_ran_table          |
| ny5ktt_resv_table              |
| ny5ktt_step_table              |
| ny5ktt_suspend_table           |
| ny5ktt_usage_day_table         |
| ny5ktt_usage_hour_table        |
| ny5ktt_usage_month_table       |
| ny5ktt_wckey_table             |
| ny5ktt_wckey_usage_day_table   |
| ny5ktt_wckey_usage_hour_table  |
| ny5ktt_wckey_usage_month_table |
| qos_table                      |
| res_table                      |
| table_defs_table               |
| tres_table                     |
| txn_table                      |
| user_table                     |
+--------------------------------+

Many thanks,
Adrian

Disclaimer: Schonfeld Strategic Advisors (UK) LLP (“SSA UK”) is authorised and regulated by The Financial Conduct Authority.
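The slurm.conf and slurmdbd.conf files quoted above can be cross-checked side by side. Below is a minimal Python sketch that parses `Key=Value` lines, drops `#` comments, and reports the accounting storage type. It also shows whether slurm.conf's AccountingStorageHost and slurmdbd.conf's DbdHost are spelled identically. The helper names `parse_conf` and `compare_hosts` are hypothetical and are not part of Slurm.

The parser is deliberately simplified. For multi-pair lines such as NodeName=..., it keeps only the first key. It also lower-cases keys, on the assumption that lookups should ignore case. The FQDN-versus-short-name difference it surfaces is not necessarily a problem if both names resolve to the same node. The sketch only makes the difference visible.

```python
# Hypothetical helpers for eyeballing accounting-related settings; not part of Slurm.

def parse_conf(text):
    """Parse Slurm-style Key=Value lines into a dict with lower-cased keys.

    Simplifications: anything after '#' is treated as a comment, and for
    lines carrying several pairs (NodeName=..., PartitionName=...) only the
    first key is recorded, with the rest of the line as its value.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def compare_hosts(slurm_conf, slurmdbd_conf):
    """Report how AccountingStorageHost and DbdHost compare as strings."""
    acct_host = slurm_conf.get("accountingstoragehost", "")
    dbd_host = slurmdbd_conf.get("dbdhost", "")
    return {
        "AccountingStorageHost": acct_host,
        "DbdHost": dbd_host,
        "exact_match": acct_host == dbd_host,
        # Compares only the first DNS label, e.g. "ny5-pr-kttslurmdb-01".
        "short_name_match": acct_host.split(".")[0] == dbd_host.split(".")[0],
    }


# Excerpts copied from the two files in the post.
SLURM_CONF = """\
ClusterName=ny5ktt
AccountingStorageHost=ny5-pr-kttslurmdb-01.ktt.schonfeld.com
#AccountingStorageType=accounting_storage/none
AccountingStorageType=accounting_storage/slurmdbd
JobAcctGatherType=jobacct_gather/none
"""

SLURMDBD_CONF = """\
DbdAddr=
DbdHost=ny5-pr-kttslurmdb-01
StorageType=accounting_storage/mysql
StorageHost=localhost
"""

if __name__ == "__main__":
    slurm = parse_conf(SLURM_CONF)
    dbd = parse_conf(SLURMDBD_CONF)
    print("AccountingStorageType:", slurm.get("accountingstoragetype"))
    for key, value in compare_hosts(slurm, dbd).items():
        print(f"{key}: {value}")
```

To check the real files rather than the excerpts, the sketch can be fed their contents, for example `parse_conf(open("/etc/slurm/slurm.conf").read())`. That path is an assumption, so adjust it to wherever the files live on each node.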