Hey Ryan,


I have restarted the slurmctld and slurmd services several times. I hashed the 
slurm.conf files and they are identical. I ran "sinfo -a" as root with the same 
result.
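
One more thing I was going to try is asking the running controller what config it 
actually loaded, in case one of the daemons is still on a stale copy; something 
along these lines (the config path may differ on our setup):

    md5sum /etc/slurm/slurm.conf                      # compare on headnode and compute node; path assumed
    scontrol show config | grep -i -E 'slurm_conf|clustername'
    scontrol show partition                           # what the controller thinks exists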



Thanks,

Kent



From: Ryan Novosielski <novos...@rutgers.edu>
Sent: Wednesday, November 27, 2024 9:31 AM
To: Kent L. Hanson <kent.han...@inl.gov>
Cc: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sinfo not listing any partitions



If you're sure you've restarted everything after the config change, are you also 
sure the partitions aren't hidden from your current user? You can try -a to rule 
that out, or run as root.
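
If partitions were hidden from your user, it would typically be via PrivateData in 
slurm.conf. A quick way to rule it out (just a sketch; adjust the config path if 
yours differs):

    scontrol show config | grep -i privatedata
    sinfo -a                                       # -a also shows hidden partitions
    grep -i privatedata /etc/slurm/slurm.conf      # path assumed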



--
#BlackLivesMatter

____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
     `'





   On Nov 27, 2024, at 09:56, Kent L. Hanson via slurm-users 
<slurm-users@lists.schedmd.com> wrote:



   I am doing a new install of Slurm 24.05.3. I have all the packages built and 
installed on the headnode and compute node with the same munge.key, slurm.conf, and 
gres.conf files. I was able to run the munge and unmunge commands to test munge 
successfully. Time is synced with chronyd. I can't find any useful errors in the 
logs. For some reason, when I run sinfo, no nodes are listed; I just see the 
headers for each column. Has anyone seen this, or do you know what the next 
troubleshooting step would be? I'm new to this and not sure where to go from here. 
Thanks for any and all help!



   The odd output I am seeing:

   [username@headnode ~] sinfo

   PARTITION AVAIL    TIMELIMIT NODES   STATE   NODELIST



   (Nothing is output showing the status of the partition or nodes.)
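
   If there is a better way to confirm whether the controller even has the 
partition (as opposed to sinfo filtering it out), I'm happy to try it; I was 
thinking of something like:

      scontrol ping                 # confirms which controller sinfo is talking to
      scontrol show partition
      scontrol show node k001       # node name taken from the NodeName line below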





   slurm.conf:

   ClusterName=slurmkvasir
   SlurmctldHost=kadmin2
   MpiDefault=none
   ProctrackType=proctrack/cgroup
   PrologFlags=contain
   ReturnToService=2
   SlurmctldPidFile=/var/run/slurm/slurmctld.pid
   SlurmctldPort=6817
   SlurmPidFile=/var/run/slurm/slurmd.pid
   SlurmdPort=6818
   SlurmdSpoolDir=/var/spool/slurmd
   SlurmUser=slurm
   StateSaveLocation=/var/spool/slurmctld
   TaskPlugin=task/cgroup
   MinJobAge=600
   SchedulerType=sched/backfill
   SelectType=select/cons_tres
   PriorityType=priority/multifactor
   AccountingStorageHost=localhost
   AccountingStoragePass=/var/run/munge/munge.socket.2
   AccountingStorageType=accounting_storage/slurmdbd
   AccountingStorageTRES=gres/gpu,cpu,node
   JobCompType=jobcomp/none
   JobAcctGatherFrequency=30
   JobAcctGatherType=jobacct_gather/cgroup
   SlurmctldDebug=info
   SlurmctldLogFile=/var/log/slurm/slurmctld.log
   SlurmdDebug=info
   SlurmLogFile=/var/log/slurm/slurmd.log
   nodeName=k[001-448]
   PartitionName=default Nodes=k[001-448] Default=YES MaxTime=INFINITE State=up
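
   (Should the NodeName line also carry the hardware layout? Based on what slurmd 
detects in the log below, I was thinking of something like the following, with 
RealMemory rounded down from the 192030 MB slurmd reports, but I haven't applied it 
yet:)

      NodeName=k[001-448] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=192000
      PartitionName=default Nodes=k[001-448] Default=YES MaxTime=INFINITE State=UP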



   slurmctld.log:

   Error: Configured MailProg is invalid
   Slurmctld version 24.05.3 started on cluster slurmkvasir
   Accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 8617
   Error: read_slurm_conf: default partition not set.
   Recovered state of 448 nodes
   Down nodes: k[002-448]
   Recovered information about 0 jobs
   Recovered state of 0 reservations
   Read_slurm_conf: backup_controller not specified
   Select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
   Running as primary controller
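
   (Once the partition shows up, I assume the nodes listed as down will either come 
back on their own because of ReturnToService=2, or need something like:)

      scontrol update NodeName=k[002-448] State=RESUME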



   slurmd.log:

   Error: Node configuration differs from hardware: CPUs=1:40(hw) Boards=1:1(hw) SocketsPerBoard=1:2(hw) CoresPerSocket=1:20(hw) ThreadsPerCore=1:1(hw)
   CPU frequency setting not configured for this node
   Slurmd version 24.05.3 started
   Slurmd started on Wed, 27 Nov 2024 06:51:03 -0700
   CPUs=1 Boards=1 Cores=1 Threads=1 Memory=192030 TmpDisk=95201 Uptime=166740 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
   Error: _forward_thread: failed to k019 (10.142.0.119:6818): Connection timed out

   (The above line repeated 20 or so times for different nodes.)
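
   (My next steps, unless someone has a better idea, were to run slurmd -C on a 
compute node to get a NodeName line matching the detected hardware, and to check 
that port 6818 is actually reachable between nodes, roughly:)

      slurmd -C                      # prints a NodeName=... line for slurm.conf
      ss -tlnp | grep 6818           # is slurmd listening?
      nc -zv k019 6818               # node name taken from the timeout message above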



   Thanks,

   Kent Hanson





-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
