Hello Ole,

I have no firewall on the compute nodes, and the internal interfaces on 
kadmin2 (opa and eth) are in the trusted zone of the firewall, so it should allow 
everything through. I'm using RHEL 9.4. I built the RPM packages from source 
following the admin guide: https://slurm.schedmd.com/quickstart_admin.html

"Kadmin2" and "headnode" are one and the same. This system is on an air-gapped 
network and I had to hand-jam everything. Sorry for the confusion.

No luck stopping the firewall service. Still the same issue.
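Since stopping firewalld changed nothing, a next check is whether plain TCP connections to the Slurm ports succeed at all, in both directions. A minimal sketch, assuming the ports from the slurm.conf below (SlurmdPort=6818 on the compute nodes, SlurmctldPort=6817 on kadmin2); the script name check_ports.py is hypothetical:

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all raise OSError
        # subclasses, so any of them counts as "not reachable".
        return False


if __name__ == "__main__":
    # Targets are given as host:port, e.g. k019:6818 (slurmd) or
    # kadmin2:6817 (slurmctld). Nothing runs without arguments.
    for target in sys.argv[1:]:
        host, _, port = target.rpartition(":")
        state = "open" if port_open(host, int(port)) else "UNREACHABLE"
        print(f"{target} {state}")
```

For example, `python3 check_ports.py k019:6818` on kadmin2, and `python3 check_ports.py kadmin2:6817` on k019.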

I'll continue to read the documentation that you have sent me and see if I 
missed anything.

Thanks,

Kent

-----Original Message-----
From: Ole Holm Nielsen via slurm-users <slurm-users@lists.schedmd.com>
Sent: Wednesday, November 27, 2024 8:47 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: sinfo not listing any partitions

Hi Kent,

This problem could perhaps be due to your firewall setup.  What is your OS, and 
did you install Slurm by RPM packages or what?

Does sinfo work on your SlurmctldHost=kadmin2?  Is the "headnode" a different 
host?  Try stopping the firewalld service.

You can see some advice on firewalls in the Wiki page
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#configure-firewall-for-slurm-daemons
There is information about Slurm installation and configuration in the Wiki 
pages in https://wiki.fysik.dtu.dk/Niflheim_system/

IHTH,
Ole

On 11/27/24 15:56, Kent L. Hanson via slurm-users wrote:
> I am doing a new install of Slurm 24.05.3. I have all the packages
> built and installed on headnode and the compute nodes, with the same
> munge.key, slurm.conf, and gres.conf files. I successfully tested munge
> with the munge and unmunge commands, and time is synced with chronyd.
> I can't seem to find any useful errors in the logs. For some reason,
> when I run sinfo no nodes are listed; I just see the column headers.
> Has anyone seen this, or does anyone know what the next troubleshooting
> step would be? I'm new to this and not sure where to go from here.
> Thanks for any and all help!
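To confirm the shared files really are byte-for-byte identical on every node, one option is to print a SHA-256 digest per file on each host and compare (the same idea as running `sha256sum`). A sketch; the default paths are assumptions for a typical install:

```python
import hashlib
import sys

# Assumed locations for a typical RPM install; pass other paths as arguments.
DEFAULT_FILES = ["/etc/munge/munge.key", "/etc/slurm/slurm.conf",
                 "/etc/slurm/gres.conf"]


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Run as root on every node and compare the output; munge.key is
    # normally readable only by root and the munge user.
    for name in sys.argv[1:] or DEFAULT_FILES:
        try:
            print(f"{sha256_of(name)}  {name}")
        except OSError as exc:
            print(f"ERROR  {name}: {exc}")
```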
>
> The odd output I am seeing
>
> [username@headnode ~] sinfo
>
> PARTITION AVAIL    TIMELIMIT NODES   STATE   NODELIST
>
> (Nothing is output showing the status of partitions or nodes.)
>
> Slurm.conf
>
> ClusterName=slurmkvasir
>
> SlurmctldHost=kadmin2
>
> MpiDefault=none
>
> ProctrackType=proctrack/cgroup
>
> PrologFlags=contain
>
> ReturnToService=2
>
> SlurmctldPidFile=/var/run/slurm/slurmctld.pid
>
> SlurmctldPort=6817
>
> SlurmdPidFile=/var/run/slurm/slurmd.pid
>
> SlurmdPort=6818
>
> SlurmdSpoolDir=/var/spool/slurmd
>
> SlurmUser=slurm
>
> StateSaveLocation=/var/spool/slurmctld
>
> TaskPlugin=task/cgroup
>
> MinJobAge=600
>
> SchedulerType=sched/backfill
>
> SelectType=select/cons_tres
>
> PriorityType=priority/multifactor
>
> AccountingStorageHost=localhost
>
> AccountingStoragePass=/var/run/munge/munge.socket.2
>
> AccountingStorageType=accounting_storage/slurmdbd
>
> AccountingStorageTRES=gres/gpu,cpu,node
>
> JobCompType=jobcomp/none
>
> JobAcctGatherFrequency=30
>
> JobAcctGatherType=jobacct_gather/cgroup
>
> SlurmctldDebug=info
>
> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>
> SlurmdDebug=info
>
> SlurmdLogFile=/var/log/slurm/slurmd.log
>
> NodeName=k[001-448]
>
> PartitionName=default Nodes=k[001-448] Default=YES MaxTime=INFINITE State=UP
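A small lint pass over slurm.conf can flag two things that line up with the logs below. This is a hypothetical helper, not a Slurm tool. It assumes Slurm treats `PartitionName=DEFAULT` (matched case-insensitively) as the keyword that only sets defaults for later partition lines, which would fit the `read_slurm_conf: default partition not set` error and the empty sinfo output. It also flags `NodeName` lines with no CPU layout, since each node then defaults to one CPU:

```python
import re
import sys

# Keys that describe a node's CPU layout; without any of them each node
# defaults to a single CPU.
LAYOUT_KEYS = {"cpus", "sockets", "socketsperboard",
               "corespersocket", "threadspercore"}


def parse_line(line: str) -> dict:
    """Return {lowercased key: value} for one line, ignoring comments.

    Sketch only: backslash-continued lines are not joined.
    """
    line = line.split("#", 1)[0]
    return {key.lower(): value for key, value in re.findall(r"(\w+)=(\S+)", line)}


def lint(text: str) -> list:
    """Return human-readable warnings for the partition and node lines."""
    warnings = []
    real_partitions = 0
    for n, raw in enumerate(text.splitlines(), start=1):
        fields = parse_line(raw)
        part = fields.get("partitionname")
        if part is not None:
            if part.lower() == "default":
                warnings.append(f"line {n}: PartitionName=DEFAULT only sets defaults "
                                "for later partition lines; it creates no partition")
            else:
                real_partitions += 1
        node = fields.get("nodename")
        if node is not None and node.lower() != "default" and LAYOUT_KEYS.isdisjoint(fields):
            warnings.append(f"line {n}: NodeName has no CPUs/Sockets/CoresPerSocket/"
                            "ThreadsPerCore, so each node defaults to 1 CPU")
    if real_partitions == 0:
        warnings.append("no real partition defined, so sinfo has nothing to list")
    return warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical name): python3 lint_slurm_conf.py /etc/slurm/slurm.conf
    with open(sys.argv[1], encoding="utf-8") as fh:
        for warning in lint(fh.read()):
            print(warning)
```

On the partition and node lines above it reports all three warnings; a partition renamed to anything other than default (e.g. a hypothetical `batch`) would clear the last two.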
>
> Slurmctld.log
>
> Error: Configured MailProg is invalid
>
> Slurmctld version 24.05.3 started on cluster slurmkvasir
>
> Accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld:
> Registering slurmctld at port 8617
>
> Error: read_slurm_conf: default partition not set.
>
> Recovered state of 448 nodes
>
> Down nodes: k[002-448]
>
> Recovered information about 0 jobs
>
> Recovered state of 0 reservations
>
> Read_slurm_conf: backup_controller not specified
>
> Select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
>
> Running as primary controller
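The `Down nodes: k[002-448]` line uses Slurm's hostlist notation. A simplified expander makes the count explicit; it handles only one bracket group, and `scontrol show hostnames k[002-448]` is the authoritative tool for the full syntax:

```python
import re


def expand_hostlist(expr: str) -> list:
    """Expand a single-bracket hostlist such as 'k[002-448]' or 'k[001-003,010]'.

    Simplified sketch: one bracket group of comma-separated numbers and
    ranges, preserving zero padding.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if m is None:
        return [expr]  # plain hostname without brackets
    prefix, body, suffix = m.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        width = len(lo)  # e.g. "002" -> pad to 3 digits
        for i in range(int(lo), int(hi or lo) + 1):
            hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts
```

`len(expand_hostlist("k[002-448]"))` is 447, so of the 448 recovered nodes only k001 is not listed as down.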
>
> Slurmd.log
>
> Error: Node configuration differs from hardware: CPUS=1:40(hw)
> Boards=1:1(hw) SocketsPerBoard=1:2(hw) CoresPerSocket=1:20(hw)
> ThreadsPerCore=1:1(hw)
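Each field in this error reads configured:detected(hw), and `slurmd -C` on a compute node prints a ready-made node line from the detected hardware. As an illustration only, a sketch that builds a `NodeName` line from the `(hw)` values; the default node range is taken from the config above, and RealMemory is deliberately left out:

```python
import re

# slurmd reports each field as Key=configured:detected(hw).
HW_RE = re.compile(r"(\w+)=(\d+):(\d+)\(hw\)")

# Log field name -> slurm.conf NodeName parameter, in output order.
FIELD_MAP = [("CPUS", "CPUs"), ("Boards", "Boards"),
             ("SocketsPerBoard", "SocketsPerBoard"),
             ("CoresPerSocket", "CoresPerSocket"),
             ("ThreadsPerCore", "ThreadsPerCore")]


def nodename_from_hw(log_line: str, nodes: str = "k[001-448]") -> str:
    """Build a NodeName line using the detected (hw) values from the error."""
    detected = {key: int(hw) for key, _conf, hw in HW_RE.findall(log_line)}
    parts = [f"NodeName={nodes}"]
    for log_key, conf_key in FIELD_MAP:
        if log_key in detected:
            parts.append(f"{conf_key}={detected[log_key]}")
    return " ".join(parts)
```

For the fields CPUS=1:40(hw), Boards=1:1(hw), SocketsPerBoard=1:2(hw), CoresPerSocket=1:20(hw) and ThreadsPerCore=1:1(hw) it yields `NodeName=k[001-448] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1`.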
>
> CPU frequency setting not configured for this node
>
> Slurmd version 24.05.3 started
>
> Slurmd started on Wed, 27 Nov 2024 06:51:03 -0700
>
> CPUS=1 Boards=1 Cores=1 Threads=1 Memory=192030 TmpDisk=95201 uptime
> 166740 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
>
> Error: _forward_thread: failed to k019 (10.142.0.119:6818):
> Connection timed out
>
> (Above line repeated about 20 times for different nodes.)
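With 20 or so of these, it can help to pull the distinct node/IP pairs out of slurmd.log and check each one individually. "Connection timed out" (as opposed to "Connection refused") usually points at a down host, routing, or dropped packets rather than slurmd simply not listening. A sketch, assuming one message per log line as slurmd writes it (the wrapping above comes from the email); the script name is hypothetical:

```python
import re
import sys

# Matches slurmd forwarding failures such as
# "_forward_thread: failed to k019 (10.142.0.119:6818): Connection timed out"
FWD_RE = re.compile(r"_forward_thread: failed to (\S+) \(([\d.]+):(\d+)\): (.+)")


def failed_forwards(log_text: str) -> dict:
    """Map node name -> (IP, port, reason), keeping the first failure per node."""
    failures = {}
    for line in log_text.splitlines():
        m = FWD_RE.search(line)
        if m:
            node, ip, port, reason = m.groups()
            failures.setdefault(node, (ip, int(port), reason.strip()))
    return failures


if __name__ == "__main__":
    # Usage: python3 failed_forwards.py /var/log/slurm/slurmd.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for node, (ip, port, reason) in failed_forwards(fh.read()).items():
                print(f"{node}\t{ip}:{port}\t{reason}")
```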

--
slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send 
an email to slurm-users-le...@lists.schedmd.com
