You only have one partition, and it is named 'default'. You are not allowed to name it that: 'DEFAULT' is a reserved partition name used to set default values for other partitions, so no real partition gets created. Name it something else and you should be good.
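For example, renaming it (the name below is just an illustration, everything else is taken from your existing line) should let the partition register and show up in sinfo:

PartitionName=general Nodes=k[001-448] Default=YES MaxTime=INFINITE State=UP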

Brian Andrus

On 11/28/2024 6:52 AM, Patrick Begou via slurm-users wrote:
Hi Kent,

on your management node could you run:
systemctl status slurmctld

and check your 'NodeName=...' and 'PartitionName=...' entries in /etc/slurm.conf? In my slurm.conf I have a more detailed node description, and the NodeName keyword starts with an upper-case letter (I don't know whether slurm.conf keywords are case-sensitive):

NodeName=kareline-0-[0-3]  Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=47900

and it looks like your node description is not being understood by Slurm.
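If it helps, running slurmd -C on a compute node prints a NodeName line matching the hardware it detects, which you can paste into slurm.conf. A rough sketch of what it might print, going by the hardware counts in your slurmd.log (hostname and values here are only an example, yours may differ):

slurmd -C
NodeName=k001 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=192030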

Patrick


On 27/11/2024 at 17:46, Ryan Novosielski via slurm-users wrote:
At this point, I’d probably crank up the logging some and see what it’s saying in slurmctld.log.
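One quick way to do that, for example (the first takes effect without restarting anything):

scontrol setdebug debug2
# or set SlurmctldDebug=debug2 in slurm.conf and restart slurmctld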

--
#BlackLivesMatter
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ | Office of Advanced Research Computing - MSB A555B, Newark
     `'

On Nov 27, 2024, at 11:38, Kent L. Hanson <kent.han...@inl.gov> wrote:

Hey Ryan,
I have restarted the slurmctld and slurmd services several times. I hashed the slurm.conf files and they are identical. I ran “sinfo -a” as root with the same result.
Thanks,

Kent
From: Ryan Novosielski <novos...@rutgers.edu>
Sent: Wednesday, November 27, 2024 9:31 AM
To: Kent L. Hanson <kent.han...@inl.gov>
Cc: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sinfo not listing any partitions
If you’re sure you’ve restarted everything after the config change, are you also sure that you don’t have that stuff hidden from your current user? You can try -a to rule that out. Or run as root.
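For example, either of these should show hidden partitions too:

sinfo -a
scontrol -a show partition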
--
#BlackLivesMatter
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ | Office of Advanced Research Computing - MSB A555B, Newark
     `'


    On Nov 27, 2024, at 09:56, Kent L. Hanson via slurm-users
    <slurm-users@lists.schedmd.com> wrote:
    I am doing a new install of Slurm 24.05.3. I have all the
    packages built and installed on the head node and compute node
    with the same munge.key, slurm.conf, and gres.conf files. I was
    able to run the munge and unmunge commands to test munge
    successfully. Time is synced with chronyd. I can’t seem to find
    any useful errors in the logs. For some reason, when I run sinfo,
    no nodes are listed; I just see the headers for each column. Has
    anyone seen this or know what the next troubleshooting step would
    be? I’m new to this and not sure where to go from here. Thanks
    for any and all help!
    The odd output I am seeing:
    [username@headnode ~] sinfo
    PARTITION AVAIL    TIMELIMIT NODES   STATE NODELIST
    (Nothing is output showing the status of partitions or nodes.)
    slurm.conf:
    ClusterName=slurmkvasir
    SlurmctldHost=kadmin2
    MpiDefault=none
    ProctrackType=proctrack/cgroup
    PrologFlags=contain
    ReturnToService=2
    SlurmctldPidFile=/var/run/slurm/slurmctld.pid
    SlurmctldPort=6817
    SlurmPidFile=/var/run/slurm/slurmd.pid
    SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmUser=slurm
    StateSaveLocation=/var/spool/slurmctld
    TaskPlugin=task/cgroup
    MinJobAge=600
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    PriorityType=priority/multifactor
    AccountingStorageHost=localhost
    AccountingStoragePass=/var/run/munge/munge.socket.2
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageTRES=gres/gpu,cpu,node
    JobCompType=jobcomp/none
    JobAcctGatherFrequency=30
    JobAcctGatherType=jobacct_gather/cgroup
    SlurmctldDebug=info
    SlurmctldLogFile=/var/log/slurm/slurmctld.log
    SlurmdDebug=info
    SlurmLogFile=/var/log/slurm/slurmd.log
    nodeName=k[001-448]
    PartitionName=default Nodes=k[001-448] Default=YES
    MaxTime=INFINITE State=up
    slurmctld.log:
    Error: Configured MailProg is invalid
    Slurmctld version 24.05.3 started on cluster slurmkvasir
    Accounting_storage/slurmdbd:
    clusteracct_storage_p_register_ctld: Registering slurmctld at
    port 8617
    Error: read_slurm_conf: default partition not set.
    Recovered state of 448 nodes
    Down nodes: k[002-448]
    Recovered information about 0 jobs
    Recovered state of 0 reservations
    Read_slurm_conf: backup_controller not specified
    Select/cons_tres; select_p_reconfigure: select/cons_tres:
    reconfigure
    Running as primary controller
    slurmd.log:
    Error: Node configuration differs from hardware: CPUS=1:40(hw)
    Boards=1:1(hw) SocketsPerBoard=1:2(hw) CoresPerSocket=1:20(hw)
    ThreadsPerCore:1:1(hw)
    CPU frequency setting not configured for this node
    Slurmd version 24.05.3 started
    Slurmd started on Wed, 27 Nov 2024 06:51:03 -0700
    CPUS=1 Boards=1 Cores=1 Threads=1 Memory=192030 TmpDisk=95201
    uptime 166740 CPUSpecList=(null) FeaturesAvail=(null)
    FeaturesActive=(null)
    Error: _forward_thread: failed to k019 (10.142.0.119:6818):
    Connection timed out
    (The above line is repeated 20 or so times for different nodes.)
    Thanks,

    Kent Hanson





-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
