Slurm Users,

I am hoping that you all can help me with the problem below.
We just spun up a new cluster using Bright and have been trying to change the default behavior of Slurm from linear to cons_res. It should be simple enough, but I am plagued by the following error:

    error: we don't have select plugin type 102

Both select_linear.so and select_cons_res.so are located in /cm/shared_tmp/apps/slurm/17.11.8/lib64/slurm/

I have been testing with just the compute nodes, not the GPU nodes, etc. I added the following to my slurm.conf file:

    # Scheduler
    SchedulerType=sched/backfill
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core

    # Nodes
    # NodeName=big-mem[001-005],node[001-056] # Entry from default install
    # NodeName=gpu[001-004] Gres=gpu:2 # Entry from default install
    NodeName=node[001-056] CPUs=2 RealMemory=196000 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 State=UNKNOWN

    # Partitions
    PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=YES GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpu[001-004],big-mem[001-005],node[001-056]
    PartitionName=test Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=YES GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-056]

When I issue scontrol reconfigure, I get the following:

    [root@thunder ~]# scontrol reconfigure
    slurm_reconfigure error: Unable to contact slurm controller (connect failure)
    [root@thunder ~]# systemctl status slurmctld.service
    ● slurmctld.service - Slurm controller daemon
       Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; disabled; vendor preset: disabled)
       Active: failed (Result: exit-code) since Thu 2018-12-13 08:46:18 CST; 5s ago
      Process: 31416 ExecStart=/cm/shared/apps/slurm/17.11.8/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
     Main PID: 31418 (code=exited, status=1/FAILURE)

When I revert the changes, slurmctld goes back to an active, working state. The /var/log/slurmctld log shows this error message:

    error: we don't have select plugin type 102

Has anyone else run into this problem? If so, can you recommend a fix?

Thanks,
Chad