Hello, I've just finished building and installing Slurm 22.05.6 from source on a head node and a couple of workers. I installed the same RPMs on all the nodes, and the slurmdbd, slurmctld, and slurmd daemons have all come online and appear healthy (test jobs can be submitted to partitions and run successfully on the nodes). But I'm seeing these errors at regular intervals in the Slurm logs:

[2022-11-29T11:29:49.683] error: unpack_header: protocol_version 8960 not supported
[2022-11-29T11:29:49.683] error: unpacking header
[2022-11-29T11:29:49.683] error: destroy_forward: no init
[2022-11-29T11:29:49.684] error: slurm_receive_msg_and_forward: [[sdc-uk]:53026] failed: Message receive failure
[2022-11-29T11:29:49.694] error: service_connection: slurm_receive_msg: Message receive failure

My slurm.conf is based on the config from my previous (still existing) cluster, and I've already encountered one or two issues with plugins not working. I can't find anything online listing the Slurm protocol_version numbers to check what is causing this error, though I'm assuming it's plugin related (slurmdbd maybe?). Turning up the debug level in the Slurm logs doesn't help in finding the issue.

Does anyone here know what protocol_version 8960 relates to?

Relevant slurm.conf lines are:

MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=2
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity,task/cgroup

# Job cleanup
Epilog=/etc/slurm/slurm.epilog.clean
UnkillableStepTimeout=120
UnkillableStepProgram=/root/unkillableJobStepScript.sh

# SCHEDULING
#FastSchedule=0
SchedulerType=sched/backfill
SchedulerParameters=nohold_on_prolog_fail
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityWeightPartition=1000
PreemptMode=SUSPEND,GANG
PreemptType=preempt/partition_prio

# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherFrequency=40
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=5
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=5
SlurmdLogFile=/var/log/slurm/slurmd.log

Cheers,
Mark

-------------------------------
Mark Holliman
Senior Data Systems Specialist
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------------------------------
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
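
For reference, here is one way the number in the error might be decoded. This is only a sketch. It assumes Slurm packs the protocol version as (major << 8) | minor, which is how the SLURM_*_PROTOCOL_VERSION macros in the source appear to be defined. The table mapping the high byte to a release is also an assumption and should be checked against the headers of the build in question. The function and table names are made up for illustration.

```python
# Sketch only: assumes protocol_version = (major << 8) | minor.
# The release table below is an assumption, not taken from this thread;
# verify it against the Slurm source tree you built from.
ASSUMED_RELEASES = {
    35: "20.02",
    36: "20.11",
    37: "21.08",
    38: "22.05",
}


def decode_protocol_version(value):
    """Split a 16-bit protocol version into (high byte, low byte)."""
    return value >> 8, value & 0xFF


def describe(value):
    """Return a human-readable guess at what a protocol version means."""
    major, minor = decode_protocol_version(value)
    release = ASSUMED_RELEASES.get(major, "unknown")
    return "protocol_version %d = (%d << 8) | %d, assumed Slurm %s" % (
        value, major, minor, release)


if __name__ == "__main__":
    print(describe(8960))
```

If that reading is right, 8960 has a high byte of 35, which would point at a Slurm 20.02-era client or daemon talking to the new 22.05.6 daemons rather than at a plugin. The host named in the log line (sdc-uk) would be the first place to check for an older install.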