It sounds like the new version was built with different options, and/or
an install was not done via packages.
If you do use rpms, you could try:
dnf provides /usr/lib64/slurm/mpi_none.so
(on CentOS 7 that would be yum provides instead)
If that shows a package that is installed, remove the package. If it
shows nothing, move the file elsewhere and see if slurmd is happier.
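Something along these lines, untested, with the package name as a
placeholder and paths assuming a stock rpm layout:

rpm -qf /usr/lib64/slurm/mpi_none.so    # names the owning package, if any
dnf remove <owning-package>             # only if rpm -qf named one
mv /usr/lib64/slurm/mpi_none.so /root/mpi_none.so.bak    # otherwise park it
systemctl restart slurmd

rpm -qf checks the installed rpm database directly, which is a bit more
direct than dnf provides for a file that may be a leftover from a
source install.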
Brian Andrus
On 8/14/24 17:52, Sid Young via slurm-users wrote:
G'Day all,
I've been upgrading my cluster from 20.11.0 in small steps to get to
24.05.2. Currently I have all nodes on 23.02.8, the controller on
24.05.2 and a single test node on 24.05.2. All are CentOS 7.9 (upgrade
to Oracle Linux 8.10 is Phase 2 of the upgrades).
When I check the slurmd status on the test node I get:
[root@hpc-dev-01 24.05.2]# systemctl status slurmd
● slurmd.service - Slurm node daemon
Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2024-08-15 10:45:15 AEST; 24s ago
Main PID: 46391 (slurmd)
Tasks: 1
Memory: 1.2M
CGroup: /system.slice/slurmd.service
└─46391 /usr/sbin/slurmd --systemd
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Considering each NUMA node as a socket
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Node reconfigured socket/core boundaries SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Considering each NUMA node as a socket
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: slurmd version 24.05.2 started
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: *plugin_load_from_file: Incompatible Slurm plugin /usr/lib64/slurm/mpi_none.so version (23.02.8)*
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: error: Couldn't load specified plugin name for mpi/none: Incompatible plugin version
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: error: MPI: Cannot create context for mpi/none
Aug 15 10:45:15 hpc-dev-01 systemd[1]: Started Slurm node daemon.
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: slurmd started on Thu, 15 Aug 2024 10:45:15 +1000
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: CPUs=64 Boards=1 Sockets=8 Cores=8 Threads=1 Memory=257778 TmpDisk=15998 Uptime=2898769 CPUSpecL...ve=(null)
Hint: Some lines were ellipsized, use -l to show in full.
[root@hpc-dev-01 24.05.2]#
We don't use MPI (life science workloads)... should I remove the file?
If it is version 23.02.8, doesn't 24.05.2 have that plugin built in?
There are no references to mpi in the slurm.conf file.
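For reference, the check amounts to something like this (assuming the
stock /etc/slurm/slurm.conf path):

grep -i mpi /etc/slurm/slurm.conf
scontrol show config | grep -i MpiDefault

The second command shows what MpiDefault the controller is actually
using.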
Sid