Hi,

I'm trying to set up Lustre and RDMA on an EL8 system. I can't get systemd to
shut down the two services, lnet and rdma, without hanging the system. I've
tried many changes to the rdma.service and lnet.service files, but the issue
persists. My service files are below. Does anyone know how to fix this? Even
with the service files set as below, the system hangs because the Mellanox
driver is being removed before lnet has been stopped. I get these messages:

- mlx4_core .... mlx4_shutdown was called
- LNetError: 131-3: Received notification of device removal 
- please shutdown LNET to allow this to proceed

lnet.service:
---------
[Unit]
Description=lnet management

Requires=network-online.target
After=network-online.target rdma.service
Wants=rdma.service

ConditionPathExists=!/proc/sys/lnet/

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/sbin/modprobe lnet
ExecStart=/usr/sbin/lnetctl lnet configure
ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
ExecStop=/usr/sbin/lnetctl lnet unconfigure
ExecStop=/usr/sbin/lustre_rmmod
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target
---------

rdma.service:
---------
[Unit]
Description=Initialize the iWARP/InfiniBand/RDMA stack in the kernel
Documentation=file:/etc/rdma/rdma.conf
RefuseManualStop=true
DefaultDependencies=false
Conflicts=emergency.target emergency.service
Before=network.target remote-fs-pre.target lnet.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/libexec/rdma-init-kernel

[Install]
WantedBy=sysinit.target
------
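
One possible angle, offered as a sketch under assumptions rather than a
verified fix: systemd stops units in the reverse of their start order, so
After=rdma.service in lnet.service already asks for lnet to be stopped first.
What the units above do not express is any ordering between lnet.service and
the Lustre mounts themselves. If a Lustre filesystem is still mounted when
ExecStop runs, the unconfigure step may not complete, and LNet could still be
up when the mlx4 device is shut down. The drop-in below would order
lnet.service before remote mounts on start, and therefore after them on stop.
The drop-in file name is hypothetical, and this assumes the Lustre mounts are
treated by systemd as network mounts (for example via _netdev in /etc/fstab).

---------
# /etc/systemd/system/lnet.service.d/10-remote-fs.conf  (hypothetical name)
[Unit]
# Start before remote mounts are attempted. On shutdown the order is
# reversed, so lnet is stopped only after those mounts are unmounted.
Before=remote-fs-pre.target
# remote-fs-pre.target is only part of the boot transaction if a unit
# pulls it in, hence the Wants= here.
Wants=remote-fs-pre.target
---------

A "systemctl daemon-reload" is needed afterwards for systemd to pick up the
drop-in.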

Thanks.

Best,
Chris
 
-- 
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 
 

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org