Hi, I'm trying to get Lustre and RDMA set up on an EL8 system. I can't get systemd to shut down the two services, lnet and rdma, cleanly without hanging the system. I've tried many changes to the rdma.service and lnet.service files, but the issue persists. My service files are below. Does anyone know how to fix this? Even with the service files set as below, the system hangs because the system tries to remove the Mellanox drivers before lnet has been stopped. I get the messages:
- mlx4_core .... mlx4_shutdown was called
- LNetError: 131-3: Received notification of device removal - please shutdown LNET to allow this to proceed

---------
[Unit]
Description=lnet management
Requires=network-online.target
After=network-online.target rdma.service
Wants=rdma.service
ConditionPathExists=!/proc/sys/lnet/

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/sbin/modprobe lnet
ExecStart=/usr/sbin/lnetctl lnet configure
ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
ExecStop=/usr/sbin/lnetctl lnet unconfigure
ExecStop=/usr/sbin/lustre_rmmod
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target
---------
[Unit]
Description=Initialize the iWARP/InfiniBand/RDMA stack in the kernel
Documentation=file:/etc/rdma/rdma.conf
RefuseManualStop=true
DefaultDependencies=false
Conflicts=emergency.target emergency.service
Before=network.target remote-fs-pre.target lnet.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/libexec/rdma-init-kernel

[Install]
WantedBy=sysinit.target
------

Thanks.

Best,
Chris

--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
