You'll notice the fix is in the system configuration rather than in Pacemaker itself, and what the fix effectively does is choose which of those possibilities to go with. Just as an example (and I'm definitely *not* recommending this), you could probably also have modified pacemaker.service to put the cluster in maintenance mode before it is stopped.
More importantly, /usr/lib/systemd/system/resource-agents-deps.target is owned by the resource-agents package, so if you're going to go with what you described as your solution, you should *not* modify the file in /usr/lib. Instead, do it as an actual drop-in as described in what Oyvind linked, i.e. in a .conf file whose first line is "[Unit]" in the /etc/systemd/system/resource-agents-deps.target.d/ directory.

________________________________________
From: chenzu...@gmail.com <chenzu...@gmail.com>
Sent: Friday, 14 March 2025 23:06
To: Laura Hild
Cc: lustre-discuss
Subject: Re: [lustre-discuss] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker

Thank you for your advice.

A user named Oyvind replied on the us...@clusterlabs.org mailing list:

You need the systemd drop-in functionality introduced in RHEL 9.3 to avoid this issue: https://bugzilla.redhat.com/show_bug.cgi?id=2184779

As I understand it, the cause is as follows: during reboot, the system and Pacemaker both try to unmount the Lustre resource at the same time. If the system unmounts first and Pacemaker afterward, Pacemaker immediately returns success. At that point, however, the system's unmount has not yet completed, so Pacemaker goes on to mount the resource on the other end (the failover node), which triggers this issue.

My current modification is to add the following lines to `/usr/lib/systemd/system/resource-agents-deps.target`:

```
After=remote-fs.target
Before=shutdown.target reboot.target halt.target
```

After making this modification, the issue no longer occurs during reboot.
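Following Laura's advice, the two ordering lines above can live in a drop-in instead of the packaged unit file. The function below is a minimal sketch, assuming a POSIX `sh`, root privileges for the default path, and a hypothetical file name `lustre-ordering.conf` (systemd reads any `*.conf` file in the `.d` directory). Per Oyvind's reply, relying on this ordering assumes the drop-in support introduced in RHEL 9.3.

```shell
# Sketch: write the ordering from this thread as a systemd drop-in for
# resource-agents-deps.target instead of editing the packaged unit under
# /usr/lib/systemd/system. Intended to be sourced into a POSIX sh.
# "lustre-ordering.conf" is a hypothetical name; any *.conf file is read.

install_ra_deps_dropin() {
    # Drop-in directory; defaults to the path from the thread, but can be
    # overridden (e.g. with a temporary directory) to inspect the output.
    dropin_dir="${1:-/etc/systemd/system/resource-agents-deps.target.d}"
    mkdir -p "$dropin_dir" || return 1

    # The file must begin with the section it extends, here [Unit].
    # The function's exit status is that of this final write.
    cat > "$dropin_dir/lustre-ordering.conf" <<'EOF'
[Unit]
After=remote-fs.target
Before=shutdown.target reboot.target halt.target
EOF
}
```

After sourcing it, something like `install_ra_deps_dropin && systemctl daemon-reload` installs the drop-in, and `systemctl cat resource-agents-deps.target` should then show the new file after the packaged unit. Because the file lives under /etc, a resource-agents package update will not overwrite it.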
________________________________
chenzu...@gmail.com

From: Laura Hild <l...@jlab.org>
Date: 2025-03-06 06:12
To: chenzu...@gmail.com
CC: lustre-discuss <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker

I'm not sure what to say about how Pacemaker *should* behave, but I *can* say that I virtually never try to (cleanly) reboot a host from which I have not already evacuated all resources, e.g. with `pcs node standby`, or by putting Pacemaker in maintenance mode and unmounting/exporting everything manually. If I can't evacuate all resources and complete a lustre_rmmod, the host gets power-cycled.

So what I can offer is a guess: in the host's shutdown process, the Pacemaker service is stopped before the filesystems are unmounted, and Pacemaker doesn't want to assume whether its own shutdown means it should go into standby or initiate maintenance mode. The other host therefore knows only that its partner has disappeared, while the filesystems have yet to be unmounted.

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org