On 2014-02-05T12:24:00, Ulrich Windl <[email protected]> wrote:
> I had a problem where "O2CB stop" fenced the node that was shut down:
> I had updated the kernel, and then rebooted. As part of shutdown, the cluster
> stack was stopped. In turn, the "O2CB" resource was stopped.
> Unfortunately this caused an error like (SLES11 SP3):
>
> ---
> modprobe: FATAL: Could not load /lib/modules/3.0.101-0.8-xen/modules.dep: No
> such file or directory
> o2cb(prm_O2CB)[19908]: ERROR: Unable to unload module: ocfs2
> ---
>
> This in turn caused a node fence, which ruined the clean reboot.
>
> So why is the RA messing with the kernel module on stop?

Because customers complained that the new module was not picked up when
they upgraded ocfs2-kmp and restarted the cluster stack on a node. It's
incredibly hard to please everyone, alas ...

In any case, the right way to update a cluster node is this:
1. Stop the cluster stack
2. Update/upgrade/reboot as needed
3. Restart the cluster stack

This would avoid the error too, as would keeping multiple kernel versions
installed in parallel (which also helps if an updated kernel fails to boot
for some reason). Removing the running kernel's package is usually not a
great idea; I prefer to remove old kernels only after a successful reboot,
because you *never* know whether you may have to load or reload a module.
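
To illustrate that last point, here is a minimal sketch of a pre-flight
check you could run before stopping the stack. It only tests whether the
modules.dep file that modprobe complained about above still exists for the
running kernel. The function names are made up for this example and are not
part of the o2cb RA; the /lib/modules/<release>/modules.dep layout is taken
from the error message quoted above.

```python
import os


def modules_dep_path(release=None, root="/lib/modules"):
    """Return the modules.dep path for a kernel release.

    Hypothetical helper: defaults to the release of the running kernel.
    """
    if release is None:
        release = os.uname().release  # e.g. "3.0.101-0.8-xen"
    return os.path.join(root, release, "modules.dep")


def can_manage_modules(release=None, root="/lib/modules"):
    """True if modules.dep for the given (default: running) kernel exists."""
    return os.path.isfile(modules_dep_path(release, root))


if __name__ == "__main__":
    path = modules_dep_path()
    if can_manage_modules():
        print("ok: %s is present" % path)
    else:
        # Same condition that made modprobe fail in the report above.
        print("warning: %s is missing; modprobe is likely to fail" % path)
```

If the check warns, the module tree of the running kernel is already gone,
and I would expect a stop of the O2CB resource to hit the same unload
failure as in the report.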
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde