Re: [Linux-HA] Why does o2cb RA remove module ocfs2?

Lars Marowsky-Bree Wed, 05 Feb 2014 03:37:22 -0800

On 2014-02-05T12:24:00, Ulrich Windl <[email protected]> wrote:


> I had a problem where "O2CB stop" fenced the node that was shut down:
> I had updated the kernel, and then rebooted. As part of shutdown, the cluster 
> stack was stopped. In turn, the "O2CB" resource was stopped.
> Unfortunately this caused an error like (SLES11 SP3):
> 
> ---
> modprobe: FATAL: Could not load /lib/modules/3.0.101-0.8-xen/modules.dep: No 
> such file or directory
> o2cb(prm_O2CB)[19908]: ERROR: Unable to unload module: ocfs2
> ---
> 
> This in turn caused a node fence, which ruined the clean reboot.
> 
> So why is the RA messing with the kernel module on stop?

Because customers complained that the new module was not picked up when
they upgraded ocfs2-kmp and restarted the cluster stack on a node. It's
incredibly hard to please everyone, alas ...

In any case, the right way to update a cluster node is this:

1. Stop the cluster stack
2. Update/upgrade/reboot as needed
3. Restart the cluster stack

This would avoid the error too, since the stop then runs while the
running kernel's modules are still installed. So would keeping multiple
kernel versions installed in parallel (which also helps if a kernel
update no longer boots for some reason). Removing the running kernel
package is usually not a great idea; I prefer to remove old kernels only
after having successfully rebooted, because you *never* know if you may
have to reload a module.
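
To illustrate the point about the running kernel's modules: the modprobe
error quoted above complains about a missing modules.dep under
/lib/modules/<running kernel release>. A minimal sketch of a pre-shutdown
check for that file could look like the following. The function names
and the idea of running this before stopping the cluster stack are
assumptions made for illustration; nothing like it is part of the o2cb
RA.

```python
import os
import platform


def modules_dep_path(release=None, root="/lib/modules"):
    """Build the modules.dep path for a kernel release.

    Defaults to the release of the currently running kernel.
    Hypothetical helper, not part of any cluster tooling.
    """
    if release is None:
        release = platform.release()
    return os.path.join(root, release, "modules.dep")


def running_kernel_modules_present(release=None, root="/lib/modules"):
    """Return True if modules.dep exists for the given (or running) kernel."""
    return os.path.isfile(modules_dep_path(release, root))


if __name__ == "__main__":
    path = modules_dep_path()
    if running_kernel_modules_present():
        print("ok: %s exists" % path)
    else:
        # Without this file, modprobe fails as in the quoted log,
        # so the stop operation may be unable to unload ocfs2.
        print("warning: %s is missing" % path)
```

If the check warns, the running kernel's package has most likely been
removed already, and stopping the stack before the update (or keeping
the old kernel installed) would have been the safer order.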


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
