On Fri, Dec 16, 2011 at 01:31:32PM +0100, Ulrich Windl wrote:
> Hi!
> 
> I have some trouble with OCFS on top of DRBD that seems to be timing-related:
> OCFS is working on the DRBD device when DRBD itself wants to change something, it 
> seems:
> 
> ...
> Dec 16 11:39:58 h06 kernel: [  122.426174] block drbd0: role( Secondary -> 
> Primary )
> Dec 16 11:39:58 h06 multipathd: drbd0: update path write_protect to '0' 
> (uevent)
> Dec 16 11:40:29 h06 ocfs2_controld: start_mount: uuid 
> "FD32E504527742CEA7DA6DB272D5D7B2", device "/dev/drbd_r0", service "ocfs2"
> ...
> Dec 16 11:40:29 h06 kernel: [  152.837615] block drbd0: peer( Secondary -> 
> Primary )
> Dec 16 11:40:29 h06 ocfs2_hb_ctl[19177]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -P 
> -d /dev/drbd_r0
> Dec 16 11:43:50 h06 kernel: [  354.559240] block drbd0: State change failed: 
> Device is held open by someone
> Dec 16 11:43:50 h06 kernel: [  354.559244] block drbd0:   state = { 
> cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate r----- }
> Dec 16 11:43:50 h06 kernel: [  354.559246] block drbd0:  wanted = { 
> cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate r----- }
> Dec 16 11:43:50 h06 drbd[28754]: [28786]: ERROR: r0: Called drbdadm -c 
> /etc/drbd.conf secondary r0

The resource agent was told to demote.
That fails, as DRBD is still/already in use (by OCFS2 or something else).
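
To see at a glance what DRBD refused to do, the "state" and "wanted" lines
from the kernel log can be compared field by field. What follows is only a
minimal sketch in Python, not a DRBD tool: the helper names (parse_fields,
state_diff) are made up for illustration, and the two log lines are the ones
quoted above with the mail line-wrapping undone.

```python
import re

# Two kernel lines copied from the log excerpt above (line-wrapping undone).
LOG = """\
Dec 16 11:43:50 h06 kernel: [  354.559244] block drbd0:   state = { cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate r----- }
Dec 16 11:43:50 h06 kernel: [  354.559246] block drbd0:  wanted = { cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate r----- }
"""

# Matches the "state = { ... }" and "wanted = { ... }" kernel messages.
STATE_RE = re.compile(r"block (\S+):\s+(state|wanted) = \{ (.*?) \}")


def parse_fields(body):
    """Turn 'cs:Connected ro:Primary/Primary ...' into a dict.

    Tokens without a ':' (such as the flag word 'r-----') are skipped.
    """
    return dict(tok.split(":", 1) for tok in body.split() if ":" in tok)


def state_diff(log_text):
    """Return {field: (current, wanted)} for every field that differs."""
    seen = {}
    for match in STATE_RE.finditer(log_text):
        seen[match.group(2)] = parse_fields(match.group(3))
    current = seen.get("state", {})
    wanted = seen.get("wanted", {})
    return {
        key: (current.get(key), wanted.get(key))
        for key in sorted(current.keys() | wanted.keys())
        if current.get(key) != wanted.get(key)
    }


if __name__ == "__main__":
    for field, (cur, want) in state_diff(LOG).items():
        print(f"{field}: {cur} -> {want} (refused)")
```

For the excerpt above, the only field that differs is "ro": the local role
was supposed to go from Primary to Secondary, which is the demote that was
refused while the device was held open.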

> Dec 16 11:43:50 h06 drbd[28754]: [28789]: ERROR: r0: Exit code 11
> 
> A little bit later DRBD did its own fencing (the machine rebooted)

I very much doubt that.  At least, from the above log excerpt,
I cannot imagine a scenario in which any of the handlers cited below would
trigger, unless you throw multiple failures into the mix.

But you apparently get a "demote failure", and possibly then a "stop
failure" as well, which may trigger a stonith event.

Or maybe IO is blocked for "too long" so OCFS2 decides to self-fence.

I guess you will have to improve your logging to tell which of these it was.

> Is there a way to let the cluster do the fencing instead of writing to 
> sysctl? Those handlers are used:
>         handlers {
>                 pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; 
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; 
> reboot -f";
>                 pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; 
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; 
> reboot -f";
>                 local-io-error "/usr/lib/drbd/notify-io-error.sh; 
> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; 
> halt -f";

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com