Re: [Linux-HA] Antw: Re: OCFS on top of dual-primary DRBD in SLES11 SP1

Ulrich Windl Mon, 19 Dec 2011 05:11:31 -0800

>>> Andreas Kurz <[email protected]> wrote on 19.12.2011 at 12:02 in 
>>> message
<[email protected]>:
> On 12/19/2011 09:15 AM, Ulrich Windl wrote:
>>>>> Andreas Kurz <[email protected]> wrote on 16.12.2011 at 14:01 in 
>>>>> message
> > <[email protected]>:
> >> Hello Ulrich,
> >>
> >> On 12/16/2011 01:31 PM, Ulrich Windl wrote:
> >>> Hi!
> >>>
> >>> I have some trouble with OCFS on top of DRBD that seems to be timing-related:
> >>> OCFS is working on the DRBD when DRBD itself wants to change something, it seems:
> >>
> >> can we see your cib and your full drbd configuration please ...
> > 
> > It's somewhat complex, and I may not show you everything, sorry for that.
> 
> no problem ... you asked for help on a public mailing-list ...
> 
> > 
> >>
> >>>
> >>> ...
> >>> Dec 16 11:39:58 h06 kernel: [  122.426174] block drbd0: role( Secondary -> Primary )
> >>> Dec 16 11:39:58 h06 multipathd: drbd0: update path write_protect to '0' (uevent)
> >>> Dec 16 11:40:29 h06 ocfs2_controld: start_mount: uuid "FD32E504527742CEA7DA6DB272D5D7B2", device "/dev/drbd_r0", service "ocfs2"
> >>> ...
> >>> Dec 16 11:40:29 h06 kernel: [  152.837615] block drbd0: peer( Secondary -> Primary )
> >>> Dec 16 11:40:29 h06 ocfs2_hb_ctl[19177]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -P -d /dev/drbd_r0
> >>> Dec 16 11:43:50 h06 kernel: [  354.559240] block drbd0: State change failed: Device is held open by someone
> >>> Dec 16 11:43:50 h06 kernel: [  354.559244] block drbd0:   state = { cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate r----- }
> >>> Dec 16 11:43:50 h06 kernel: [  354.559246] block drbd0:  wanted = { cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate r----- }
> >>> Dec 16 11:43:50 h06 drbd[28754]: [28786]: ERROR: r0: Called drbdadm -c /etc/drbd.conf secondary r0
> >>> Dec 16 11:43:50 h06 drbd[28754]: [28789]: ERROR: r0: Exit code 11
> >>>
> >>> A little bit later DRBD did its own fencing (the machine rebooted)
> >>
> >> do you have logs to confirm this?
> > 
> > Naturally no, as the commands "echo b > /proc/sysrq-trigger ; reboot -f"
> > don't actually write nice log messages.
> 
> All those nice drbd notify scripts do send mails, at least to local root
> account. Additionally they try to log via syslog as well as DRBD does on
> executing the handler ... so you have a good chance to get some
> information if DRBD triggers that reboot ... at least if you are doing
> remote syslogging.


I examined "notify-io-error.sh": it tries to log a syslog message and send 
mail. However, as writing to disk and sending mail are both asynchronous, there 
is little chance that anything will make it to disk before
"echo b > /proc/sysrq-trigger" takes effect. Unless it does more damage, I'd
strongly recommend doing a "sync" before that.
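
To illustrate the ordering I have in mind, here is a minimal sketch. It is not taken from the DRBD scripts: the function name flush_then_reboot, the syslog tag and the overridable SYSRQ_TRIGGER variable are my own assumptions, the latter only there so the sequence can be tried against a scratch file instead of the real trigger. Note that "sync" only flushes what has already reached the kernel's buffers; it cannot guarantee that syslogd has written the message yet.

```shell
#!/bin/sh
# Hypothetical helper, not part of the shipped DRBD handler scripts.
# SYSRQ_TRIGGER can be pointed at a scratch file for a dry run (assumption).
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

flush_then_reboot() {
    # Hand the message to syslog; delivery to disk is still asynchronous.
    logger -t drbd-handler "$1" 2>/dev/null || true
    # Flush dirty buffers to disk before the reset.
    sync
    # 'b' reboots immediately, without syncing or unmounting anything.
    echo b > "$SYSRQ_TRIGGER"
}
```

A handler would then call something like flush_then_reboot "I/O error on r0, rebooting" in place of the bare "echo b". Whether the extra delay from "sync" is acceptable when the backing disk itself is failing is exactly the open question.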

Is the quick reboot merely nice to have, or is it absolutely necessary?

Regards,
Ulrich


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
