Hi!

OK, I manually added a "sync" before those reboots. Maybe I'll find some trace after the next unexplained reboot.
Regards,
Ulrich

>>> Lars Ellenberg <[email protected]> wrote on 19.12.2011 at 21:40 in message
<[email protected]>:
> On Fri, Dec 16, 2011 at 01:31:32PM +0100, Ulrich Windl wrote:
> > Hi!
> >
> > I have some trouble with OCFS on top of DRBD that seems to be timing-related:
> > OCFS is working on the DRBD when DRBD itself wants to change something, it seems:
> >
> > ...
> > Dec 16 11:39:58 h06 kernel: [ 122.426174] block drbd0: role( Secondary -> Primary )
> > Dec 16 11:39:58 h06 multipathd: drbd0: update path write_protect to '0' (uevent)
> > Dec 16 11:40:29 h06 ocfs2_controld: start_mount: uuid "FD32E504527742CEA7DA6DB272D5D7B2", device "/dev/drbd_r0", service "ocfs2"
> > ...
> > Dec 16 11:40:29 h06 kernel: [ 152.837615] block drbd0: peer( Secondary -> Primary )
> > Dec 16 11:40:29 h06 ocfs2_hb_ctl[19177]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -P -d /dev/drbd_r0
> > Dec 16 11:43:50 h06 kernel: [ 354.559240] block drbd0: State change failed: Device is held open by someone
> > Dec 16 11:43:50 h06 kernel: [ 354.559244] block drbd0:  state = { cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate r----- }
> > Dec 16 11:43:50 h06 kernel: [ 354.559246] block drbd0: wanted = { cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate r----- }
> > Dec 16 11:43:50 h06 drbd[28754]: [28786]: ERROR: r0: Called drbdadm -c /etc/drbd.conf secondary r0
>
> The resource agent was told to demote.
> That fails, as DRBD is still/already in use (by ocfs2 or other).
>
> > Dec 16 11:43:50 h06 drbd[28754]: [28789]: ERROR: r0: Exit code 11
> >
> > A little bit later DRBD did its own fencing (the machine rebooted)
>
> I very much doubt that. At least, from the above log excerpt,
> I can not imagine a scenario for any of the below cited handlers to trigger,
> unless you throw multiple failures in the mix.
>
> But you apparently get a "demote failure", and possibly then a "stop
> failure" as well, which may trigger a stonith event.
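[A note on the "Device is held open by someone" failure quoted above: before a demote can succeed, whatever still holds the device must let go, and that is quickest to check from a shell. A minimal sketch; the device path /dev/drbd_r0 is taken from the log above, and `fuser` (from psmisc) is assumed to be installed:]

```shell
#!/bin/sh
# Sketch: find what still holds a DRBD device open, so that a
# "drbdadm secondary" demote can succeed afterwards.
DEV=${1:-/dev/drbd_r0}   # device name taken from the log above

# Processes that have the device open (fuser exits non-zero if none):
fuser -v "$DEV" 2>&1 || true

# Still mounted somewhere (e.g. the OCFS2 filesystem)?
grep -F "$DEV" /proc/mounts || echo "$DEV not mounted"
```

[If ocfs2 still has the device mounted, the demote is expected to fail, as with the "Exit code 11" in the log; the fix is to unmount, or have the cluster stop the filesystem resource first, not to retry the demote.]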
> Or maybe IO is blocked for "too long" so OCFS2 decides to self-fence.
>
> Guess you have to improve your logging.
>
> > Is there a way to let the cluster do the fencing instead of writing to
> > /proc/sysrq-trigger? Those handlers are used:
> >
> > handlers {
> >     pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
> >     pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
> >     local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
> > }

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
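[On the "sync" mentioned in the reply at the top: `echo b > /proc/sysrq-trigger` reboots immediately without flushing buffers, so an explicit sync first at least gives recent log writes a chance to reach disk. A sketch of how one of the quoted handlers might look with that change (drbd.conf syntax as in the quoted config; which handler was actually modified is not stated):]

```
handlers {
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; sync; echo b > /proc/sysrq-trigger ; reboot -f";
}
```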
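[As for the question of letting the cluster do the fencing: with DRBD managed by Pacemaker, the usual approach is a fence-peer handler rather than a self-reboot via sysrq. A sketch, assuming DRBD 8.3+ and the scripts shipped with the drbd package:]

```
resource r0 {
    disk {
        # resource-and-stonith freezes IO until the fence handler returns;
        # this is the recommended mode for dual-primary (OCFS2) setups
        fencing resource-and-stonith;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
}
```

[crm-fence-peer.sh places a location constraint in the CIB, so the cluster rather than DRBD itself decides which node survives; crm-unfence-peer.sh removes the constraint again once the resync has completed.]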
