On Wed, Dec 21, 2011 at 11:18:23AM +0100, Ulrich Windl wrote:
> >>> Andreas Kurz <[email protected]> wrote on 20.12.2011 at 22:57 in message
> <[email protected]>:
> > Hello,
> >
> > On 12/20/2011 02:47 PM, Ulrich Windl wrote:
> > > Hi!
> > >
> > > I have a dual-primary DRBD that is not working well: It was working, then I
> > > shut it down and restarted it. DRBD complained about split brain and fenced
> > > the other node. When coming up, the other node fenced this node. IMHO
> > > neither node should have fenced the other.
> > >
> >
> > no config from drbd, no cluster config, partial/filtered logs ...
> > fragments ... you have _all_ information and can't find the problem ...
> > sorry, but I can't see how anyone can help you based on that information.
>
> Well,
>
> to me the problem looks like this: When starting, both DRBDs talk to each
> other successfully, then they say "we just talked about not being able to
> talk to each other, so let's commit suicide, because afterwards we can talk
> better to each other".
>
> I think the diagnosis of "split brain" is based on disk content, not on
> communication failure, because the nodes just talked to each other. So a
> sync, not suicide, would be the proper solution for the conflict.
>
> And as far as the DRBD logs are concerned, they are complete for the
> interval of interest.
>
> I have only heard third-party rumors that "this and that" isn't working, but
> nobody could actually tell me why. I was hoping to get some insight here.
>
> >
> > I personally think it is part of the free community support deal to
> > share as much information as possible if one wants help for free.
>
> Well, if anybody has a dual-primary DRBD (with OCFS on top) working with
> pacemaker, would you share your configuration with me to find out what's
> different?
>
> Here's my configuration:
> # grep -v '^[ ]*#' *
> global_common.conf:global {
> global_common.conf: usage-count no;
> global_common.conf:}
> global_common.conf:
> global_common.conf:common {
> global_common.conf: protocol C;
> global_common.conf:
> global_common.conf: handlers {
> global_common.conf: pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; sync; echo b > /proc/sysrq-trigger ; reboot -f";
> global_common.conf: pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; sync; echo b > /proc/sysrq-trigger ; reboot -f";
> global_common.conf: local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; sync; echo o > /proc/sysrq-trigger ; halt -f";
> global_common.conf: split-brain "/usr/lib/drbd/notify-split-brain.sh root";
> global_common.conf: out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
> global_common.conf: }
> global_common.conf:
> global_common.conf: startup {
> global_common.conf: become-primary-on both;
> global_common.conf: wfc-timeout 15;
> global_common.conf: }
> global_common.conf:
> global_common.conf: disk {
> global_common.conf: use-bmbv;
> global_common.conf: }
So you do not even have DRBD fencing configured,
yet claim that DRBD fencing was shooting your nodes.
Yeah, right.
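
For comparison, here is a minimal sketch of what DRBD-level fencing
typically looks like, assuming the 8.3-style syntax your config already
uses and a Pacemaker cluster. The handler script paths are an assumption
(the usual location next to the notify-*.sh scripts you reference); check
what your package actually installs. For dual-primary, resource-and-stonith
is the usual policy, and it only makes sense with working STONITH in the
cluster.

```
disk {
    # On loss of the replication link, freeze I/O on a primary
    # and call the fence-peer handler before continuing.
    fencing resource-and-stonith;
}

handlers {
    # Assumed paths; these scripts ship with DRBD's Pacemaker integration.
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
```

Nothing like this appears in the configuration you posted.
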
> global_common.conf:
> global_common.conf: net {
> global_common.conf: allow-two-primaries;
> global_common.conf: after-sb-0pri discard-zero-changes;
> global_common.conf: after-sb-1pri discard-secondary;
> global_common.conf: after-sb-2pri disconnect;
> global_common.conf: }
> global_common.conf:
> global_common.conf: syncer {
> global_common.conf: }
> global_common.conf:}
> r0.res:resource r0 {
> r0.res: device /dev/drbd_r0 minor 0;
> r0.res: disk /dev/sys/samba;
> r0.res: meta-disk internal;
> r0.res: on h02 {
> r0.res: address 172.20.78.2:7780;
> r0.res: }
> r0.res: on h06 {
> r0.res: address 172.20.78.6:7780;
> r0.res: }
> r0.res: syncer {
> r0.res: rate 7M;
> r0.res: }
> r0.res:}
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems