On Wed, Jun 15, 2011 at 4:20 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> On Wed, Jun 15, 2011 at 03:26:56PM -0500, mark - pacemaker list wrote:
> > On Wed, Jun 15, 2011 at 12:24 PM, imnotpc <imno...@rock3d.net> wrote:
> > >
> > > What I was thinking is that the DC is never fenced
> >
> > Is this actually the case?
>
> In a way it is true. Only the DC can order fencing, and there is
> always exactly one DC in a partition. On split brain, each
> partition elects a DC, and if a DC has quorum it can try to
> fence nodes in the other partitions. That's why in two-node
> clusters there's always a shoot-out. But note that the old DC
> (from before the split brain), if it loses quorum, gets fenced
> by a new DC from another partition.
>
> > It would sure explain the one "gotcha" I've never been able to
> > work around in a three-node cluster with stonith/SBD. If you
> > unplug the network cable from the DC (but it and the other nodes
> > all still see the SBD disk via their other NIC(s)), the DC of
> > course becomes completely isolated. It will fence
>
> Fence? It won't fence anything unless it has quorum. Do you have
> no-quorum-policy=ignore?

I have no-quorum-policy=freeze. With this status:

============
Last updated: Wed Jun 15 16:48:57 2011
Stack: Heartbeat
Current DC: cn1.testlab.local (814b426f-ab10-445c-9158-a1765d82395e) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
3 Nodes configured, unknown expected votes
5 Resources configured.
============
Online: [ cn2.testlab.local cn3.testlab.local cn1.testlab.local ]

 Resource Group: MySQL-history
     iscsi_mysql_history    (ocf::heartbeat:iscsi):       Started cn1.testlab.local
     volgrp_mysql_history   (ocf::heartbeat:LVM):         Started cn1.testlab.local
     fs_mysql_history       (ocf::heartbeat:Filesystem):  Started cn1.testlab.local
     ip_mysql_history       (ocf::heartbeat:IPaddr2):     Started cn1.testlab.local
     mysql_history          (ocf::heartbeat:mysql):       Started cn1.testlab.local
     mail_alert_history     (ocf::heartbeat:MailTo):      Started cn1.testlab.local
 Resource Group: MySQL-hsa
     iscsi_mysql_hsa        (ocf::heartbeat:iscsi):       Started cn2.testlab.local
     volgrp_mysql_hsa       (ocf::heartbeat:LVM):         Started cn2.testlab.local
     fs_mysql_hsa           (ocf::heartbeat:Filesystem):  Started cn2.testlab.local
     ip_mysql_hsa           (ocf::heartbeat:IPaddr2):     Started cn2.testlab.local
     mysql_hsa              (ocf::heartbeat:mysql):       Started cn2.testlab.local
     mail_alert_hsa         (ocf::heartbeat:MailTo):      Started cn2.testlab.local
 Resource Group: MySQL-livedata
     iscsi_mysql_livedata   (ocf::heartbeat:iscsi):       Started cn3.testlab.local
     volgrp_mysql_livedata  (ocf::heartbeat:LVM):         Started cn3.testlab.local
     fs_mysql_livedata      (ocf::heartbeat:Filesystem):  Started cn3.testlab.local
     ip_mysql_livedata      (ocf::heartbeat:IPaddr2):     Started cn3.testlab.local
     mysql_livedata         (ocf::heartbeat:mysql):       Started cn3.testlab.local
     mail_alert_livedata    (ocf::heartbeat:MailTo):      Started cn3.testlab.local
 stonith_sbd               (stonith:external/sbd):       Started cn2.testlab.local
 Resource Group: Cluster_Status
     cluster_status_ip      (ocf::heartbeat:IPaddr2):     Started cn3.testlab.local
     cluster_status_page    (ocf::heartbeat:apache):      Started cn3.testlab.local

I isolated cn1 (the DC, but stonith_sbd was running on cn2). In this case, one of the two good nodes became DC and cn1 was fenced, so things worked as I'd expect. The outage for cn1's resources was quite short.
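For reference, the no-quorum-policy discussed above is a cluster-wide property. A sketch of how it can be set and checked from the crm shell (standard commands in the 1.0.x crm shell; adapt to your own configuration):

```shell
# Set the cluster-wide quorum policy; valid values include
# ignore, freeze, stop, and suicide.
crm configure property no-quorum-policy=freeze

# Confirm the property took effect:
crm configure show | grep no-quorum-policy
```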
However, with *this* status, where everything is the same as above except that the stonith_sbd resource is also located on cn1, so cn1 is both the DC and the node running stonith_sbd:

============
Last updated: Wed Jun 15 16:58:49 2011
Stack: Heartbeat
Current DC: cn1.testlab.local (814b426f-ab10-445c-9158-a1765d82395e) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
3 Nodes configured, unknown expected votes
5 Resources configured.
============
Online: [ cn2.testlab.local cn3.testlab.local cn1.testlab.local ]

 Resource Group: MySQL-history
     iscsi_mysql_history    (ocf::heartbeat:iscsi):       Started cn1.testlab.local
     volgrp_mysql_history   (ocf::heartbeat:LVM):         Started cn1.testlab.local
     fs_mysql_history       (ocf::heartbeat:Filesystem):  Started cn1.testlab.local
     ip_mysql_history       (ocf::heartbeat:IPaddr2):     Started cn1.testlab.local
     mysql_history          (ocf::heartbeat:mysql):       Started cn1.testlab.local
     mail_alert_history     (ocf::heartbeat:MailTo):      Started cn1.testlab.local
 Resource Group: MySQL-hsa
     iscsi_mysql_hsa        (ocf::heartbeat:iscsi):       Started cn2.testlab.local
     volgrp_mysql_hsa       (ocf::heartbeat:LVM):         Started cn2.testlab.local
     fs_mysql_hsa           (ocf::heartbeat:Filesystem):  Started cn2.testlab.local
     ip_mysql_hsa           (ocf::heartbeat:IPaddr2):     Started cn2.testlab.local
     mysql_hsa              (ocf::heartbeat:mysql):       Started cn2.testlab.local
     mail_alert_hsa         (ocf::heartbeat:MailTo):      Started cn2.testlab.local
 Resource Group: MySQL-livedata
     iscsi_mysql_livedata   (ocf::heartbeat:iscsi):       Started cn3.testlab.local
     volgrp_mysql_livedata  (ocf::heartbeat:LVM):         Started cn3.testlab.local
     fs_mysql_livedata      (ocf::heartbeat:Filesystem):  Started cn3.testlab.local
     ip_mysql_livedata      (ocf::heartbeat:IPaddr2):     Started cn3.testlab.local
     mysql_livedata         (ocf::heartbeat:mysql):       Started cn3.testlab.local
     mail_alert_livedata    (ocf::heartbeat:MailTo):      Started cn3.testlab.local
 stonith_sbd               (stonith:external/sbd):       Started cn1.testlab.local
 Resource Group: Cluster_Status
     cluster_status_ip      (ocf::heartbeat:IPaddr2):     Started cn2.testlab.local
     cluster_status_page    (ocf::heartbeat:apache):      Started cn2.testlab.local

... when I isolated cn1, it almost immediately fenced cn3. Approximately 30 seconds later, cn2 promotes itself to DC, as it's the only surviving node with network connectivity, but of course cn3 is just coming back up after a reboot, so it isn't participating yet. At that point I have two nodes that think they're the DC, neither with quorum. That's when I decided to change no-quorum-policy to freeze, because at this stage all services would otherwise shut down completely. With freeze, at least the services on the surviving good node stay up. Once cn3 finishes booting, Pacemaker starts, cn2 and cn3 form a quorum, cn1 finally gets fenced, and all resources are able to start on machines with network connectivity. The outage in this case is, of course, quite a bit longer than in the previous test.

Regards,
Mark
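The only difference between the two runs above is where stonith_sbd happened to be placed. Its placement (though not DC election, which the cluster layer decides on its own) can be biased with a location constraint; a sketch, where the constraint id "loc-stonith-sbd" and the score of 100 are made-up examples:

```shell
# Prefer (but do not force) running stonith_sbd on cn2; a positive
# finite score is a preference, not mandatory placement.
crm configure location loc-stonith-sbd stonith_sbd 100: cn2.testlab.local
```

Note that this only influences where the fencing resource runs; it does not prevent an isolated DC from initiating fencing while it still believes it has quorum.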
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker