On 06/26/2012 03:49 PM, coma wrote:
> Hello,
>
> I am running a 2-node cluster with corosync & DRBD in active/passive
> mode for MySQL high availability.
>
> The cluster is working fine (failover/failback & replication OK), and
> I have no network outage (the network is monitored and I have not
> seen any failure), but split-brain occurs very often and I don't
> understand why. Maybe you can help me?
Are the nodes virtual machines, or do they have a high load from time
to time?

> I'm a new Pacemaker/corosync/DRBD user, so my cluster and DRBD
> configuration are probably not optimal; if you have any comments,
> tips or examples I would be very grateful!
>
> Here is an example of the corosync log when a split-brain occurs (one
> hour of log, to see before/after the split-brain):
>
> http://pastebin.com/3DprkcTA

Increase the token value in your corosync.conf to a higher value, like
10s, configure resource-level fencing in DRBD, set up STONITH for your
cluster, and use redundant corosync rings. Sketches of all four follow
below your quoted configuration.

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> Thank you in advance for any help!
>
> More details about my configuration:
>
> I have one preferred "master" node (node1) on a virtual server, and
> one "slave" node (node2) on a physical server.
>
> On each server:
> eth0 is connected to my main LAN for client/server communication
> (with the cluster VIP).
> eth1 is connected to a dedicated VLAN for corosync communication
> (network: 192.168.3.0/30).
> eth2 is connected to a dedicated VLAN for DRBD replication (network:
> 192.168.2.0/30).
>
> Here is my DRBD configuration:
>
> resource drbd-mysql {
>     protocol C;
>     disk {
>         on-io-error detach;
>     }
>     handlers {
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>     }
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "secret";
>         after-sb-0pri discard-younger-primary;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri call-pri-lost-after-sb;
>     }
>     startup {
>         wfc-timeout 1;
>         degr-wfc-timeout 1;
>     }
>     on node1 {
>         device /dev/drbd1;
>         address 192.168.2.1:7801;
>         disk /dev/sdb;
>         meta-disk internal;
>     }
>     on node2 {
>         device /dev/drbd1;
>         address 192.168.2.2:7801;
>         disk /dev/sdb;
>         meta-disk internal;
>     }
> }
>
> Here is my cluster config:
>
> node node1 \
>     attributes standby="off"
> node node2 \
>     attributes standby="off"
> primitive Cluster-VIP ocf:heartbeat:IPaddr2 \
>     params ip="10.1.0.130" broadcast="10.1.7.255" nic="eth0" \
>         cidr_netmask="21" iflabel="VIP1" \
>     op monitor interval="10s" timeout="20s" \
>     meta is-managed="true"
> primitive cluster_status_page ocf:heartbeat:ClusterMon \
>     params pidfile="/var/run/crm_mon.pid" \
>         htmlfile="/var/www/html/cluster_status.html" \
>     op monitor interval="4s" timeout="20s"
> primitive datavg ocf:heartbeat:LVM \
>     params volgrpname="datavg" exclusive="true" \
>     op start interval="0" timeout="30" \
>     op stop interval="0" timeout="30"
> primitive drbd_mysql ocf:linbit:drbd \
>     params drbd_resource="drbd-mysql" \
>     op monitor interval="15s"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>     params device="/dev/datavg/data" directory="/data" fstype="ext4"
> primitive mail_alert ocf:heartbeat:MailTo \
>     params email="myem...@test.com" \
>     op monitor interval="10" timeout="10" depth="0"
> primitive mysqld ocf:heartbeat:mysql \
>     params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
>         datadir="/data/mysql/databases" user="mysql" \
>         pid="/var/run/mysqld/mysqld.pid" \
>         socket="/var/lib/mysql/mysql.sock" \
>         test_passwd="cluster_test" test_table="Cluster_Test.dbcheck" \
>         test_user="cluster_test" \
>     op start interval="0" timeout="120" \
>     op stop interval="0" timeout="120" \
>     op monitor interval="30s" timeout="30s" OCF_CHECK_LEVEL="1" \
>     meta target-role="Started"
> group mysql datavg fs_mysql Cluster-VIP mysqld \
>     cluster_status_page mail_alert
> ms ms_drbd_mysql drbd_mysql \
>     meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true"
> location mysql-preferred-node mysql inf: node1
> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1340701656"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100" \
>     migration-threshold="2" \
>     failure-timeout="30s"
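For the token timeout and the redundant rings, an untested sketch of
the totem section in corosync.conf (corosync 1.x syntax; the multicast
addresses/ports are placeholders, and reusing the DRBD VLAN as the
second ring is just one option, your eth0 LAN would work as well):

totem {
        version: 2

        # raise the token timeout from the 1000 ms default to 10 s so
        # short scheduling stalls (e.g. on the virtual node) do not
        # cause spurious membership changes
        token: 10000

        # use both rings, falling back when one of them fails
        rrp_mode: passive

        interface {
                ringnumber: 0
                bindnetaddr: 192.168.3.0
                mcastaddr: 226.94.1.1   # placeholder
                mcastport: 5405
        }
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.2.0  # reusing the DRBD VLAN
                mcastaddr: 226.94.1.2   # placeholder
                mcastport: 5407
        }
}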
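Resource-level fencing is enabled in the disk section of the DRBD
resource; you already have the matching crm-fence-peer.sh /
crm-unfence-peer.sh handlers configured, so only the fencing policy is
missing. A minimal sketch against your resource (DRBD 8.3 syntax):

resource drbd-mysql {
        disk {
                on-io-error detach;
                # on loss of replication the primary calls the
                # fence-peer handler; crm-fence-peer.sh then puts a
                # constraint into the CIB so the peer cannot be
                # promoted while it may hold outdated data
                fencing resource-only;
        }
        # (rest of the resource unchanged)
}

Once STONITH is working, "fencing resource-and-stonith;" is the
stronger policy.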
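For STONITH the right agent depends entirely on your hardware, so the
following is only an illustration using IPMI-style fencing via crm
configure; the IP addresses and credentials are placeholders, and
since node1 is a virtual server it would need a hypervisor-level agent
(e.g. external/libvirt) rather than IPMI:

primitive stonith-node1 stonith:external/ipmi \
        params hostname="node1" ipaddr="10.1.0.201" userid="admin" \
            passwd="secret" interface="lan" \
        op monitor interval="60m"
primitive stonith-node2 stonith:external/ipmi \
        params hostname="node2" ipaddr="10.1.0.202" userid="admin" \
            passwd="secret" interface="lan" \
        op monitor interval="60m"
# never let a node run the device that is supposed to fence it
location l-stonith-node1 stonith-node1 -inf: node1
location l-stonith-node2 stonith-node2 -inf: node2
# and finally enable fencing cluster-wide
property stonith-enabled="true"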
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org