On Wed, Jun 27, 2012 at 8:04 PM, coma <coma....@gmail.com> wrote:
> Thanks for your reply Andreas,
>
> My first node is a virtual machine (the active node); the second (passive)
> node is a physical standalone server. There is no high load on either of
> them, but the problem seems to come from the virtual server.
> I actually get the same split-brain problem when I take or delete a
> virtual machine snapshot (the network connection is lost for a moment,
> maybe about 1s). But I only take a snapshot once a week, and I see split
> brain several times a week.
> I didn't detect any other loss of connection, or perhaps there are micro
> network cuts that are not caught by my monitoring system (and I have no
> problem with my non-clustered services).
> If these are micro-cuts, I think the problem is DRBD: is it too sensitive?
> Can I adjust values to avoid the problem?
>
> I will try increasing my token value to 10000 / consensus to 12000 and
> configuring resource-level fencing in DRBD, thanks for the tips.
>
> About redundant rings, I read in the DRBD documentation that they are
> vital for resource-level fencing, but can I do without them?
> Because I use a virtual server (my virtual servers are on a blade) I
> can't have a "physical" link between the 2 nodes (a cable between them),
> so I use "virtual links" (with VLANs to separate them from my main
> network). I can create a 2nd corosync link, but I doubt its usefulness:
> if something goes wrong with the first link, I think I would have the
> same problem on the second. Although they are virtually separated, they
> use the same physical hardware (all my hardware is redundant, so link
> problems are very limited).
> But maybe I'm wrong, I'll think about it.
>
> About stonith, I will read the documentation, but is it really useful to
> bring out the "big artillery" for a simple 2-node cluster in
> active/passive mode? (I read that stonith is mostly used for
> active/active clusters.)

Without stonith, your active/passive cluster can very easily start acting
like an active/active one. So it depends how much you value your data.
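It does not have to be heavy machinery for two nodes. As a rough sketch
only (hostnames, addresses and credentials below are placeholders, not
taken from this thread), fencing the physical node through its IPMI board
with the external/ipmi plugin shipped with cluster-glue could look like:

    primitive st-node2 stonith:external/ipmi \
            params hostname="node2" ipaddr="10.1.0.200" userid="admin" \
                   passwd="secret" interface="lan" \
            op monitor interval="60s" timeout="30s"
    location l-st-node2 st-node2 -inf: node2
    property stonith-enabled="true"

The location constraint keeps node2 from running its own fencing device.
The virtual node would need a hypervisor-level agent instead, something
that can power the guest off at the blade/hypervisor layer, since IPMI is
not reachable from inside a VM.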
> Anyway, thank you for this advice, it is much appreciated!
>
>
> 2012/6/26 Andreas Kurz <andr...@hastexo.com>
>>
>> On 06/26/2012 03:49 PM, coma wrote:
>> > Hello,
>> >
>> > I am running a 2-node cluster with corosync & DRBD in active/passive
>> > mode for MySQL high availability.
>> >
>> > The cluster is working fine (failover/failback & replication are OK),
>> > and I have no network outage (the network is monitored and I've not
>> > seen any failure), but split-brain occurs very often and I don't
>> > understand why; maybe you can help me?
>>
>> Are the nodes virtual machines, or do they have a high load from time
>> to time?
>>
>> > I'm a new pacemaker/corosync/DRBD user, so my cluster and DRBD
>> > configuration are probably not optimal; if you have any comments,
>> > tips or examples I would be very grateful!
>> >
>> > Here is an example of the corosync log when a split-brain occurs
>> > (1 hour of log to see before/after the split-brain):
>> >
>> > http://pastebin.com/3DprkcTA
>>
>> Increase your token value in corosync.conf to a higher value ... like
>> 10s, configure resource-level fencing in DRBD, set up STONITH for your
>> cluster and use redundant corosync rings.
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>> > Thank you in advance for any help!
>> >
>> >
>> > More details about my configuration:
>> >
>> > I have one preferred "master" node (node1) on a virtual server, and
>> > one "slave" node on a physical server.
>> > On each server:
>> > eth0 is connected to my main LAN for client/server communication
>> > (with the cluster VIP),
>> > eth1 is connected to a dedicated VLAN for corosync communication
>> > (network: 192.168.3.0/30),
>> > eth2 is connected to a dedicated VLAN for DRBD replication
>> > (network: 192.168.2.0/30).
>> >
>> > Here is my DRBD configuration:
>> >
>> > resource drbd-mysql {
>> >     protocol C;
>> >     disk {
>> >         on-io-error detach;
>> >     }
>> >     handlers {
>> >         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>> >         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>> >         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>> >     }
>> >     net {
>> >         cram-hmac-alg sha1;
>> >         shared-secret "secret";
>> >         after-sb-0pri discard-younger-primary;
>> >         after-sb-1pri discard-secondary;
>> >         after-sb-2pri call-pri-lost-after-sb;
>> >     }
>> >     startup {
>> >         wfc-timeout 1;
>> >         degr-wfc-timeout 1;
>> >     }
>> >     on node1 {
>> >         device    /dev/drbd1;
>> >         address   192.168.2.1:7801;
>> >         disk      /dev/sdb;
>> >         meta-disk internal;
>> >     }
>> >     on node2 {
>> >         device    /dev/drbd1;
>> >         address   192.168.2.2:7801;
>> >         disk      /dev/sdb;
>> >         meta-disk internal;
>> >     }
>> > }
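One note on the DRBD configuration above: the crm-fence-peer.sh /
crm-unfence-peer.sh handlers are only invoked when a fencing policy is
set, and the posted disk section does not set one (the default is
dont-care). A minimal sketch of the change for resource-level fencing,
assuming DRBD 8.3/8.4 where the option lives in the disk section (check
drbd.conf(5) for your version):

    resource drbd-mysql {
        disk {
            on-io-error detach;
            # invoke the fence-peer handler whenever the peer is lost
            fencing resource-only;
        }
        handlers {
            # unchanged from the posted configuration
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }

With that in place, a short replication-link hiccup (like the one the
weekly snapshot causes) makes crm-fence-peer.sh put a temporary
constraint into the CIB that blocks promotion of the outdated peer,
instead of letting both sides go Primary and diverge.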
interval="0" timeout="30" \ >> > op stop interval="0" timeout="30" >> > primitive drbd_mysql ocf:linbit:drbd \ >> > params drbd_resource="drbd-mysql" \ >> > op monitor interval="15s" >> > primitive fs_mysql ocf:heartbeat:Filesystem \ >> > params device="/dev/datavg/data" directory="/data" fstype="ext4" >> > primitive mail_alert ocf:heartbeat:MailTo \ >> > params email="myem...@test.com <mailto:myem...@test.com>" \ >> > op monitor interval="10" timeout="10" depth="0" >> > primitive mysqld ocf:heartbeat:mysql \ >> > params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" >> > datadir="/data/mysql/databases" user="mysql" >> > pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" >> > test_passwd="cluster_test" test_table="Cluster_Test.dbcheck" >> > test_user="cluster_test" \ >> > op start interval="0" timeout="120" \ >> > op stop interval="0" timeout="120" \ >> > op monitor interval="30s" timeout="30s" OCF_CHECK_LEVEL="1" >> > target-role="Started" >> > group mysql datavg fs_mysql Cluster-VIP mysqld cluster_status_page >> > mail_alert >> > ms ms_drbd_mysql drbd_mysql \ >> > meta master-max="1" master-node-max="1" clone-max="2" >> > clone-node-max="1" notify="true" >> > location mysql-preferred-node mysql inf: node1 >> > colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master >> > order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start >> > property $id="cib-bootstrap-options" \ >> > >> > dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \ >> > cluster-infrastructure="openais" \ >> > expected-quorum-votes="2" \ >> > stonith-enabled="false" \ >> > no-quorum-policy="ignore" \ >> > last-lrm-refresh="1340701656" >> > rsc_defaults $id="rsc-options" \ >> > resource-stickiness="100" \ >> > migration-threshold="2" \ >> > failure-timeout="30s" >> > >> > >> > _______________________________________________ >> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> > >> > Project Home: http://www.clusterlabs.org >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> > Bugs: http://bugs.clusterlabs.org >> > >> >> >> >> >> >> _______________________________________________ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org >> > > > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org