On Sat, 25 Oct 2014 23:34:54 +0300, Andrew <ni...@seti.kr.ua> wrote:
> 25.10.2014 22:34, Digimer wrote:
> > On 25/10/14 03:32 PM, Andrew wrote:
> >> Hi all.
> >>
> >> I use Percona as RA on cluster (nothing mission-critical, currently -
> >> just zabbix data); today after restarting MySQL resource (crm resource
> >> restart p_mysql) I've got a split brain state - MySQL for some reason
> >> started first at ex-slave node, ex-master starts later (possibly I've
> >> set too small timeout to shutdown - only 120s, but I'm not sure).
> >>
> >> After restart resource on both nodes it seems like mysql replication was
> >> ok - but then after ~50min it fails in split brain again for unknown
> >> reason (no resource restart was noticed).
> >>
> >> In 'show replication status' there is an error in table caused by unique
> >> index dup.
> >>
> >> So I have a questions:
> >> 1) Which thing causes split brain, and how to avoid it in future?
> >
> > Cause:
> >
> > Logs?
> Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_config: On loss
> of CCM Quorum: Ignore
> Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
> monitor found resource p_pgsql:0 active in master mode on node1.cluster
> Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
> monitor found resource p_mysql:1 active in master mode on node2.cluster

These log entries are too late to show the cause. The real cause is that
the resource was reported as being in master state on both nodes, and
that happened earlier.

> >
> > Prevent:
> >
> > Fencing (aka stonith). This is why fencing is required.
> No node failure. Just daemon was restarted.
>

"Split brain" == loss of communication. It does not matter whether
communication was lost because the node failed or because the daemon was
not running. There is no way for the surviving node to know *why*
communication was lost.
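
To make that point concrete, here is a toy Python sketch. It is not Pacemaker code, and every name in it (PeerCondition, heartbeat_received, safe_to_promote) is made up for illustration. It assumes the only thing the surviving node can observe is whether a heartbeat arrives, and shows that all failure modes look identical from its side, so silence alone cannot justify promotion.

```python
from enum import Enum


class PeerCondition(Enum):
    """Hypothetical ground truth about the peer; the observer never sees this."""
    HEALTHY = "healthy"
    NODE_DOWN = "node down"
    DAEMON_STOPPED = "cluster daemon stopped"
    LINK_CUT = "network link cut"


def heartbeat_received(condition: PeerCondition) -> bool:
    """What the surviving node can actually observe: a heartbeat or silence."""
    return condition is PeerCondition.HEALTHY


def safe_to_promote(condition: PeerCondition, fencing_succeeded: bool) -> bool:
    """Decide whether the surviving node may take over the master role.

    The decision uses only observable facts: the heartbeat and the outcome
    of fencing. Silence alone is never enough, because the peer might
    still be running as master.
    """
    if heartbeat_received(condition):
        return False  # peer is reachable, so there is nothing to take over
    return fencing_succeeded


if __name__ == "__main__":
    # Every non-healthy condition produces the same observation: no heartbeat.
    for condition in PeerCondition:
        print(f"{condition.value:24} heartbeat={heartbeat_received(condition)}")
```

In this model a stopped daemon and a dead node are indistinguishable to the survivor, which is why the takeover decision is gated on fencing rather than on a guess about the cause.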
> >
> >> 2) How to resolve split brain state? Is it enough just to wait for
> >> failure, then - restart mysql by hand and clean row with dup index in
> >> slave db, and then run resource again? Or there is some automation for
> >> such cases?
> >
> > How are you sharing data? Can you give us a better understanding of
> > your setup?
> >
> Semi-synchronous MySQL replication, if you mean sharing DB log between
> nodes.
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org