On 26.10.2014 17:44, Andrei Borzenkov wrote:
On Sun, 26 Oct 2014 10:51:13 +0200, Andrew <ni...@seti.kr.ua> wrote:
On 26.10.2014 08:32, Andrei Borzenkov wrote:
On Sat, 25 Oct 2014 23:34:54 +0300, Andrew <ni...@seti.kr.ua> wrote:
On 25.10.2014 22:34, Digimer wrote:
On 25/10/14 03:32 PM, Andrew wrote:
Hi all.
I use the Percona RA on a cluster (nothing mission-critical currently -
just Zabbix data); today, after restarting the MySQL resource (crm
resource restart p_mysql), I got a split-brain state - for some reason
MySQL started first on the ex-slave node, and the ex-master started
later (possibly I set the shutdown timeout too low - only 120s - but
I'm not sure).
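
For reference, the stop timeout lives on the resource's stop operation;
a minimal crmsh sketch for inspecting and raising it (assuming the
primitive is named p_mysql as in the logs, and treating 300s as an
arbitrary example value, not a recommendation):

# Show the current definition, including any "op stop timeout=..." line:
crm configure show p_mysql
# Raise the stop timeout by editing the primitive in place:
crm configure edit p_mysql
# Or, more coarsely, raise the cluster-wide default operation timeout:
crm configure op_defaults timeout=300s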
Your logs do not show a resource restart - they show a Pacemaker
restart on node2.
Yes, you're right. This was a pacemaker restart.
After restarting the resource on both nodes, MySQL replication seemed
to be OK - but then, after ~50 min, it fell into split brain again for
an unknown reason (no resource restart was noticed).
In 'SHOW SLAVE STATUS' there is an error on one table caused by a
unique-index duplicate.
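
The error is visible in the slave status output; a quick way to check
it on each node (assuming credentials are available, e.g. via
~/.my.cnf):

# Last_SQL_Errno/Last_SQL_Error report the duplicate-key failure
# (errno 1062) once the SQL thread has stopped on it:
mysql -e 'SHOW SLAVE STATUS\G' | egrep 'Running|Last_SQL_Errno|Last_SQL_Error|Seconds_Behind'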
So I have some questions:
1) What causes the split brain, and how can I avoid it in the future?
Cause:
Logs?
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
That seems too late. The real cause is that the resource was reported
as being in master state on both nodes, and that happened earlier.
These are different resources (pgsql and mysql).
Prevent:
Fencing (aka stonith). This is why fencing is required.
There was no node failure. Just the daemon was restarted.
"Split brain" == loss of communication. It does not matter whether
communication was lost because node failed or because daemon was not
running. There is no way for surviving node to know, *why*
communication was lost.
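
For completeness, enabling fencing in a two-node cluster like this one
would look roughly as follows; a crmsh sketch in which all fence_ipmilan
addresses and credentials are placeholders:

# One fence device per node:
crm configure primitive st-node1 stonith:fence_ipmilan \
  params pcmk_host_list="node1.cluster" ipaddr="10.0.0.1" \
  login="admin" passwd="secret" lanplus="true" \
  op monitor interval=60s
crm configure primitive st-node2 stonith:fence_ipmilan \
  params pcmk_host_list="node2.cluster" ipaddr="10.0.0.2" \
  login="admin" passwd="secret" lanplus="true" \
  op monitor interval=60s
# Keep each fence device away from the node it is meant to kill:
crm configure location l-st-node1 st-node1 -inf: node1.cluster
crm configure location l-st-node2 st-node2 -inf: node2.cluster
crm configure property stonith-enabled=true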
So how will stonith help in this case? The daemon will be restarted
after its death if that happens during a restart, and stonith will see
a live daemon...
So what is the easiest way out of the split brain? Just stop the
daemons and copy all MySQL data from the good node to the bad one?
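
That is essentially the standard recovery: rebuild the diverged node as
a slave of the good one. A rough sketch (host names are taken from the
logs; paths are placeholders, and the dump step assumes InnoDB tables
so --single-transaction is safe):

# 1. Keep the cluster from acting on the bad node while rebuilding:
crm node standby node2.cluster
# 2. On the good node, take a consistent dump with binlog coordinates:
mysqldump --all-databases --single-transaction --master-data=2 > /tmp/master.sql
scp /tmp/master.sql node2.cluster:/tmp/
# 3. On the bad node, reload the data, then re-point replication using
#    the CHANGE MASTER TO coordinates embedded in the dump, START SLAVE,
#    and verify with SHOW SLAVE STATUS\G:
mysql < /tmp/master.sql
# 4. Hand the node back to the cluster:
crm node online node2.cluster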
There is no split brain visible in your log. Pacemaker on node2 was
restarted, cleanly as far as I can tell, and reintegrated back into the
cluster. Maybe node1 "lost" node2, but that needs logs from node1.
Here is the log from the other node:
Oct 25 13:54:13 node1 pacemakerd[21773]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:13 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (56)
Oct 25 13:54:13 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 6993: master-p_mysql=56
Oct 25 13:54:16 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (53)
Oct 25 13:54:16 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 6995: master-p_mysql=53
Oct 25 13:54:16 node1 pacemakerd[22035]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:18 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (60)
Oct 25 13:54:18 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 6997: master-p_mysql=60
Oct 25 13:54:18 node1 pacemakerd[22335]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node1 pacemakerd[22476]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node1 mysql(p_mysql)[22446]: INFO: Ignoring post-demote
notification execpt for my own demotion.
Oct 25 13:54:19 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2423, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:19 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2425, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node1 pacemakerd[22597]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node1 mysql(p_mysql)[22540]: INFO: Ignoring post-demote
notification execpt for my own demotion.
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2433, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO: Adding inet
address 192.168.253.254/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO: Adding
inet address 192.168.253.31/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2435, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO: Bringing device br0 up
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO: Bringing
device br0 up
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.254 br0 192.168.253.254
auto not_used not_used
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.31 br0 192.168.253.31 auto
not_used not_used
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation ClusterIP_start_0 (call=2429, rc=0, cib-update=653,
confirmed=true) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_start_0 (call=2431, rc=0, cib-update=654,
confirmed=true) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation ClusterIP_monitor_2000 (call=2441, rc=0, cib-update=655,
confirmed=false) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_monitor_10000 (call=2443, rc=0,
cib-update=656, confirmed=false) ok
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: post-demote
notification for node2.cluster.
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: MySQL slave has
finished reading master binary log
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: MySQL slave has
finished processing relay log
Oct 25 13:54:20 node1 pacemakerd[22948]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node1 pgsql(p_pgsql)[22937]: INFO: Changing
p_pgsql-data-status on node2.cluster : STREAMING|SYNC->DISCONNECT.
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2445, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: Setup node2.cluster
into async mode.
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: server signaled
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: Reload configuration
file.
Oct 25 13:54:21 node1 pacemakerd[23094]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2450, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2453, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:22 node1 ntpd[1526]: Listen normally on 102 br0
192.168.253.254 UDP 123
Oct 25 13:54:22 node1 ntpd[1526]: Listen normally on 103 br0
192.168.253.31 UDP 123
Oct 25 13:54:22 node1 ntpd[1526]: peers refreshed
Oct 25 13:54:23 node1 pacemakerd[23336]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:25 node1 pacemakerd[23575]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:27 node1 pacemakerd[23851]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:29 node1 pacemakerd[24006]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:32 node1 pacemakerd[24205]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:34 node1 pacemakerd[24541]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:36 node1 pacemakerd[24705]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:38 node1 pacemakerd[24912]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:41 node1 pacemakerd[25150]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:43 node1 pacemakerd[25440]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:45 node1 pacemakerd[25641]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:47 node1 pacemakerd[25863]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:50 node1 pacemakerd[26016]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:52 node1 pacemakerd[26230]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:54 node1 pacemakerd[26563]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:56 node1 pacemakerd[26780]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:58 node1 pacemakerd[27008]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:01 node1 pacemakerd[27208]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:03 node1 pacemakerd[27540]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:05 node1 pacemakerd[27696]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:07 node1 pacemakerd[27958]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:10 node1 pacemakerd[28111]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:12 node1 pacemakerd[28310]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:14 node1 pacemakerd[28641]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:17 node1 pacemakerd[28812]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:19 node1 pacemakerd[29019]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:21 node1 pacemakerd[29219]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:23 node1 pacemakerd[29551]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:25 node1 pacemakerd[29706]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:28 node1 pacemakerd[29978]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:30 node1 pacemakerd[30177]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:32 node1 pacemakerd[30420]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:34 node1 pacemakerd[30708]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:37 node1 pacemakerd[30871]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2456, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2459, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:38 node1 mysql(p_mysql)[31063]: WARNING: Attempted to unset
the replication master on an instance that is not configured as a
replication slave
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 7016: master-p_mysql=1060
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_promote_0 (call=2462, rc=0, cib-update=657,
confirmed=true) ok
Oct 25 13:55:38 node1 mysql(p_mysql)[31139]: INFO: This will be the new
master, ignoring post-promote notification.
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2465, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:38 node1 lrmd[26078]: notice: operation_finished:
p_mysql_monitor_5000:31173:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_monitor_5000 (call=2468, rc=8, cib-update=658,
confirmed=false) master
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO: Adding
inet address 192.168.253.64/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO: Bringing
device br0 up
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.64 br0 192.168.253.64 auto
not_used not_used
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_writer_vip_start_0 (call=2470, rc=0, cib-update=659,
confirmed=true) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_writer_vip_monitor_10000 (call=2474, rc=0,
cib-update=660, confirmed=false) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: peer_update_callback: Our
peer on the DC is dead
Oct 25 13:55:38 node1 crmd[26081]: notice: do_state_transition: State
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION
cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
Oct 25 13:55:38 node1 crmd[26081]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: Diff: --- 0.322.7
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.323.1
b52f4b73ea1a2704d36b65af7364a3f8
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: -- <nvpair
value="1.1.8-7.el6-394e906" id="cib-bootstrap-options-dc-version"/>
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-14.el6_5.3-368c726"/>
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (1000)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-master-baseline
(000000000700FB08)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (1)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (PRI)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:39 node1 pacemakerd[31397]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:40 node1 ntpd[1526]: Listen normally on 104 br0
192.168.253.64 UDP 123
Oct 25 13:55:40 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:41 node1 pacemakerd[31594]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (1000)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-master-baseline
(000000000700FB08)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (1)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (PRI)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:43 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:43 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_mysql:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_pgsql:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_nginx:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_perl-fpm:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Move
p_web_ip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Move
p_radius_ip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_radiusd:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-547.bz2
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 17: monitor ClusterIP_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 18: monitor mysql_reader_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 19: monitor mysql_writer_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 20: monitor pgsql_reader_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 21: monitor pgsql_writer_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 22: monitor p_mysql:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 23: monitor p_pgsql:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 24: monitor p_nginx:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 25: monitor p_perl-fpm:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 26: monitor p_web_ip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 27: monitor p_radius_ip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 28: monitor p_radiusd:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 140: notify p_mysql_pre_notify_start_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 151: notify p_pgsql_pre_notify_start_0 on node1.cluster (local)
Oct 25 13:55:43 node1 pacemakerd[31858]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2477, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2479, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 16: probe_complete probe_complete on node2.cluster - no waiting
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 44: start p_mysql:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 74: start p_pgsql:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 108: start p_nginx:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 121: stop p_web_ip_stop_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 124: stop p_radius_ip_stop_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 129: start p_radiusd:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 pacemakerd[32056]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node1 IPaddr2(p_radius_ip)[32032]: INFO: IP status = ok,
IP_CIP=
Oct 25 13:55:43 node1 IPaddr2(p_web_ip)[32030]: INFO: IP status = ok,
IP_CIP=
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_radius_ip_stop_0 (call=2487, rc=0, cib-update=699,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_web_ip_stop_0 (call=2484, rc=0, cib-update=700,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 125: start p_radius_ip_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 122: start p_web_ip_start_0 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 110: monitor p_nginx:1_monitor_30000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 109: monitor p_nginx:1_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 111: start p_perl-fpm:1_start_0 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 126: monitor p_radius_ip_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 123: monitor p_web_ip_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 lrmd[26078]: notice: operation_finished:
p_mysql_monitor_5000:31992:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:45 node1 ntpd[1526]: Deleting interface #100 lo,
10.255.0.32#123, interface stats: received=0, sent=0, dropped=0,
active_time=44998 secs
Oct 25 13:55:45 node1 ntpd[1526]: Deleting interface #18 lo,
10.255.0.33#123, interface stats: received=0, sent=0, dropped=0,
active_time=47428 secs
Oct 25 13:55:45 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:45 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 152: notify p_pgsql_post_notify_start_0 on node1.cluster (local)
Oct 25 13:55:45 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 153: notify p_pgsql:1_post_notify_start_0 on node2.cluster
Oct 25 13:55:45 node1 pacemakerd[32302]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Changing
p_pgsql-data-status on node2.cluster : DISCONNECT->STREAMING|ASYNC.
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: Diff: --- 0.323.42
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.324.1
5fa5332e25fae6e209da715c2ba76e73
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: -- <nvpair
value="DISCONNECT" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: ++ <nvpair
id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status"
value="STREAMING|ASYNC"/>
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Setup node2.cluster
into sync mode.
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: server signaled
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Reload configuration
file.
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Changing
p_pgsql-status on node2.cluster : HS:alone->HS:async.
Oct 25 13:55:45 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2493, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:46 node1 pacemakerd[32448]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing
p_pgsql-data-status on node2.cluster : STREAMING|ASYNC->STREAMING|SYNC.
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: Diff: --- 0.324.2
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.325.1
78a796d8af0b652875a2fff886b4b5bd
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: -- <nvpair
value="STREAMING|ASYNC" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: ++ <nvpair
id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status"
value="STREAMING|SYNC"/>
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing p_pgsql
master score on node2.cluster : -INFINITY->100.
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing
p_pgsql-status on node2.cluster : HS:async->HS:sync.
Oct 25 13:55:48 node1 pacemakerd[32701]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:49 node1 lrmd[26078]: notice: operation_finished:
p_mysql_monitor_5000:444:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 141: notify p_mysql_post_notify_start_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 142: notify p_mysql:1_post_notify_start_0 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2496, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: run_graph: Transition 0
(Complete=50, Pending=0, Fired=0, Skipped=5, Incomplete=1,
Source=/var/lib/pacemaker/pengine/pe-input-547.bz2): Stopped
Oct 25 13:55:49 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:49 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:49 node1 pengine[26080]: notice: LogActions: Move
mysql_reader_vip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:49 node1 pengine[26080]: notice: LogActions: Move
pgsql_reader_vip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:49 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-548.bz2
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 22: stop mysql_reader_vip_stop_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 27: stop pgsql_reader_vip_stop_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 38: monitor p_mysql_monitor_2000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 69: monitor p_pgsql_monitor_7000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 106: monitor p_perl-fpm_monitor_10000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 123: monitor p_radiusd_monitor_10000 on node2.cluster
Oct 25 13:55:49 node1 IPaddr2(mysql_reader_vip)[512]: INFO: IP status =
ok, IP_CIP=
Oct 25 13:55:49 node1 IPaddr2(pgsql_reader_vip)[513]: INFO: IP status =
ok, IP_CIP=
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_reader_vip_stop_0 (call=2500, rc=0, cib-update=702,
confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_stop_0 (call=2503, rc=0, cib-update=703,
confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 23: start mysql_reader_vip_start_0 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 28: start pgsql_reader_vip_start_0 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 24: monitor mysql_reader_vip_monitor_10000 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 29: monitor pgsql_reader_vip_monitor_10000 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: run_graph: Transition 1
(Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-548.bz2): Complete
Oct 25 13:55:50 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:50 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:50 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-549.bz2
Oct 25 13:55:50 node1 crmd[26081]: notice: run_graph: Transition 2
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-549.bz2): Complete
Oct 25 13:55:50 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:50 node1 pacemakerd[631]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:51 node1 ntpd[1526]: Deleting interface #103 br0,
192.168.253.31#123, interface stats: received=0, sent=0, dropped=0,
active_time=89 secs
Oct 25 13:55:51 node1 ntpd[1526]: Deleting interface #101 br0,
192.168.253.63#123, interface stats: received=0, sent=0, dropped=0,
active_time=42685 secs
Oct 25 13:55:51 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:52 node1 crmd[26081]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:52 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:52 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:52 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-550.bz2
Oct 25 13:55:52 node1 crmd[26081]: notice: run_graph: Transition 3
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-550.bz2): Complete
Oct 25 13:55:52 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:53 node1 pacemakerd[846]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:54 node1 crmd[26081]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:54 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:54 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:54 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-551.bz2
Oct 25 13:55:54 node1 crmd[26081]: notice: run_graph: Transition 4
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-551.bz2): Complete
Oct 25 13:55:54 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:55 node1 pacemakerd[1123]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:56 node1 crmd[26081]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:56 node1 pengine[26080]: notice: LogActions: Move
mysql_reader_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:55:56 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-552.bz2
Oct 25 13:55:56 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 26: stop mysql_reader_vip_stop_0 on node2.cluster
Oct 25 13:55:56 node1 crmd[26081]: notice: run_graph: Transition 5
(Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-552.bz2): Stopped
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:56 node1 pengine[26080]: notice: LogActions: Start
mysql_reader_vip#011(node1.cluster)
Oct 25 13:55:56 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-553.bz2
Oct 25 13:55:56 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 25: start mysql_reader_vip_start_0 on node1.cluster (local)
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO: Adding inet
address 192.168.253.63/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO: Bringing
device br0 up
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.63 br0 192.168.253.63 auto
not_used not_used
Oct 25 13:55:57 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_reader_vip_start_0 (call=2509, rc=0, cib-update=709,
confirmed=true) ok
Oct 25 13:55:57 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 26: monitor mysql_reader_vip_monitor_10000 on node1.cluster (local)
Oct 25 13:55:57 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_reader_vip_monitor_10000 (call=2512, rc=0,
cib-update=710, confirmed=false) ok
Oct 25 13:55:57 node1 crmd[26081]: notice: run_graph: Transition 6
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-553.bz2): Complete
Oct 25 13:55:57 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:57 node1 pacemakerd[1528]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:58 node1 ntpd[1526]: Listen normally on 105 br0
192.168.253.63 UDP 123
Oct 25 13:55:58 node1 ntpd[1526]: peers refreshed
You are probably misusing "split brain" in this case. Split brain means
the nodes lost communication with each other, so each node is unaware
of the state of the resources on the other node. Here "nodes" means
corosync/pacemaker, not individual resources.
I mean a MySQL-level split brain - the databases on the two nodes are
out of sync.
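
Since the Percona RA is in use, that kind of divergence can be measured
(and carefully repaired) with Percona Toolkit, if it is installed; a
sketch with placeholder credentials:

# Checksum all tables through replication and report rows that differ
# between master and slave (results land in percona.checksums):
pt-table-checksum --replicate=percona.checksums h=node1.cluster,u=checksum,p=secret
# Preview what it would take to bring the slave back in sync:
pt-table-sync --dry-run --replicate=percona.checksums h=node1.cluster,u=checksum,p=secret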