On 26.10.2014 17:44, Andrei Borzenkov wrote:
On Sun, 26 Oct 2014 10:51:13 +0200,
Andrew <ni...@seti.kr.ua> wrote:

On 26.10.2014 08:32, Andrei Borzenkov wrote:
On Sat, 25 Oct 2014 23:34:54 +0300,
Andrew <ni...@seti.kr.ua> wrote:

On 25.10.2014 22:34, Digimer wrote:
On 25/10/14 03:32 PM, Andrew wrote:
Hi all.

I use the Percona RA on a cluster (nothing mission-critical currently,
just Zabbix data); today, after restarting the MySQL resource (crm resource
restart p_mysql), I got a split-brain state: MySQL for some reason
started first on the ex-slave node, and the ex-master started later
(possibly I set too small a shutdown timeout, only 120s, but I'm not sure).

Your logs do not show a resource restart - they show a pacemaker restart
on node2.
Yes, you're right. This was a pacemaker restart.

After restarting the resource on both nodes, MySQL replication seemed
fine - but then after ~50 min it fell into split brain again for an
unknown reason (no resource restart was noticed).

In 'SHOW SLAVE STATUS' there is an error on a table caused by a
unique-index duplicate.
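For reference, this kind of error can be inspected on the stopped slave roughly like this (a sketch only; error 1062 is MySQL's ER_DUP_ENTRY, and whether skipping the event is safe depends entirely on your data):

```shell
# Show which statement stopped the SQL thread and where replication halted
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Last_SQL_Errno|Last_SQL_Error|Exec_Master_Log_Pos'

# Only if the conflicting row is known to be identical on both sides can the
# offending event be skipped; otherwise the slave should be re-seeded instead.
mysql -e "SET GLOBAL sql_slave_skip_counter = 1; START SLAVE;"
```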

So I have some questions:
1) What caused the split brain, and how do I avoid it in the future?
Cause:

Logs?
Oct 25 13:54:13 node2 crmd[29248]:   notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:13 node2 pengine[29247]:   notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:13 node2 pengine[29247]:   notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:13 node2 pengine[29247]:   notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
That seems too late. The real cause is that the resource was reported as
being in master state on both nodes, and that happened earlier.
These are different resources (pgsql and mysql).

Prevent:

Fencing (aka stonith). This is why fencing is required.
There was no node failure. Just the daemon was restarted.

"Split brain" == loss of communication. It does not matter whether
communication was lost because node failed or because daemon was not
running. There is no way for surviving node to know, *why*
communication was lost.
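For reference, a minimal stonith setup via crmsh might look like the following (a sketch only; the fence agent, BMC addresses, and credentials here are placeholders, not taken from this thread - substitute whatever fencing hardware the nodes actually have):

```shell
# One fence device per node; fence_ipmilan parameters vary by agent version.
crm configure primitive fence-node1 stonith:fence_ipmilan \
    params pcmk_host_list="node1.cluster" ipaddr="192.168.254.1" \
           login="admin" passwd="secret" \
    op monitor interval="60s"
crm configure primitive fence-node2 stonith:fence_ipmilan \
    params pcmk_host_list="node2.cluster" ipaddr="192.168.254.2" \
           login="admin" passwd="secret" \
    op monitor interval="60s"

# Keep each fence device off the node it is meant to kill
crm configure location l-fence-node1 fence-node1 -inf: node1.cluster
crm configure location l-fence-node2 fence-node2 -inf: node2.cluster

crm configure property stonith-enabled="true"
```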

So how will stonith help in this case? The daemon will be restarted
after its death if that happens during a restart, and stonith will see a
live daemon...

So what is the easiest split-brain recovery? Just stop the daemons and
copy all MySQL data from the good node to the bad one?
There is no split brain visible in your log. Pacemaker on node2 was
restarted, cleanly as far as I can tell, and reintegrated back into the
cluster. Maybe node1 "lost" node2, but that needs logs from node1.
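For what it's worth, the "stop and re-copy" recovery is usually done with mysqldump rather than a raw file copy. A minimal sketch (the resource name ms_MySQL is taken from the logs below; paths, service name, and the assumption that the old master credentials are still configured on the slave are mine):

```shell
# On the bad node: take MySQL out of cluster control and stop it
crm resource unmanage ms_MySQL
service mysql stop

# On the good node: dump everything; --master-data=1 embeds a CHANGE MASTER TO
# statement with the correct binlog coordinates into the dump itself
mysqldump --all-databases --single-transaction --master-data=1 > /tmp/reseed.sql

# Back on the bad node: reload the data and restart replication; the embedded
# CHANGE MASTER TO repositions the slave, so only the slave thread must start
service mysql start
mysql < /tmp/reseed.sql
mysql -e "START SLAVE;"

# Hand the resource back to Pacemaker
crm resource manage ms_MySQL
```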

Here is the log from the other node:

Oct 25 13:54:13 node1 pacemakerd[21773]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:13 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (56)
Oct 25 13:54:13 node1 attrd[26079]: notice: attrd_perform_update: Sent update 6993: master-p_mysql=56
Oct 25 13:54:16 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (53)
Oct 25 13:54:16 node1 attrd[26079]: notice: attrd_perform_update: Sent update 6995: master-p_mysql=53
Oct 25 13:54:16 node1 pacemakerd[22035]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:18 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (60)
Oct 25 13:54:18 node1 attrd[26079]: notice: attrd_perform_update: Sent update 6997: master-p_mysql=60
Oct 25 13:54:18 node1 pacemakerd[22335]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node1 pacemakerd[22476]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node1 mysql(p_mysql)[22446]: INFO: Ignoring post-demote notification execpt for my own demotion.
Oct 25 13:54:19 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2423, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:19 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=2425, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node1 pacemakerd[22597]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node1 mysql(p_mysql)[22540]: INFO: Ignoring post-demote notification execpt for my own demotion.
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2433, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO: Adding inet address 192.168.253.254/24 with broadcast address 192.168.253.255 to device br0
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO: Adding inet address 192.168.253.31/24 with broadcast address 192.168.253.255 to device br0
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=2435, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO: Bringing device br0 up
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO: Bringing device br0 up
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.253.254 br0 192.168.253.254 auto not_used not_used
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.253.31 br0 192.168.253.31 auto not_used not_used
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM operation ClusterIP_start_0 (call=2429, rc=0, cib-update=653, confirmed=true) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM operation pgsql_reader_vip_start_0 (call=2431, rc=0, cib-update=654, confirmed=true) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM operation ClusterIP_monitor_2000 (call=2441, rc=0, cib-update=655, confirmed=false) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM operation pgsql_reader_vip_monitor_10000 (call=2443, rc=0, cib-update=656, confirmed=false) ok
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: post-demote notification for node2.cluster.
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: MySQL slave has finished reading master binary log
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: MySQL slave has finished processing relay log
Oct 25 13:54:20 node1 pacemakerd[22948]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node1 pgsql(p_pgsql)[22937]: INFO: Changing p_pgsql-data-status on node2.cluster : STREAMING|SYNC->DISCONNECT.
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2445, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: Setup node2.cluster into async mode.
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: server signaled
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: Reload configuration file.
Oct 25 13:54:21 node1 pacemakerd[23094]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=2450, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2453, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:22 node1 ntpd[1526]: Listen normally on 102 br0 192.168.253.254 UDP 123
Oct 25 13:54:22 node1 ntpd[1526]: Listen normally on 103 br0 192.168.253.31 UDP 123
Oct 25 13:54:22 node1 ntpd[1526]: peers refreshed
Oct 25 13:54:23 node1 pacemakerd[23336]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:25 node1 pacemakerd[23575]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:27 node1 pacemakerd[23851]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:29 node1 pacemakerd[24006]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:32 node1 pacemakerd[24205]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:34 node1 pacemakerd[24541]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:36 node1 pacemakerd[24705]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:38 node1 pacemakerd[24912]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:41 node1 pacemakerd[25150]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:43 node1 pacemakerd[25440]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:45 node1 pacemakerd[25641]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:47 node1 pacemakerd[25863]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:50 node1 pacemakerd[26016]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:52 node1 pacemakerd[26230]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:54 node1 pacemakerd[26563]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:56 node1 pacemakerd[26780]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:58 node1 pacemakerd[27008]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:01 node1 pacemakerd[27208]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:03 node1 pacemakerd[27540]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:05 node1 pacemakerd[27696]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:07 node1 pacemakerd[27958]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:10 node1 pacemakerd[28111]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:12 node1 pacemakerd[28310]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:14 node1 pacemakerd[28641]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:17 node1 pacemakerd[28812]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:19 node1 pacemakerd[29019]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:21 node1 pacemakerd[29219]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:23 node1 pacemakerd[29551]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:25 node1 pacemakerd[29706]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:28 node1 pacemakerd[29978]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:30 node1 pacemakerd[30177]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:32 node1 pacemakerd[30420]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:34 node1 pacemakerd[30708]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:37 node1 pacemakerd[30871]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2456, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2459, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:38 node1 mysql(p_mysql)[31063]: WARNING: Attempted to unset the replication master on an instance that is not configured as a replication slave
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_perform_update: Sent update 7016: master-p_mysql=1060
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_promote_0 (call=2462, rc=0, cib-update=657, confirmed=true) ok
Oct 25 13:55:38 node1 mysql(p_mysql)[31139]: INFO: This will be the new master, ignoring post-promote notification.
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2465, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:38 node1 lrmd[26078]: notice: operation_finished: p_mysql_monitor_5000:31173:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_monitor_5000 (call=2468, rc=8, cib-update=658, confirmed=false) master
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO: Adding inet address 192.168.253.64/24 with broadcast address 192.168.253.255 to device br0
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO: Bringing device br0 up
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.253.64 br0 192.168.253.64 auto not_used not_used
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM operation mysql_writer_vip_start_0 (call=2470, rc=0, cib-update=659, confirmed=true) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM operation mysql_writer_vip_monitor_10000 (call=2474, rc=0, cib-update=660, confirmed=false) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: peer_update_callback: Our peer on the DC is dead
Oct 25 13:55:38 node1 crmd[26081]: notice: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
Oct 25 13:55:38 node1 crmd[26081]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Oct 25 13:55:38 node1 cib[26076]:   notice: cib:diff: Diff: --- 0.322.7
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.323.1 b52f4b73ea1a2704d36b65af7364a3f8
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: -- <nvpair value="1.1.8-7.el6-394e906" id="cib-bootstrap-options-dc-version"/>
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-14.el6_5.3-368c726"/>
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_pgsql (1000)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-master-baseline (000000000700FB08)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (1)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (PRI)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:39 node1 pacemakerd[31397]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:40 node1 ntpd[1526]: Listen normally on 104 br0 192.168.253.64 UDP 123
Oct 25 13:55:40 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:41 node1 pacemakerd[31594]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_pgsql (1000)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-master-baseline (000000000700FB08)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (1)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (PRI)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:43 node1 pengine[26080]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:43 node1 pengine[26080]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start p_mysql:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start p_pgsql:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start p_nginx:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start p_perl-fpm:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Move p_web_ip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Move p_radius_ip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start p_radiusd:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-547.bz2
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 17: monitor ClusterIP_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 18: monitor mysql_reader_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 19: monitor mysql_writer_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 20: monitor pgsql_reader_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 21: monitor pgsql_writer_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 22: monitor p_mysql:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 23: monitor p_pgsql:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 24: monitor p_nginx:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 25: monitor p_perl-fpm:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 26: monitor p_web_ip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 27: monitor p_radius_ip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 28: monitor p_radiusd:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 140: notify p_mysql_pre_notify_start_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 151: notify p_pgsql_pre_notify_start_0 on node1.cluster (local)
Oct 25 13:55:43 node1 pacemakerd[31858]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2477, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=2479, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 16: probe_complete probe_complete on node2.cluster - no waiting
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 44: start p_mysql:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 74: start p_pgsql:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 108: start p_nginx:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 121: stop p_web_ip_stop_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 124: stop p_radius_ip_stop_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 129: start p_radiusd:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 pacemakerd[32056]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node1 IPaddr2(p_radius_ip)[32032]: INFO: IP status = ok, IP_CIP=
Oct 25 13:55:43 node1 IPaddr2(p_web_ip)[32030]: INFO: IP status = ok, IP_CIP=
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_radius_ip_stop_0 (call=2487, rc=0, cib-update=699, confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_web_ip_stop_0 (call=2484, rc=0, cib-update=700, confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 125: start p_radius_ip_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating action 122: start p_web_ip_start_0 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating action 110: monitor p_nginx:1_monitor_30000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating action 109: monitor p_nginx:1_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating action 111: start p_perl-fpm:1_start_0 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating action 126: monitor p_radius_ip_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating action 123: monitor p_web_ip_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 lrmd[26078]: notice: operation_finished: p_mysql_monitor_5000:31992:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:45 node1 ntpd[1526]: Deleting interface #100 lo, 10.255.0.32#123, interface stats: received=0, sent=0, dropped=0, active_time=44998 secs
Oct 25 13:55:45 node1 ntpd[1526]: Deleting interface #18 lo, 10.255.0.33#123, interface stats: received=0, sent=0, dropped=0, active_time=47428 secs
Oct 25 13:55:45 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:45 node1 crmd[26081]: notice: te_rsc_command: Initiating action 152: notify p_pgsql_post_notify_start_0 on node1.cluster (local)
Oct 25 13:55:45 node1 crmd[26081]: notice: te_rsc_command: Initiating action 153: notify p_pgsql:1_post_notify_start_0 on node2.cluster
Oct 25 13:55:45 node1 pacemakerd[32302]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Changing p_pgsql-data-status on node2.cluster : DISCONNECT->STREAMING|ASYNC.
Oct 25 13:55:45 node1 cib[26076]:   notice: cib:diff: Diff: --- 0.323.42
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.324.1 5fa5332e25fae6e209da715c2ba76e73
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: -- <nvpair value="DISCONNECT" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: ++ <nvpair id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status" value="STREAMING|ASYNC"/>
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Setup node2.cluster into sync mode.
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: server signaled
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Reload configuration file.
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Changing p_pgsql-status on node2.cluster : HS:alone->HS:async.
Oct 25 13:55:45 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=2493, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:46 node1 pacemakerd[32448]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing p_pgsql-data-status on node2.cluster : STREAMING|ASYNC->STREAMING|SYNC.
Oct 25 13:55:46 node1 cib[26076]:   notice: cib:diff: Diff: --- 0.324.2
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.325.1 78a796d8af0b652875a2fff886b4b5bd
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: -- <nvpair value="STREAMING|ASYNC" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: ++ <nvpair id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status" value="STREAMING|SYNC"/>
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing p_pgsql master score on node2.cluster : -INFINITY->100.
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing p_pgsql-status on node2.cluster : HS:async->HS:sync.
Oct 25 13:55:48 node1 pacemakerd[32701]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:49 node1 lrmd[26078]: notice: operation_finished: p_mysql_monitor_5000:444:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 141: notify p_mysql_post_notify_start_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 142: notify p_mysql:1_post_notify_start_0 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2496, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: run_graph: Transition 0 (Complete=50, Pending=0, Fired=0, Skipped=5, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-547.bz2): Stopped
Oct 25 13:55:49 node1 pengine[26080]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:49 node1 pengine[26080]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:49 node1 pengine[26080]: notice: LogActions: Move mysql_reader_vip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:49 node1 pengine[26080]: notice: LogActions: Move pgsql_reader_vip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:49 node1 pengine[26080]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-548.bz2
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 22: stop mysql_reader_vip_stop_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 27: stop pgsql_reader_vip_stop_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 38: monitor p_mysql_monitor_2000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 69: monitor p_pgsql_monitor_7000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 106: monitor p_perl-fpm_monitor_10000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 123: monitor p_radiusd_monitor_10000 on node2.cluster
Oct 25 13:55:49 node1 IPaddr2(mysql_reader_vip)[512]: INFO: IP status = ok, IP_CIP=
Oct 25 13:55:49 node1 IPaddr2(pgsql_reader_vip)[513]: INFO: IP status = ok, IP_CIP=
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM operation mysql_reader_vip_stop_0 (call=2500, rc=0, cib-update=702, confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM operation pgsql_reader_vip_stop_0 (call=2503, rc=0, cib-update=703, confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 23: start mysql_reader_vip_start_0 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating action 28: start pgsql_reader_vip_start_0 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: te_rsc_command: Initiating action 24: monitor mysql_reader_vip_monitor_10000 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: te_rsc_command: Initiating action 29: monitor pgsql_reader_vip_monitor_10000 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: run_graph: Transition 1 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-548.bz2): Complete
Oct 25 13:55:50 node1 pengine[26080]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:50 node1 pengine[26080]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:50 node1 pengine[26080]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-549.bz2
Oct 25 13:55:50 node1 crmd[26081]: notice: run_graph: Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-549.bz2): Complete
Oct 25 13:55:50 node1 crmd[26081]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:50 node1 pacemakerd[631]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:51 node1 ntpd[1526]: Deleting interface #103 br0, 192.168.253.31#123, interface stats: received=0, sent=0, dropped=0, active_time=89 secs
Oct 25 13:55:51 node1 ntpd[1526]: Deleting interface #101 br0, 192.168.253.63#123, interface stats: received=0, sent=0, dropped=0, active_time=42685 secs
Oct 25 13:55:51 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:52 node1 crmd[26081]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:52 node1 pengine[26080]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:52 node1 pengine[26080]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:52 node1 pengine[26080]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-550.bz2
Oct 25 13:55:52 node1 crmd[26081]: notice: run_graph: Transition 3 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-550.bz2): Complete
Oct 25 13:55:52 node1 crmd[26081]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:53 node1 pacemakerd[846]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:54 node1 crmd[26081]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:54 node1 pengine[26080]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:54 node1 pengine[26080]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:54 node1 pengine[26080]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-551.bz2
Oct 25 13:55:54 node1 crmd[26081]: notice: run_graph: Transition 4 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-551.bz2): Complete
Oct 25 13:55:54 node1 crmd[26081]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:55 node1 pacemakerd[1123]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:56 node1 crmd[26081]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:56 node1 pengine[26080]: notice: LogActions: Move mysql_reader_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:55:56 node1 pengine[26080]: notice: process_pe_message: Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-552.bz2
Oct 25 13:55:56 node1 crmd[26081]: notice: te_rsc_command: Initiating action 26: stop mysql_reader_vip_stop_0 on node2.cluster
Oct 25 13:55:56 node1 crmd[26081]: notice: run_graph: Transition 5 (Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-552.bz2): Stopped
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:56 node1 pengine[26080]: notice: LogActions: Start mysql_reader_vip#011(node1.cluster)
Oct 25 13:55:56 node1 pengine[26080]: notice: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-553.bz2
Oct 25 13:55:56 node1 crmd[26081]: notice: te_rsc_command: Initiating action 25: start mysql_reader_vip_start_0 on node1.cluster (local)
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO: Adding inet address 192.168.253.63/24 with broadcast address 192.168.253.255 to device br0
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO: Bringing device br0 up
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.253.63 br0 192.168.253.63 auto not_used not_used
Oct 25 13:55:57 node1 crmd[26081]: notice: process_lrm_event: LRM operation mysql_reader_vip_start_0 (call=2509, rc=0, cib-update=709, confirmed=true) ok
Oct 25 13:55:57 node1 crmd[26081]: notice: te_rsc_command: Initiating action 26: monitor mysql_reader_vip_monitor_10000 on node1.cluster (local)
Oct 25 13:55:57 node1 crmd[26081]: notice: process_lrm_event: LRM operation mysql_reader_vip_monitor_10000 (call=2512, rc=0, cib-update=710, confirmed=false) ok
Oct 25 13:55:57 node1 crmd[26081]: notice: run_graph: Transition 6 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-553.bz2): Complete
Oct 25 13:55:57 node1 crmd[26081]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:57 node1 pacemakerd[1528]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:58 node1 ntpd[1526]: Listen normally on 105 br0 192.168.253.63 UDP 123
Oct 25 13:55:58 node1 ntpd[1526]: peers refreshed


You are probably misusing "split brain" in this case. Split-brain means
the nodes have lost communication with each other, so each node is unaware
of the state of the resources on the other node. Here "nodes" means
corosync/pacemaker, not individual resources.
I mean a MySQL database split brain - the databases on the two nodes are out of sync.
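The mechanics of that DB-level divergence can be sketched in a few lines. This is a minimal, illustrative simulation only (using Python's sqlite3 in place of MySQL, with a hypothetical `events` table): while both nodes accept writes as master, each can claim the same unique key, so replaying one node's write on the other later fails with a duplicate-key error - the same class of error reported in SHOW SLAVE STATUS upthread.

```python
# Illustrative sketch: two independent databases stand in for the two
# MySQL nodes that both ran as master during the split-brain window.
import sqlite3

def make_node():
    # Each "node" is an independent database with the same schema.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return db

node1, node2 = make_node(), make_node()

# Both nodes believe they are master and take writes independently,
# consuming the same unique key.
node1.execute("INSERT INTO events (id, payload) VALUES (1, 'written on node1')")
node2.execute("INSERT INTO events (id, payload) VALUES (1, 'written on node2')")

# When replication is re-established and node1's statement is replayed
# on node2, the unique index collides and replication stops.
try:
    node2.execute("INSERT INTO events (id, payload) VALUES (1, 'written on node1')")
except sqlite3.IntegrityError as e:
    print("replication replay failed:", e)
```

Once the data sets have diverged like this, replication cannot simply be restarted; one node's copy has to be discarded or the conflicting rows reconciled by hand.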

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org