On 26.10.2014 17:44, Andrei Borzenkov wrote:
On Sun, 26 Oct 2014 10:51:13 +0200, Andrew <ni...@seti.kr.ua> wrote:
On 26.10.2014 08:32, Andrei Borzenkov wrote:
On Sat, 25 Oct 2014 23:34:54 +0300, Andrew <ni...@seti.kr.ua> wrote:
On 25.10.2014 22:34, Digimer wrote:
On 25/10/14 03:32 PM, Andrew wrote:
Hi all.
I use the Percona RA on a cluster (nothing mission-critical currently -
just Zabbix data); today, after restarting the MySQL resource (crm
resource restart p_mysql), I got a split-brain state - for some reason
MySQL started first on the ex-slave node, and the ex-master started
later (possibly I set the shutdown timeout too low - only 120s - but
I'm not sure).
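
For reference, the stop timeout lives on the resource's stop operation;
a minimal crmsh sketch for inspecting and raising it (assuming the
primitive is named p_mysql as in the logs, and treating 300s as an
arbitrary example value, not a recommendation):

# Show the current definition, including any "op stop timeout=..." line:
crm configure show p_mysql
# Raise the stop timeout by editing the primitive in place:
crm configure edit p_mysql
# Or, more coarsely, raise the cluster-wide default operation timeout:
crm configure op_defaults timeout=300s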
Your logs do not show a resource restart - they show a Pacemaker
restart on node2.
Yes, you're right. This was a pacemaker restart.
After restarting the resource on both nodes, MySQL replication seemed
to be OK - but then, after ~50 min, it fell into split brain again for
an unknown reason (no resource restart was noticed).
In 'SHOW SLAVE STATUS' there is an error on one table caused by a
unique-index duplicate.
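
The error is visible in the slave status output; a quick way to check
it on each node (assuming credentials are available, e.g. via
~/.my.cnf):

# Last_SQL_Errno/Last_SQL_Error report the duplicate-key failure
# (errno 1062) once the SQL thread has stopped on it:
mysql -e 'SHOW SLAVE STATUS\G' | egrep 'Running|Last_SQL_Errno|Last_SQL_Error|Seconds_Behind'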
So I have some questions:
1) What causes the split brain, and how can I avoid it in the future?
Cause:
Logs?
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
That seems too late. The real cause is that the resource was reported
as being in master state on both nodes, and that happened earlier.
These are different resources (pgsql and mysql).
Prevent:
Fencing (aka stonith). This is why fencing is required.
There was no node failure. Just the daemon was restarted.
"Split brain" == loss of communication. It does not matter whether
communication was lost because node failed or because daemon was not
running. There is no way for surviving node to know, *why*
communication was lost.
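
For completeness, enabling fencing in a two-node cluster like this one
would look roughly as follows; a crmsh sketch in which all fence_ipmilan
addresses and credentials are placeholders:

# One fence device per node:
crm configure primitive st-node1 stonith:fence_ipmilan \
  params pcmk_host_list="node1.cluster" ipaddr="10.0.0.1" \
  login="admin" passwd="secret" lanplus="true" \
  op monitor interval=60s
crm configure primitive st-node2 stonith:fence_ipmilan \
  params pcmk_host_list="node2.cluster" ipaddr="10.0.0.2" \
  login="admin" passwd="secret" lanplus="true" \
  op monitor interval=60s
# Keep each fence device away from the node it is meant to kill:
crm configure location l-st-node1 st-node1 -inf: node1.cluster
crm configure location l-st-node2 st-node2 -inf: node2.cluster
crm configure property stonith-enabled=true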
So how will stonith help in this case? The daemon will be restarted
after its death if that happens during a restart, and stonith will see
a live daemon...
So what is the easiest way out of the split brain? Just stop the
daemons and copy all MySQL data from the good node to the bad one?
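
That is essentially the standard recovery: rebuild the diverged node as
a slave of the good one. A rough sketch (host names are taken from the
logs; paths are placeholders, and the dump step assumes InnoDB tables
so --single-transaction is safe):

# 1. Keep the cluster from acting on the bad node while rebuilding:
crm node standby node2.cluster
# 2. On the good node, take a consistent dump with binlog coordinates:
mysqldump --all-databases --single-transaction --master-data=2 > /tmp/master.sql
scp /tmp/master.sql node2.cluster:/tmp/
# 3. On the bad node, reload the data, then re-point replication using
#    the CHANGE MASTER TO coordinates embedded in the dump, START SLAVE,
#    and verify with SHOW SLAVE STATUS\G:
mysql < /tmp/master.sql
# 4. Hand the node back to the cluster:
crm node online node2.cluster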
There is no split brain visible in your log. Pacemaker on node2 was
restarted, cleanly as far as I can tell, and reintegrated back into the
cluster. Maybe node1 "lost" node2, but that needs logs from node1.
Here is the log from the other node:
Oct 25 13:54:13 node1 pacemakerd[21773]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:13 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (56)
Oct 25 13:54:13 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 6993: master-p_mysql=56
Oct 25 13:54:16 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (53)
Oct 25 13:54:16 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 6995: master-p_mysql=53
Oct 25 13:54:16 node1 pacemakerd[22035]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:18 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (60)
Oct 25 13:54:18 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 6997: master-p_mysql=60
Oct 25 13:54:18 node1 pacemakerd[22335]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node1 pacemakerd[22476]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node1 mysql(p_mysql)[22446]: INFO: Ignoring post-demote
notification execpt for my own demotion.
Oct 25 13:54:19 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2423, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:19 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2425, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node1 pacemakerd[22597]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node1 mysql(p_mysql)[22540]: INFO: Ignoring post-demote
notification execpt for my own demotion.
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2433, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO: Adding inet
address 192.168.253.254/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO: Adding
inet address 192.168.253.31/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2435, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO: Bringing device br0 up
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO: Bringing
device br0 up
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.254 br0 192.168.253.254
auto not_used not_used
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.31 br0 192.168.253.31 auto
not_used not_used
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation ClusterIP_start_0 (call=2429, rc=0, cib-update=653,
confirmed=true) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_start_0 (call=2431, rc=0, cib-update=654,
confirmed=true) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation ClusterIP_monitor_2000 (call=2441, rc=0, cib-update=655,
confirmed=false) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_monitor_10000 (call=2443, rc=0,
cib-update=656, confirmed=false) ok
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: post-demote
notification for node2.cluster.
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: MySQL slave has
finished reading master binary log
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: MySQL slave has
finished processing relay log
Oct 25 13:54:20 node1 pacemakerd[22948]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node1 pgsql(p_pgsql)[22937]: INFO: Changing
p_pgsql-data-status on node2.cluster : STREAMING|SYNC->DISCONNECT.
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2445, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: Setup node2.cluster
into async mode.
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: server signaled
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: Reload configuration
file.
Oct 25 13:54:21 node1 pacemakerd[23094]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2450, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2453, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:22 node1 ntpd[1526]: Listen normally on 102 br0
192.168.253.254 UDP 123
Oct 25 13:54:22 node1 ntpd[1526]: Listen normally on 103 br0
192.168.253.31 UDP 123
Oct 25 13:54:22 node1 ntpd[1526]: peers refreshed
Oct 25 13:54:23 node1 pacemakerd[23336]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:25 node1 pacemakerd[23575]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:27 node1 pacemakerd[23851]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:29 node1 pacemakerd[24006]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:32 node1 pacemakerd[24205]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:34 node1 pacemakerd[24541]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:36 node1 pacemakerd[24705]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:38 node1 pacemakerd[24912]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:41 node1 pacemakerd[25150]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:43 node1 pacemakerd[25440]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:45 node1 pacemakerd[25641]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:47 node1 pacemakerd[25863]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:50 node1 pacemakerd[26016]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:52 node1 pacemakerd[26230]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:54 node1 pacemakerd[26563]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:56 node1 pacemakerd[26780]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:58 node1 pacemakerd[27008]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:01 node1 pacemakerd[27208]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:03 node1 pacemakerd[27540]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:05 node1 pacemakerd[27696]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:07 node1 pacemakerd[27958]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:10 node1 pacemakerd[28111]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:12 node1 pacemakerd[28310]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:14 node1 pacemakerd[28641]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:17 node1 pacemakerd[28812]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:19 node1 pacemakerd[29019]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:21 node1 pacemakerd[29219]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:23 node1 pacemakerd[29551]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:25 node1 pacemakerd[29706]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:28 node1 pacemakerd[29978]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:30 node1 pacemakerd[30177]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:32 node1 pacemakerd[30420]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:34 node1 pacemakerd[30708]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:37 node1 pacemakerd[30871]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2456, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2459, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:38 node1 mysql(p_mysql)[31063]: WARNING: Attempted to unset
the replication master on an instance that is not configured as a
replication slave
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 7016: master-p_mysql=1060
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_promote_0 (call=2462, rc=0, cib-update=657,
confirmed=true) ok
Oct 25 13:55:38 node1 mysql(p_mysql)[31139]: INFO: This will be the new
master, ignoring post-promote notification.
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2465, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:38 node1 lrmd[26078]: notice: operation_finished:
p_mysql_monitor_5000:31173:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_monitor_5000 (call=2468, rc=8, cib-update=658,
confirmed=false) master
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO: Adding
inet address 192.168.253.64/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO: Bringing
device br0 up
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.64 br0 192.168.253.64 auto
not_used not_used
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_writer_vip_start_0 (call=2470, rc=0, cib-update=659,
confirmed=true) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_writer_vip_monitor_10000 (call=2474, rc=0,
cib-update=660, confirmed=false) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: peer_update_callback: Our
peer on the DC is dead
Oct 25 13:55:38 node1 crmd[26081]: notice: do_state_transition: State
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION
cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
Oct 25 13:55:38 node1 crmd[26081]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: Diff: --- 0.322.7
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.323.1
b52f4b73ea1a2704d36b65af7364a3f8
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: -- <nvpair
value="1.1.8-7.el6-394e906" id="cib-bootstrap-options-dc-version"/>
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-14.el6_5.3-368c726"/>
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (1000)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-master-baseline
(000000000700FB08)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (1)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (PRI)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:39 node1 pacemakerd[31397]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:40 node1 ntpd[1526]: Listen normally on 104 br0
192.168.253.64 UDP 123
Oct 25 13:55:40 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:41 node1 pacemakerd[31594]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (1000)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-master-baseline
(000000000700FB08)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (1)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (PRI)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:43 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:43 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_mysql:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_pgsql:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_nginx:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_perl-fpm:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Move
p_web_ip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Move
p_radius_ip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_radiusd:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-547.bz2
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 17: monitor ClusterIP_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 18: monitor mysql_reader_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 19: monitor mysql_writer_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 20: monitor pgsql_reader_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 21: monitor pgsql_writer_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 22: monitor p_mysql:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 23: monitor p_pgsql:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 24: monitor p_nginx:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 25: monitor p_perl-fpm:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 26: monitor p_web_ip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 27: monitor p_radius_ip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 28: monitor p_radiusd:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 140: notify p_mysql_pre_notify_start_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 151: notify p_pgsql_pre_notify_start_0 on node1.cluster (local)
Oct 25 13:55:43 node1 pacemakerd[31858]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2477, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2479, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 16: probe_complete probe_complete on node2.cluster - no waiting
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 44: start p_mysql:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 74: start p_pgsql:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 108: start p_nginx:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 121: stop p_web_ip_stop_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 124: stop p_radius_ip_stop_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 129: start p_radiusd:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 pacemakerd[32056]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node1 IPaddr2(p_radius_ip)[32032]: INFO: IP status = ok,
IP_CIP=
Oct 25 13:55:43 node1 IPaddr2(p_web_ip)[32030]: INFO: IP status = ok,
IP_CIP=
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_radius_ip_stop_0 (call=2487, rc=0, cib-update=699,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_web_ip_stop_0 (call=2484, rc=0, cib-update=700,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 125: start p_radius_ip_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 122: start p_web_ip_start_0 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 110: monitor p_nginx:1_monitor_30000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 109: monitor p_nginx:1_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 111: start p_perl-fpm:1_start_0 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 126: monitor p_radius_ip_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 123: monitor p_web_ip_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 lrmd[26078]: notice: operation_finished:
p_mysql_monitor_5000:31992:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:45 node1 ntpd[1526]: Deleting interface #100 lo,
10.255.0.32#123, interface stats: received=0, sent=0, dropped=0,
active_time=44998 secs
Oct 25 13:55:45 node1 ntpd[1526]: Deleting interface #18 lo,
10.255.0.33#123, interface stats: received=0, sent=0, dropped=0,
active_time=47428 secs
Oct 25 13:55:45 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:45 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 152: notify p_pgsql_post_notify_start_0 on node1.cluster (local)
Oct 25 13:55:45 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 153: notify p_pgsql:1_post_notify_start_0 on node2.cluster
Oct 25 13:55:45 node1 pacemakerd[32302]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Changing
p_pgsql-data-status on node2.cluster : DISCONNECT->STREAMING|ASYNC.
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: Diff: --- 0.323.42
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.324.1
5fa5332e25fae6e209da715c2ba76e73
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: -- <nvpair
value="DISCONNECT" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: ++ <nvpair
id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status"
value="STREAMING|ASYNC"/>
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Setup node2.cluster
into sync mode.
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: server signaled
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Reload configuration
file.
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Changing
p_pgsql-status on node2.cluster : HS:alone->HS:async.
Oct 25 13:55:45 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2493, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:46 node1 pacemakerd[32448]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing
p_pgsql-data-status on node2.cluster : STREAMING|ASYNC->STREAMING|SYNC.
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: Diff: --- 0.324.2
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.325.1
78a796d8af0b652875a2fff886b4b5bd
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: -- <nvpair
value="STREAMING|ASYNC" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: ++ <nvpair
id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status"
value="STREAMING|SYNC"/>
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing p_pgsql
master score on node2.cluster : -INFINITY->100.
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing
p_pgsql-status on node2.cluster : HS:async->HS:sync.
Oct 25 13:55:48 node1 pacemakerd[32701]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:49 node1 lrmd[26078]: notice: operation_finished:
p_mysql_monitor_5000:444:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 141: notify p_mysql_post_notify_start_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 142: notify p_mysql:1_post_notify_start_0 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2496, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: run_graph: Transition 0
(Complete=50, Pending=0, Fired=0, Skipped=5, Incomplete=1,
Source=/var/lib/pacemaker/pengine/pe-input-547.bz2): Stopped
Oct 25 13:55:49 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:49 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:49 node1 pengine[26080]: notice: LogActions: Move
mysql_reader_vip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:49 node1 pengine[26080]: notice: LogActions: Move
pgsql_reader_vip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:49 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-548.bz2
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 22: stop mysql_reader_vip_stop_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 27: stop pgsql_reader_vip_stop_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 38: monitor p_mysql_monitor_2000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 69: monitor p_pgsql_monitor_7000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 106: monitor p_perl-fpm_monitor_10000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 123: monitor p_radiusd_monitor_10000 on node2.cluster
Oct 25 13:55:49 node1 IPaddr2(mysql_reader_vip)[512]: INFO: IP status =
ok, IP_CIP=
Oct 25 13:55:49 node1 IPaddr2(pgsql_reader_vip)[513]: INFO: IP status =
ok, IP_CIP=
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_reader_vip_stop_0 (call=2500, rc=0, cib-update=702,
confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_stop_0 (call=2503, rc=0, cib-update=703,
confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 23: start mysql_reader_vip_start_0 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 28: start pgsql_reader_vip_start_0 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 24: monitor mysql_reader_vip_monitor_10000 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 29: monitor pgsql_reader_vip_monitor_10000 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: run_graph: Transition 1
(Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-548.bz2): Complete
Oct 25 13:55:50 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:50 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:50 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-549.bz2
Oct 25 13:55:50 node1 crmd[26081]: notice: run_graph: Transition 2
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-549.bz2): Complete
Oct 25 13:55:50 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:50 node1 pacemakerd[631]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:51 node1 ntpd[1526]: Deleting interface #103 br0,
192.168.253.31#123, interface stats: received=0, sent=0, dropped=0,
active_time=89 secs
Oct 25 13:55:51 node1 ntpd[1526]: Deleting interface #101 br0,
192.168.253.63#123, interface stats: received=0, sent=0, dropped=0,
active_time=42685 secs
Oct 25 13:55:51 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:52 node1 crmd[26081]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:52 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:52 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:52 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-550.bz2
Oct 25 13:55:52 node1 crmd[26081]: notice: run_graph: Transition 3
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-550.bz2): Complete
Oct 25 13:55:52 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:53 node1 pacemakerd[846]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:54 node1 crmd[26081]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:54 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:54 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:54 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-551.bz2
Oct 25 13:55:54 node1 crmd[26081]: notice: run_graph: Transition 4
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-551.bz2): Complete
Oct 25 13:55:54 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:55 node1 pacemakerd[1123]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:56 node1 crmd[26081]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:56 node1 pengine[26080]: notice: LogActions: Move
mysql_reader_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:55:56 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-552.bz2
Oct 25 13:55:56 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 26: stop mysql_reader_vip_stop_0 on node2.cluster
Oct 25 13:55:56 node1 crmd[26081]: notice: run_graph: Transition 5
(Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-552.bz2): Stopped
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:56 node1 pengine[26080]: notice: LogActions: Start
mysql_reader_vip#011(node1.cluster)
Oct 25 13:55:56 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-553.bz2
Oct 25 13:55:56 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 25: start mysql_reader_vip_start_0 on node1.cluster (local)
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO: Adding inet
address 192.168.253.63/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO: Bringing
device br0 up
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.63 br0 192.168.253.63 auto
not_used not_used
Oct 25 13:55:57 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_reader_vip_start_0 (call=2509, rc=0, cib-update=709,
confirmed=true) ok
Oct 25 13:55:57 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 26: monitor mysql_reader_vip_monitor_10000 on node1.cluster (local)
Oct 25 13:55:57 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_reader_vip_monitor_10000 (call=2512, rc=0,
cib-update=710, confirmed=false) ok
Oct 25 13:55:57 node1 crmd[26081]: notice: run_graph: Transition 6
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-553.bz2): Complete
Oct 25 13:55:57 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:57 node1 pacemakerd[1528]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:58 node1 ntpd[1526]: Listen normally on 105 br0
192.168.253.63 UDP 123
Oct 25 13:55:58 node1 ntpd[1526]: peers refreshed
You are probably misusing "split brain" in this case. Split brain means
the nodes lost communication with each other, so each node is unaware
of the state of the resources on the other node. Here "nodes" means
corosync/pacemaker, not individual resources.
I mean a MySQL-level split brain - the databases on the two nodes are
out of sync.
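
Since the Percona RA is in use, that kind of divergence can be measured
(and carefully repaired) with Percona Toolkit, if it is installed; a
sketch with placeholder credentials:

# Checksum all tables through replication and report rows that differ
# between master and slave (results land in percona.checksums):
pt-table-checksum --replicate=percona.checksums h=node1.cluster,u=checksum,p=secret
# Preview what it would take to bring the slave back in sync:
pt-table-sync --dry-run --replicate=percona.checksums h=node1.cluster,u=checksum,p=secret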