On 25.10.2014 22:34, Digimer wrote:
On 25/10/14 03:32 PM, Andrew wrote:
Hi all.

I use the Percona RA on a cluster (nothing mission-critical currently,
just Zabbix data); today, after restarting the MySQL resource
(crm resource restart p_mysql), I got a split-brain state: for some
reason MySQL started first on the ex-slave node, and the ex-master
started later (possibly I set the shutdown timeout too low, only 120s,
but I'm not sure).
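
For reference, if the too-short stop timeout is indeed the suspect, it
can be raised on the resource's stop operation. A minimal crmsh sketch
(the 300s value is only an illustration, not a recommendation):

  crm configure edit p_mysql
  # then, in the editor, adjust the stop operation, e.g.:
  #   op stop interval="0" timeout="300s"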

After restarting the resource on both nodes, MySQL replication seemed
to be OK, but then after ~50 min it fell into split brain again for an
unknown reason (no resource restart was observed).

In the 'SHOW SLAVE STATUS' output there is a table error caused by a
duplicate entry on a unique index.
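
For reference, the statement that broke replication can be read on the
slave itself; errno 1062 is MySQL's duplicate-entry error:

  mysql> SHOW SLAVE STATUS\G
  -- the failure shows up in the Last_SQL_Errno / Last_SQL_Error fields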

So I have some questions:
1) What causes the split brain, and how can I avoid it in the future?

Cause:

Logs?
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3459: /var/lib/pacemaker/pengine/pe-input-254.bz2
Oct 25 13:54:13 node2 crmd[29248]: notice: run_graph: Transition 3459 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-254.bz2): Complete
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:14 node2 pacemakerd[11626]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:16 node2 crmd[29248]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:16 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3460: /var/lib/pacemaker/pengine/pe-input-255.bz2
Oct 25 13:54:16 node2 crmd[29248]: notice: run_graph: Transition 3460 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-255.bz2): Complete
Oct 25 13:54:16 node2 crmd[29248]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:18 node2 crmd[29248]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:18 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3461: /var/lib/pacemaker/pengine/pe-input-256.bz2
Oct 25 13:54:18 node2 crmd[29248]: notice: run_graph: Transition 3461 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-256.bz2): Complete
Oct 25 13:54:18 node2 crmd[29248]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:19 node2 pacemaker: Signaling Pacemaker Cluster Manager to terminate
Oct 25 13:54:19 node2 pacemakerd[29237]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Oct 25 13:54:19 node2 pacemakerd[29237]: notice: stop_child: Stopping crmd: Sent -15 to process 29248
Oct 25 13:54:19 node2 crmd[29248]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Oct 25 13:54:19 node2 crmd[29248]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Oct 25 13:54:19 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1414234459)
Oct 25 13:54:19 node2 pacemaker: Waiting for cluster services to unload
Oct 25 13:54:19 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7041: shutdown=1414234459
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:19 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move ClusterIP#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move pgsql_reader_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Promote p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Demote p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop p_pgsql:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop p_nginx:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop p_perl-fpm:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop p_radiusd:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3462: /var/lib/pacemaker/pengine/pe-input-257.bz2
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 24: stop ClusterIP_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 32: stop pgsql_reader_vip_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 2: cancel p_mysql_cancel_2000 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 143: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 145: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 149: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 151: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 105: stop p_perl-fpm_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 120: stop p_radiusd_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 lrmd[29245]: notice: operation_finished: p_perl-fpm_stop_0:12107:stderr [ grep: /var/run/perl-fpm/perl-fpm.pid: No such file or directory ]
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_perl-fpm_stop_0 (call=2537, rc=0, cib-update=4282, confirmed=true) ok
Oct 25 13:54:19 node2 pacemakerd[12173]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node2 mysql(p_mysql)[12101]: INFO: post-demote notification for node2.cluster
Oct 25 13:54:19 node2 IPaddr2(pgsql_reader_vip)[12100]: INFO: IP status = ok, IP_CIP=
Oct 25 13:54:19 node2 IPaddr(ClusterIP)[12099]: INFO: IP status = ok, IP_CIP=
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for 192.168.253.31 on br0.
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=2534, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for 192.168.253.254 on br0.
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation pgsql_reader_vip_stop_0 (call=2528, rc=0, cib-update=4283, confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation ClusterIP_stop_0 (call=2525, rc=0, cib-update=4284, confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2532, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_radiusd_stop_0 (call=2540, rc=0, cib-update=4285, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: run_graph: Transition 3462 (Complete=15, Pending=0, Fired=0, Skipped=29, Incomplete=18, Source=/var/lib/pacemaker/pengine/pe-input-257.bz2): Stopped
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:20 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Start ClusterIP#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Move mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Start pgsql_reader_vip#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Promote p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Demote p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Stop p_pgsql:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Stop p_nginx:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3463: /var/lib/pacemaker/pengine/pe-input-258.bz2
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 19: start ClusterIP_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 26: start pgsql_reader_vip_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 134: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 136: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 140: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 142: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 97: stop p_nginx_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 pacemakerd[12402]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 mysql(p_mysql)[12371]: INFO: post-demote notification for node2.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=2552, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 64: stop p_pgsql_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2550, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 34: demote p_mysql_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 20: monitor ClusterIP_monitor_2000 on node1.cluster
Oct 25 13:54:20 node2 pacemakerd[12542]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 27: monitor pgsql_reader_vip_monitor_10000 on node1.cluster
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7043: master-p_mysql=1
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_demote_0 (call=2567, rc=0, cib-update=4287, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 135: notify p_mysql_post_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 137: notify p_mysql_post_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7045: master-p_pgsql=-INFINITY
Oct 25 13:54:20 node2 mysql(p_mysql)[12605]: INFO: Ignoring post-demote notification for my own demotion.
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2571, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #110 br0:0, 192.168.253.254#123, interface stats: received=0, sent=0, dropped=0, active_time=47343 secs
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #172 br0, 192.168.253.31#123, interface stats: received=0, sent=0, dropped=0, active_time=45246 secs
Oct 25 13:54:20 node2 cib[29243]:   notice: cib:diff: Diff: --- 0.320.2419
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: Diff: +++ 0.321.1 17dd8c556d2c592e066c52d67b1a5b87
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: -- <nvpair value="STREAMING|SYNC" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: ++ <nvpair id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status" value="DISCONNECT"/>
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: Killing nginx PID 29997
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: nginx stopped.
Oct 25 13:54:21 node2 lrmd[29245]: notice: operation_finished: p_nginx_stop_0:12373:stderr [ /usr/lib/ocf/resource.d/heartbeat/nginx: line 467: kill: (29997) - No such process ]
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_nginx_stop_0 (call=2556, rc=0, cib-update=4288, confirmed=true) ok
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: waiting for server to shut down.... done server stopped
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: PostgreSQL is down
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Changing p_pgsql-status on node2.cluster : HS:sync->STOP.
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7047: p_pgsql-status=STOP
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Set all nodes into async mode.
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_pgsql_stop_0 (call=2562, rc=0, cib-update=4289, confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating action 141: notify p_pgsql_post_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]: notice: run_graph: Transition 3463 (Complete=28, Pending=0, Fired=0, Skipped=18, Incomplete=9, Source=/var/lib/pacemaker/pengine/pe-input-258.bz2): Stopped
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:21 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Stop mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Promote p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Stop p_mysql:1#011(node2.cluster)
Oct 25 13:54:21 node2 pengine[29247]: crit: stage8: Cannot shut down node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:54:21 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3464: /var/lib/pacemaker/pengine/pe-input-259.bz2
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating action 125: notify p_mysql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating action 127: notify p_mysql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2576, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating action 32: stop p_mysql_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (<null>)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent delete 7049: node=node2.cluster, attr=master-p_mysql, id=<n/a>, set=(null), section=status
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (0)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7052: readable=0
Oct 25 13:54:25 node2 nginx(p_nginx)[12778]: INFO: nginx not running
Oct 25 13:54:25 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_nginx_monitor_10000 (call=2442, rc=7, cib-update=4291, confirmed=false) not running
Oct 25 13:54:25 node2 crmd[29248]: warning: update_failcount: Updating failcount for p_nginx on node2.cluster after failed monitor: rc=7 (update=value++, time=1414234465)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_nginx (1)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7055: fail-count-p_nginx=1
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_nginx (1414234465)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7057: last-failure-p_nginx=1414234465
Oct 25 13:54:26 node2 nginx(p_nginx)[12844]: INFO: nginx not running
Oct 25 13:54:26 node2 nginx(p_nginx)[12845]: INFO: nginx not running
Oct 25 13:54:26 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_nginx_monitor_10000 (call=333, rc=7, cib-update=4292, confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_nginx_monitor_10000 (call=2469, rc=7, cib-update=4293, confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]: warning: update_failcount: Updating failcount for p_nginx on node2.cluster after failed monitor: rc=7 (update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_nginx (2)
Oct 25 13:54:26 node2 crmd[29248]: warning: update_failcount: Updating failcount for p_nginx on node2.cluster after failed monitor: rc=7 (update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7059: fail-count-p_nginx=2
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7061: last-failure-p_nginx=1414234466
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_nginx (3)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7063: fail-count-p_nginx=3
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7065: last-failure-p_nginx=1414234466
Oct 25 13:54:35 node2 nginx(p_nginx)[13037]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13099]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13098]: INFO: nginx not running
Oct 25 13:54:45 node2 nginx(p_nginx)[13291]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13353]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13352]: INFO: nginx not running
Oct 25 13:54:55 node2 nginx(p_nginx)[13549]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13628]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13627]: INFO: nginx not running
Oct 25 13:55:05 node2 nginx(p_nginx)[13870]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13931]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13932]: INFO: nginx not running
Oct 25 13:55:15 node2 nginx(p_nginx)[14151]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14214]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14213]: INFO: nginx not running
Oct 25 13:55:25 node2 nginx(p_nginx)[14405]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14469]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14468]: INFO: nginx not running
Oct 25 13:55:35 node2 nginx(p_nginx)[14659]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14723]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14724]: INFO: nginx not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL stopped
Oct 25 13:55:38 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_stop_0 (call=2579, rc=0, cib-update=4294, confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 126: notify p_mysql_post_notify_stop_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3464 (Complete=10, Pending=0, Fired=0, Skipped=6, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-259.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]: warning: unpack_rsc_op: Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Stop mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Promote p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: crit: stage8: Cannot shut down node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:55:38 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3465: /var/lib/pacemaker/pengine/pe-input-260.bz2
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 126: notify p_mysql_pre_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 30: promote p_mysql_promote_0 on node1.cluster
Oct 25 13:55:38 node2 cib[29243]:   notice: cib:diff: Diff: --- 0.321.15
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: Diff: +++ 0.322.1 92b30c7922072633f9dbc336e4f1a066
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: -- <nvpair value="192.168.253.4|mysqld-bin.000409|106" id="mysql_replication-p_mysql_REPL_INFO"/>
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: ++ <nvpair id="mysql_replication-p_mysql_REPL_INFO" name="p_mysql_REPL_INFO" value="192.168.253.5|mysqld-bin.000471|106"/>
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 127: notify p_mysql_post_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3465 (Complete=9, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-260.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]: warning: unpack_rsc_op: Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Move mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3466: /var/lib/pacemaker/pengine/pe-input-261.bz2
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 21: stop mysql_writer_vip_stop_0 on node2.cluster (local)
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 32: monitor p_mysql_monitor_5000 on node1.cluster
Oct 25 13:55:38 node2 IPaddr2(mysql_writer_vip)[14851]: INFO: IP status = ok, IP_CIP=
Oct 25 13:55:38 node2 avahi-daemon[6932]: Withdrawing address record for 192.168.253.64 on br0.
Oct 25 13:55:38 node2 crmd[29248]: notice: process_lrm_event: LRM operation mysql_writer_vip_stop_0 (call=2586, rc=0, cib-update=4300, confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 22: start mysql_writer_vip_start_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 23: monitor mysql_writer_vip_monitor_10000 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3466 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-261.bz2): Complete
Oct 25 13:55:38 node2 crmd[29248]: notice: lrm_state_verify_stopped: Stopped 1 recurring operations at shutdown... waiting (0 ops remaining)
Oct 25 13:55:38 node2 lrmd[29245]: warning: qb_ipcs_event_sendv: new_event_notification (29245-29248-6): Bad file descriptor (9)
Oct 25 13:55:38 node2 lrmd[29245]: warning: send_client_notify: Notification of client crmd/d068e1e6-7c6e-48b0-bb86-49d35fe11673 failed
Oct 25 13:55:38 node2 crmd[29248]: notice: do_lrm_control: Disconnected from the LRM
Oct 25 13:55:38 node2 crmd[29248]: notice: terminate_cs_connection: Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1afff00, async-conn=0x1afff00) left
Oct 25 13:55:38 node2 cib[29243]: warning: qb_ipcs_event_sendv: new_event_notification (29243-29248-13): Broken pipe (32)
Oct 25 13:55:38 node2 cib[29243]: warning: do_local_notify: A-Sync reply to crmd failed: No message of desired type
Oct 25 13:55:38 node2 pacemakerd[29237]: warning: qb_ipcs_event_sendv: new_event_notification (29237-29248-14): Broken pipe (32)
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping pengine: Sent -15 to process 29247
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping attrd: Sent -15 to process 29246
Oct 25 13:55:38 node2 attrd[29246]:   notice: main: Exiting...
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit: Client attrd (conn=0x1af3480, async-conn=0x1af3480) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping lrmd: Sent -15 to process 29245
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping stonith-ng: Sent -15 to process 29244
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x1af7800, async-conn=0x1af7800) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping cib: Sent -15 to process 29243
Oct 25 13:55:38 node2 cib[29243]: notice: terminate_cs_connection: Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x1afbb80, async-conn=0x1afbb80) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: pcmk_shutdown_worker: Shutdown complete
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 pacemaker: Starting Pacemaker Cluster Manager
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: main: Starting Pacemaker 1.1.10-14.el6_5.3 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 cib[14922]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 attrd[14925]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 cib[14922]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
Oct 25 13:55:39 node2 attrd[14925]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 pengine[14926]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 lrmd[14924]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]:   notice: main: CRM Git Version: 368c726
Oct 25 13:55:39 node2 attrd[14925]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1af31c0 for attrd/0
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1af7540 for stonith-ng/0
Oct 25 13:55:39 node2 attrd[14925]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 attrd[14925]:   notice: main: Starting mainloop...
Oct 25 13:55:39 node2 cib[14922]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1afb8c0 for cib/0
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Sending membership update 1608 to cib
Oct 25 13:55:39 node2 cib[14922]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]: notice: plugin_handle_membership: Membership 1608: quorum acquired
Oct 25 13:55:39 node2 cib[14922]: notice: crm_update_peer_state: plugin_handle_membership: Node node2.cluster[83732672] - state is now member (was (null))
Oct 25 13:55:39 node2 cib[14922]: notice: crm_update_peer_state: plugin_handle_membership: Node node1.cluster[100509888] - state is now member (was (null))
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:40 node2 cib[14922]: notice: crm_ipc_prepare: Message exceeds the configured ipc limit (51200 bytes), consider configuring PCMK_ipc_buffer to 161796 or higher to avoid compression overheads
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:40 node2 crmd[14927]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1affc40 for crmd/0
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Sending membership update 1608 to crmd
Oct 25 13:55:40 node2 stonith-ng[14923]: notice: setup_cib: Watching for stonith topology changes
Oct 25 13:55:40 node2 crmd[14927]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 stonith-ng[14923]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:40 node2 crmd[14927]: notice: plugin_handle_membership: Membership 1608: quorum acquired
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_update_peer_state: plugin_handle_membership: Node node2.cluster[83732672] - state is now member (was (null))
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_update_peer_state: plugin_handle_membership: Node node1.cluster[100509888] - state is now member (was (null))
Oct 25 13:55:40 node2 crmd[14927]: notice: do_started: The local CRM is operational
Oct 25 13:55:40 node2 crmd[14927]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Oct 25 13:55:40 node2 ntpd[7193]: Deleting interface #76 br0, 192.168.253.64#123, interface stats: received=0, sent=0, dropped=0, active_time=3555898 secs
Oct 25 13:55:42 node2 crmd[14927]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Oct 25 13:55:42 node2 attrd[14925]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_perl-fpm_monitor_0 (call=41, rc=7, cib-update=8, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radiusd_monitor_0 (call=54, rc=7, cib-update=9, confirmed=true) not running
Oct 25 13:55:43 node2 pacemakerd[15105]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 mysql(p_mysql)[14949]: INFO: MySQL is not running
Oct 25 13:55:43 node2 crmd[14927]: error: generic_get_metadata: Failed to retrieve meta-data for ocf:percona:mysql
Oct 25 13:55:43 node2 crmd[14927]: warning: get_rsc_metadata: No metadata found for mysql::ocf:percona: Input/output error (-5)
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_mysql_monitor_0 (call=26, rc=7, cib-update=10, confirmed=true) not running
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: Don't check /var/lib/pgsql/9.1/data/ during probe
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: PostgreSQL is down
Oct 25 13:55:43 node2 nginx(p_nginx)[14956]: INFO: nginx not running
Oct 25 13:55:43 node2 pacemakerd[15368]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_pgsql_monitor_0 (call=31, rc=7, cib-update=11, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_nginx_monitor_0 (call=36, rc=7, cib-update=12, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_web_ip_monitor_0 (call=45, rc=7, cib-update=13, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation pgsql_writer_vip_monitor_0 (call=21, rc=7, cib-update=14, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation ClusterIP_monitor_0 (call=5, rc=7, cib-update=15, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radius_ip_monitor_0 (call=49, rc=7, cib-update=16, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_writer_vip_monitor_0 (call=13, rc=7, cib-update=17, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation pgsql_reader_vip_monitor_0 (call=17, rc=7, cib-update=18, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_reader_vip_monitor_0 (call=9, rc=7, cib-update=19, confirmed=true) not running
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:43 node2 pacemakerd[15481]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (0)
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:43 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok nginx: configuration file /etc/nginx/nginx.conf test is successful
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: Starting /usr/sbin/nginx - nginx version: nginx/1.0.15
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: /usr/sbin/nginx build configuration: configure arguments: --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_geoip_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_stub_status_module --with-http_perl_module --with-mail --with-mail_ssl_module --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' --with-ld-opt=-Wl,-E
Oct 25 13:55:43 node2 pgsql(p_pgsql)[15439]: INFO: Changing p_pgsql-status on node2.cluster : ->STOP.
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_nginx_start_0 (call=72, rc=0, cib-update=20, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Adding inet address 10.255.0.33/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Adding inet address 10.255.0.32/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Bringing device lo up
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Bringing device lo up
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radius_ip_start_0 (call=76, rc=0, cib-update=21, confirmed=true) ok
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_web_ip_start_0 (call=78, rc=0, cib-update=22, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_nginx_monitor_30000 (call=81, rc=0, cib-update=23, confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: Set all nodes into async mode.
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radius_ip_monitor_10000 (call=89, rc=0, cib-update=24, confirmed=false) ok
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_web_ip_monitor_10000 (call=91, rc=0, cib-update=25, confirmed=false) ok
Oct 25 13:55:44 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_nginx_monitor_10000 (call=83, rc=0, cib-update=26, confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: server starting
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL start command sent.
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Can't get PostgreSQL recovery status. rc=2
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command.
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 4: master-p_mysql=0
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 7: master-p_pgsql=-INFINITY
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 11: readable=0
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 17: p_pgsql-status=STOP
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 20: probe_complete=true
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_perl-fpm_start_0 (call=85, rc=0, cib-update=27, confirmed=true) ok
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radiusd_start_0 (call=74, rc=0, cib-update=28, confirmed=true) ok
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL is started.
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: Changing p_pgsql-status on node2.cluster : STOP->HS:alone.
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (HS:alone)
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_perform_update: Sent update 22: p_pgsql-status=HS:alone
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished: p_pgsql_start_0:15439:stderr [ psql: could not connect to server: No such file or directory ]
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished: p_pgsql_start_0:15439:stderr [ #011Is the server running locally and accepting ]
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished: p_pgsql_start_0:15439:stderr [ #011connections on Unix domain socket "/tmp/.s.PGSQL.5432"? ]
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_pgsql_start_0 (call=70, rc=0, cib-update=29, confirmed=true) ok
Oct 25 13:55:45 node2 pacemakerd[16158]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=100, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #174 lo, 10.255.0.33#123 Enabled
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #175 lo, 10.255.0.32#123 Enabled
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_cs_dispatch: Update relayed from node1.cluster
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (HS:async)
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_perform_update: Sent update 24: p_pgsql-status=HS:async
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_cs_dispatch: Update relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_pgsql (100)
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_perform_update: Sent update 26: master-p_pgsql=100
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_cs_dispatch: Update relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (HS:sync)
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_perform_update: Sent update 28: p_pgsql-status=HS:sync
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Changing MySQL configuration to replicate from node1.cluster.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: ERROR: check_slave invoked on an instance that is not a replication slave.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Restored master pos for 192.168.253.5 : mysqld-bin.000471:106
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_perform_update: Sent update 30: master-p_mysql=1
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (60)
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_perform_update: Sent update 32: master-p_mysql=60
Oct 25 13:55:49 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (1)
Oct 25 13:55:49 node2 attrd[14925]: notice: attrd_perform_update: Sent update 34: readable=1
Oct 25 13:55:49 node2 mysql(p_mysql)[15438]: INFO: MySQL started
Oct 25 13:55:49 node2 lrmd[14924]: notice: operation_finished: p_mysql_start_0:15438:stderr [ Error performing operation: No such device or address ]
Oct 25 13:55:49 node2 lrmd[14924]: notice: operation_finished: p_mysql_start_0:15438:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_mysql_start_0 (call=68, rc=0, cib-update=30, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=104, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_perl-fpm_monitor_10000 (call=111, rc=0, cib-update=31, confirmed=false) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radiusd_monitor_10000 (call=113, rc=0, cib-update=32, confirmed=false) ok
Oct 25 13:55:49 node2 pacemakerd[16380]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_pgsql_monitor_7000 (call=109, rc=0, cib-update=33, confirmed=false) ok
Oct 25 13:55:49 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Adding inet address 192.168.253.63/24 with broadcast address 192.168.253.255 to device br0
Oct 25 13:55:49 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Adding inet address 192.168.253.31/24 with broadcast address 192.168.253.255 to device br0
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record for 192.168.253.63 on br0.IPv4.
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record for 192.168.253.31 on br0.IPv4.
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Bringing device br0 up
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Bringing device br0 up
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.253.63 br0 192.168.253.63 auto not_used not_used
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.253.31 br0 192.168.253.31 auto not_used not_used
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_reader_vip_start_0 (call=117, rc=0, cib-update=34, confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation pgsql_reader_vip_start_0 (call=119, rc=0, cib-update=35, confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_reader_vip_monitor_10000 (call=124, rc=0, cib-update=36, confirmed=false) ok
Oct 25 13:55:50 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (52)
Oct 25 13:55:50 node2 attrd[14925]: notice: attrd_perform_update: Sent update 36: master-p_mysql=52
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation pgsql_reader_vip_monitor_10000 (call=126, rc=0, cib-update=37, confirmed=false) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_mysql_monitor_2000 (call=107, rc=0, cib-update=38, confirmed=false) ok
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #176 br0, 192.168.253.63#123 Enabled
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #177 br0, 192.168.253.31#123 Enabled
Oct 25 13:55:52 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (50)
Oct 25 13:55:52 node2 attrd[14925]: notice: attrd_perform_update: Sent update 38: master-p_mysql=50
Oct 25 13:55:54 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (51)
Oct 25 13:55:54 node2 attrd[14925]: notice: attrd_perform_update: Sent update 40: master-p_mysql=51
Oct 25 13:55:56 node2 mysql(p_mysql)[17074]: ERROR: MySQL instance configured for replication, but replication has failed.
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (0)
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_perform_update: Sent update 42: readable=0
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_perform_update: Sent update 44: master-p_mysql=0
Oct 25 13:55:56 node2 IPaddr2(mysql_reader_vip)[17141]: INFO: IP status = ok, IP_CIP=
Oct 25 13:55:56 node2 avahi-daemon[6932]: Withdrawing address record for 192.168.253.63 on br0.
Oct 25 13:55:56 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_reader_vip_stop_0 (call=132, rc=0, cib-update=39, confirmed=true) ok
Oct 25 13:55:57 node2 pacemakerd[17206]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:58 node2 ntpd[7193]: Deleting interface #176 br0, 192.168.253.63#123, interface stats: received=0, sent=0, dropped=0, active_time=7 secs
Oct 25 13:55:58 node2 mysql(p_mysql)[17267]: ERROR: MySQL instance configured for replication, but replication has failed.


Prevent:

Fencing (aka stonith). This is why fencing is required.
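
For the archives, a minimal crmsh sketch of what enabling fencing looks
like, assuming IPMI-capable management boards; the agent choice,
addresses and credentials below are placeholders to adapt:

  # one fencing device per peer, kept off the node it is meant to kill
  crm configure primitive fence_node1 stonith:fence_ipmilan \
      params pcmk_host_list="node1.cluster" ipaddr="10.0.0.1" \
             login="admin" passwd="secret" lanplus="1" \
      op monitor interval="60s"
  crm configure location l_fence_node1 fence_node1 -inf: node1.cluster
  # repeat for node2, then turn fencing on cluster-wide:
  crm configure property stonith-enabled="true"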
There was no node failure; just the daemon was restarted.


2) How do I resolve the split-brain state? Is it enough just to wait
for the failure, then restart MySQL by hand, delete the row with the
duplicate index from the slave DB, and then start the resource again?
Or is there some automation for such cases?
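
For reference, the by-hand path usually looks something like the sketch
below, run on the broken slave (the table and key value are
placeholders). Note that skipping events papers over divergence;
re-cloning the slave from the master's data is the only really safe fix:

  STOP SLAVE;
  -- either remove the conflicting row so the event can re-apply ...
  DELETE FROM <db>.<table> WHERE <unique_key> = <duplicate_value>;
  -- ... or skip the single offending replication event:
  SET GLOBAL sql_slave_skip_counter = 1;
  START SLAVE;
  SHOW SLAVE STATUS\G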

How are you sharing data? Can you give us a better understanding of your setup?

Semi-synchronous MySQL replication, if you mean sharing the DB log
between the nodes.
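
For context, a sketch of the stock MySQL 5.5 semi-sync settings this
implies; note that semi-sync falls back to plain asynchronous
replication when the timeout expires or the slave disconnects, which is
one window for the two sides to diverge:

  # master side of my.cnf
  plugin-load = rpl_semi_sync_master=semisync_master.so
  rpl_semi_sync_master_enabled = 1
  rpl_semi_sync_master_timeout = 1000   # ms before falling back to async
  # slave side
  plugin-load = rpl_semi_sync_slave=semisync_slave.so
  rpl_semi_sync_slave_enabled = 1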

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
