On 25.10.2014 22:34, Digimer wrote:
On 25/10/14 03:32 PM, Andrew wrote:
Hi all.

I use the Percona RA on a cluster (nothing mission-critical currently,
just Zabbix data); today, after restarting the MySQL resource
(crm resource restart p_mysql), I got a split-brain state: for some
reason MySQL started first on the ex-slave node, and the ex-master
started later (possibly I set the shutdown timeout too low, only 120s,
but I'm not sure).
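
For reference, if the too-short stop timeout is indeed the suspect, it
can be raised on the resource's stop operation. A minimal crmsh sketch
(the 300s value is only an illustration, not a recommendation):

  crm configure edit p_mysql
  # then, in the editor, adjust the stop operation, e.g.:
  #   op stop interval="0" timeout="300s"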

After restarting the resource on both nodes, MySQL replication seemed
to be OK, but then after ~50 min it fell into split brain again for an
unknown reason (no resource restart was observed).

In the 'SHOW SLAVE STATUS' output there is a table error caused by a
duplicate entry on a unique index.
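
For reference, the statement that broke replication can be read on the
slave itself; errno 1062 is MySQL's duplicate-entry error:

  mysql> SHOW SLAVE STATUS\G
  -- the failure shows up in the Last_SQL_Errno / Last_SQL_Error fields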

So I have some questions:
1) What causes the split brain, and how can I avoid it in the future?

Cause:

Logs?
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3459: /var/lib/pacemaker/pengine/pe-input-254.bz2
Oct 25 13:54:13 node2 crmd[29248]: notice: run_graph: Transition 3459 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-254.bz2): Complete
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:14 node2 pacemakerd[11626]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:16 node2 crmd[29248]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:16 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3460: /var/lib/pacemaker/pengine/pe-input-255.bz2
Oct 25 13:54:16 node2 crmd[29248]: notice: run_graph: Transition 3460 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-255.bz2): Complete
Oct 25 13:54:16 node2 crmd[29248]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:18 node2 crmd[29248]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:18 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3461: /var/lib/pacemaker/pengine/pe-input-256.bz2
Oct 25 13:54:18 node2 crmd[29248]: notice: run_graph: Transition 3461 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-256.bz2): Complete
Oct 25 13:54:18 node2 crmd[29248]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:19 node2 pacemaker: Signaling Pacemaker Cluster Manager to terminate
Oct 25 13:54:19 node2 pacemakerd[29237]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Oct 25 13:54:19 node2 pacemakerd[29237]: notice: stop_child: Stopping crmd: Sent -15 to process 29248
Oct 25 13:54:19 node2 crmd[29248]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Oct 25 13:54:19 node2 crmd[29248]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Oct 25 13:54:19 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1414234459)
Oct 25 13:54:19 node2 pacemaker: Waiting for cluster services to unload
Oct 25 13:54:19 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7041: shutdown=1414234459
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:19 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move ClusterIP#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move pgsql_reader_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Promote p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Demote p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop p_pgsql:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop p_nginx:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop p_perl-fpm:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop p_radiusd:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3462: /var/lib/pacemaker/pengine/pe-input-257.bz2
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 24: stop ClusterIP_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 32: stop pgsql_reader_vip_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 2: cancel p_mysql_cancel_2000 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 143: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 145: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 149: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 151: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 105: stop p_perl-fpm_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating action 120: stop p_radiusd_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 lrmd[29245]: notice: operation_finished: p_perl-fpm_stop_0:12107:stderr [ grep: /var/run/perl-fpm/perl-fpm.pid: No such file or directory ]
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_perl-fpm_stop_0 (call=2537, rc=0, cib-update=4282, confirmed=true) ok
Oct 25 13:54:19 node2 pacemakerd[12173]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node2 mysql(p_mysql)[12101]: INFO: post-demote notification for node2.cluster
Oct 25 13:54:19 node2 IPaddr2(pgsql_reader_vip)[12100]: INFO: IP status = ok, IP_CIP=
Oct 25 13:54:19 node2 IPaddr(ClusterIP)[12099]: INFO: IP status = ok, IP_CIP=
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for 192.168.253.31 on br0.
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=2534, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for 192.168.253.254 on br0.
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation pgsql_reader_vip_stop_0 (call=2528, rc=0, cib-update=4283, confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation ClusterIP_stop_0 (call=2525, rc=0, cib-update=4284, confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2532, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_radiusd_stop_0 (call=2540, rc=0, cib-update=4285, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: run_graph: Transition 3462 (Complete=15, Pending=0, Fired=0, Skipped=29, Incomplete=18, Source=/var/lib/pacemaker/pengine/pe-input-257.bz2): Stopped
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:20 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Start ClusterIP#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Move mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Start pgsql_reader_vip#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Promote p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Demote p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Stop p_pgsql:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Stop p_nginx:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3463: /var/lib/pacemaker/pengine/pe-input-258.bz2
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 19: start ClusterIP_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 26: start pgsql_reader_vip_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 134: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 136: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 140: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 142: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 97: stop p_nginx_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 pacemakerd[12402]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 mysql(p_mysql)[12371]: INFO: post-demote notification for node2.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=2552, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 64: stop p_pgsql_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2550, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 34: demote p_mysql_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 20: monitor ClusterIP_monitor_2000 on node1.cluster
Oct 25 13:54:20 node2 pacemakerd[12542]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 27: monitor pgsql_reader_vip_monitor_10000 on node1.cluster
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7043: master-p_mysql=1
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_demote_0 (call=2567, rc=0, cib-update=4287, confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 135: notify p_mysql_post_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating action 137: notify p_mysql_post_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7045: master-p_pgsql=-INFINITY
Oct 25 13:54:20 node2 mysql(p_mysql)[12605]: INFO: Ignoring post-demote notification for my own demotion.
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2571, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #110 br0:0, 192.168.253.254#123, interface stats: received=0, sent=0, dropped=0, active_time=47343 secs
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #172 br0, 192.168.253.31#123, interface stats: received=0, sent=0, dropped=0, active_time=45246 secs
Oct 25 13:54:20 node2 cib[29243]:   notice: cib:diff: Diff: --- 0.320.2419
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: Diff: +++ 0.321.1 17dd8c556d2c592e066c52d67b1a5b87
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: -- <nvpair value="STREAMING|SYNC" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: ++ <nvpair id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status" value="DISCONNECT"/>
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: Killing nginx PID 29997
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: nginx stopped.
Oct 25 13:54:21 node2 lrmd[29245]: notice: operation_finished: p_nginx_stop_0:12373:stderr [ /usr/lib/ocf/resource.d/heartbeat/nginx: line 467: kill: (29997) - No such process ]
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_nginx_stop_0 (call=2556, rc=0, cib-update=4288, confirmed=true) ok
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: waiting for server to shut down.... done server stopped
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: PostgreSQL is down
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Changing p_pgsql-status on node2.cluster : HS:sync->STOP.
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7047: p_pgsql-status=STOP
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Set all nodes into async mode.
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_pgsql_stop_0 (call=2562, rc=0, cib-update=4289, confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating action 141: notify p_pgsql_post_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]: notice: run_graph: Transition 3463 (Complete=28, Pending=0, Fired=0, Skipped=18, Incomplete=9, Source=/var/lib/pacemaker/pengine/pe-input-258.bz2): Stopped
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:21 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Stop mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Promote p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Stop p_mysql:1#011(node2.cluster)
Oct 25 13:54:21 node2 pengine[29247]: crit: stage8: Cannot shut down node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:54:21 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3464: /var/lib/pacemaker/pengine/pe-input-259.bz2
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating action 125: notify p_mysql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating action 127: notify p_mysql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=2576, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating action 32: stop p_mysql_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (<null>)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent delete 7049: node=node2.cluster, attr=master-p_mysql, id=<n/a>, set=(null), section=status
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (0)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7052: readable=0
Oct 25 13:54:25 node2 nginx(p_nginx)[12778]: INFO: nginx not running
Oct 25 13:54:25 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_nginx_monitor_10000 (call=2442, rc=7, cib-update=4291, confirmed=false) not running
Oct 25 13:54:25 node2 crmd[29248]: warning: update_failcount: Updating failcount for p_nginx on node2.cluster after failed monitor: rc=7 (update=value++, time=1414234465)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_nginx (1)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7055: fail-count-p_nginx=1
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_nginx (1414234465)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7057: last-failure-p_nginx=1414234465
Oct 25 13:54:26 node2 nginx(p_nginx)[12844]: INFO: nginx not running
Oct 25 13:54:26 node2 nginx(p_nginx)[12845]: INFO: nginx not running
Oct 25 13:54:26 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_nginx_monitor_10000 (call=333, rc=7, cib-update=4292, confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_nginx_monitor_10000 (call=2469, rc=7, cib-update=4293, confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]: warning: update_failcount: Updating failcount for p_nginx on node2.cluster after failed monitor: rc=7 (update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_nginx (2)
Oct 25 13:54:26 node2 crmd[29248]: warning: update_failcount: Updating failcount for p_nginx on node2.cluster after failed monitor: rc=7 (update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7059: fail-count-p_nginx=2
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7061: last-failure-p_nginx=1414234466
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_nginx (3)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7063: fail-count-p_nginx=3
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent update 7065: last-failure-p_nginx=1414234466
Oct 25 13:54:35 node2 nginx(p_nginx)[13037]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13099]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13098]: INFO: nginx not running
Oct 25 13:54:45 node2 nginx(p_nginx)[13291]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13353]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13352]: INFO: nginx not running
Oct 25 13:54:55 node2 nginx(p_nginx)[13549]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13628]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13627]: INFO: nginx not running
Oct 25 13:55:05 node2 nginx(p_nginx)[13870]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13931]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13932]: INFO: nginx not running
Oct 25 13:55:15 node2 nginx(p_nginx)[14151]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14214]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14213]: INFO: nginx not running
Oct 25 13:55:25 node2 nginx(p_nginx)[14405]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14469]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14468]: INFO: nginx not running
Oct 25 13:55:35 node2 nginx(p_nginx)[14659]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14723]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14724]: INFO: nginx not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL stopped
Oct 25 13:55:38 node2 crmd[29248]: notice: process_lrm_event: LRM operation p_mysql_stop_0 (call=2579, rc=0, cib-update=4294, confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 126: notify p_mysql_post_notify_stop_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3464 (Complete=10, Pending=0, Fired=0, Skipped=6, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-259.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]: warning: unpack_rsc_op: Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Stop mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Promote p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: crit: stage8: Cannot shut down node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:55:38 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3465: /var/lib/pacemaker/pengine/pe-input-260.bz2
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 126: notify p_mysql_pre_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 30: promote p_mysql_promote_0 on node1.cluster
Oct 25 13:55:38 node2 cib[29243]:   notice: cib:diff: Diff: --- 0.321.15
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: Diff: +++ 0.322.1 92b30c7922072633f9dbc336e4f1a066
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: -- <nvpair value="192.168.253.4|mysqld-bin.000409|106" id="mysql_replication-p_mysql_REPL_INFO"/>
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: ++ <nvpair id="mysql_replication-p_mysql_REPL_INFO" name="p_mysql_REPL_INFO" value="192.168.253.5|mysqld-bin.000471|106"/>
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 127: notify p_mysql_post_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3465 (Complete=9, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-260.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]: warning: unpack_rsc_op: Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]: notice: stage6: Scheduling Node node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Move mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: notice: process_pe_message: Calculated Transition 3466: /var/lib/pacemaker/pengine/pe-input-261.bz2
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 21: stop mysql_writer_vip_stop_0 on node2.cluster (local)
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 32: monitor p_mysql_monitor_5000 on node1.cluster
Oct 25 13:55:38 node2 IPaddr2(mysql_writer_vip)[14851]: INFO: IP status = ok, IP_CIP=
Oct 25 13:55:38 node2 avahi-daemon[6932]: Withdrawing address record for 192.168.253.64 on br0.
Oct 25 13:55:38 node2 crmd[29248]: notice: process_lrm_event: LRM operation mysql_writer_vip_stop_0 (call=2586, rc=0, cib-update=4300, confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 22: start mysql_writer_vip_start_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating action 23: monitor mysql_writer_vip_monitor_10000 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3466 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-261.bz2): Complete
Oct 25 13:55:38 node2 crmd[29248]: notice: lrm_state_verify_stopped: Stopped 1 recurring operations at shutdown... waiting (0 ops remaining)
Oct 25 13:55:38 node2 lrmd[29245]: warning: qb_ipcs_event_sendv: new_event_notification (29245-29248-6): Bad file descriptor (9)
Oct 25 13:55:38 node2 lrmd[29245]: warning: send_client_notify: Notification of client crmd/d068e1e6-7c6e-48b0-bb86-49d35fe11673 failed
Oct 25 13:55:38 node2 crmd[29248]: notice: do_lrm_control: Disconnected from the LRM
Oct 25 13:55:38 node2 crmd[29248]: notice: terminate_cs_connection: Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1afff00, async-conn=0x1afff00) left
Oct 25 13:55:38 node2 cib[29243]: warning: qb_ipcs_event_sendv: new_event_notification (29243-29248-13): Broken pipe (32)
Oct 25 13:55:38 node2 cib[29243]: warning: do_local_notify: A-Sync reply to crmd failed: No message of desired type
Oct 25 13:55:38 node2 pacemakerd[29237]: warning: qb_ipcs_event_sendv: new_event_notification (29237-29248-14): Broken pipe (32)
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping pengine: Sent -15 to process 29247
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping attrd: Sent -15 to process 29246
Oct 25 13:55:38 node2 attrd[29246]:   notice: main: Exiting...
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit: Client attrd (conn=0x1af3480, async-conn=0x1af3480) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping lrmd: Sent -15 to process 29245
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping stonith-ng: Sent -15 to process 29244
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x1af7800, async-conn=0x1af7800) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping cib: Sent -15 to process 29243
Oct 25 13:55:38 node2 cib[29243]: notice: terminate_cs_connection: Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x1afbb80, async-conn=0x1afbb80) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: pcmk_shutdown_worker: Shutdown complete
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 pacemaker: Starting Pacemaker Cluster Manager
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: main: Starting Pacemaker 1.1.10-14.el6_5.3 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Oct 25 13:55:39 node2 cib[14922]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 attrd[14925]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 cib[14922]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
Oct 25 13:55:39 node2 attrd[14925]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 pengine[14926]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 lrmd[14924]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]:   notice: main: CRM Git Version: 368c726
Oct 25 13:55:39 node2 attrd[14925]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1af31c0 for attrd/0
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1af7540 for stonith-ng/0
Oct 25 13:55:39 node2 attrd[14925]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 attrd[14925]:   notice: main: Starting mainloop...
Oct 25 13:55:39 node2 cib[14922]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1afb8c0 for cib/0
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Sending membership update 1608 to cib
Oct 25 13:55:39 node2 cib[14922]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]: notice: plugin_handle_membership: Membership 1608: quorum acquired
Oct 25 13:55:39 node2 cib[14922]: notice: crm_update_peer_state: plugin_handle_membership: Node node2.cluster[83732672] - state is now member (was (null))
Oct 25 13:55:39 node2 cib[14922]: notice: crm_update_peer_state: plugin_handle_membership: Node node1.cluster[100509888] - state is now member (was (null))
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Oct 25 13:55:40 node2 cib[14922]: notice: crm_ipc_prepare: Message exceeds the configured ipc limit (51200 bytes), consider configuring PCMK_ipc_buffer to 161796 or higher to avoid compression overheads
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:40 node2 crmd[14927]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1affc40 for crmd/0
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] info: pcmk_ipc: Sending membership update 1608 to crmd
Oct 25 13:55:40 node2 stonith-ng[14923]: notice: setup_cib: Watching for stonith topology changes
Oct 25 13:55:40 node2 crmd[14927]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 stonith-ng[14923]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 25 13:55:40 node2 crmd[14927]: notice: plugin_handle_membership: Membership 1608: quorum acquired
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_update_peer_state: plugin_handle_membership: Node node2.cluster[83732672] - state is now member (was (null))
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_update_peer_state: plugin_handle_membership: Node node1.cluster[100509888] - state is now member (was (null))
Oct 25 13:55:40 node2 crmd[14927]: notice: do_started: The local CRM is operational
Oct 25 13:55:40 node2 crmd[14927]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Oct 25 13:55:40 node2 ntpd[7193]: Deleting interface #76 br0, 192.168.253.64#123, interface stats: received=0, sent=0, dropped=0, active_time=3555898 secs
Oct 25 13:55:42 node2 crmd[14927]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Oct 25 13:55:42 node2 attrd[14925]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_perl-fpm_monitor_0 (call=41, rc=7, cib-update=8, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radiusd_monitor_0 (call=54, rc=7, cib-update=9, confirmed=true) not running
Oct 25 13:55:43 node2 pacemakerd[15105]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 mysql(p_mysql)[14949]: INFO: MySQL is not running
Oct 25 13:55:43 node2 crmd[14927]: error: generic_get_metadata: Failed to retrieve meta-data for ocf:percona:mysql
Oct 25 13:55:43 node2 crmd[14927]: warning: get_rsc_metadata: No metadata found for mysql::ocf:percona: Input/output error (-5)
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_mysql_monitor_0 (call=26, rc=7, cib-update=10, confirmed=true) not running
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: Don't check /var/lib/pgsql/9.1/data/ during probe
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: PostgreSQL is down
Oct 25 13:55:43 node2 nginx(p_nginx)[14956]: INFO: nginx not running
Oct 25 13:55:43 node2 pacemakerd[15368]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_pgsql_monitor_0 (call=31, rc=7, cib-update=11, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_nginx_monitor_0 (call=36, rc=7, cib-update=12, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_web_ip_monitor_0 (call=45, rc=7, cib-update=13, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation pgsql_writer_vip_monitor_0 (call=21, rc=7, cib-update=14, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation ClusterIP_monitor_0 (call=5, rc=7, cib-update=15, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radius_ip_monitor_0 (call=49, rc=7, cib-update=16, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_writer_vip_monitor_0 (call=13, rc=7, cib-update=17, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation pgsql_reader_vip_monitor_0 (call=17, rc=7, cib-update=18, confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_reader_vip_monitor_0 (call=9, rc=7, cib-update=19, confirmed=true) not running
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:43 node2 pacemakerd[15481]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (0)
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:43 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok nginx: configuration file /etc/nginx/nginx.conf test is successful
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: Starting /usr/sbin/nginx - nginx version: nginx/1.0.15
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: /usr/sbin/nginx build configuration: configure arguments: --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_geoip_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_stub_status_module --with-http_perl_module --with-mail --with-mail_ssl_module --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' --with-ld-opt=-Wl,-E
Oct 25 13:55:43 node2 pgsql(p_pgsql)[15439]: INFO: Changing p_pgsql-status on node2.cluster : ->STOP.
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_nginx_start_0 (call=72, rc=0, cib-update=20, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Adding inet address 10.255.0.33/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Adding inet address 10.255.0.32/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Bringing device lo up
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Bringing device lo up
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radius_ip_start_0 (call=76, rc=0, cib-update=21, confirmed=true) ok
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_web_ip_start_0 (call=78, rc=0, cib-update=22, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_nginx_monitor_30000 (call=81, rc=0, cib-update=23, confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: Set all nodes into async mode.
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radius_ip_monitor_10000 (call=89, rc=0, cib-update=24, confirmed=false) ok
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_web_ip_monitor_10000 (call=91, rc=0, cib-update=25, confirmed=false) ok
Oct 25 13:55:44 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_nginx_monitor_10000 (call=83, rc=0, cib-update=26, confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: server starting
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL start command sent.
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Can't get PostgreSQL recovery status. rc=2
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command.
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 4: master-p_mysql=0
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 7: master-p_pgsql=-INFINITY
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 11: readable=0
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 17: p_pgsql-status=STOP
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent update 20: probe_complete=true
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_perl-fpm_start_0 (call=85, rc=0, cib-update=27, confirmed=true) ok
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radiusd_start_0 (call=74, rc=0, cib-update=28, confirmed=true) ok
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL is started.
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: Changing p_pgsql-status on node2.cluster : STOP->HS:alone.
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (HS:alone)
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_perform_update: Sent update 22: p_pgsql-status=HS:alone
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished: p_pgsql_start_0:15439:stderr [ psql: could not connect to server: No such file or directory ]
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished: p_pgsql_start_0:15439:stderr [ #011Is the server running locally and accepting ]
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished: p_pgsql_start_0:15439:stderr [ #011connections on Unix domain socket "/tmp/.s.PGSQL.5432"? ]
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_pgsql_start_0 (call=70, rc=0, cib-update=29, confirmed=true) ok
Oct 25 13:55:45 node2 pacemakerd[16158]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_pgsql_notify_0 (call=100, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #174 lo, 10.255.0.33#123 Enabled
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #175 lo, 10.255.0.32#123 Enabled
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_cs_dispatch: Update relayed from node1.cluster
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (HS:async)
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_perform_update: Sent update 24: p_pgsql-status=HS:async
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_cs_dispatch: Update relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_pgsql (100)
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_perform_update: Sent update 26: master-p_pgsql=100
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_cs_dispatch: Update relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_pgsql-status (HS:sync)
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_perform_update: Sent update 28: p_pgsql-status=HS:sync
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Changing MySQL configuration to replicate from node1.cluster.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: ERROR: check_slave invoked on an instance that is not a replication slave.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Restored master pos for 192.168.253.5 : mysqld-bin.000471:106
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_perform_update: Sent update 30: master-p_mysql=1
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (60)
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_perform_update: Sent update 32: master-p_mysql=60
Oct 25 13:55:49 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (1)
Oct 25 13:55:49 node2 attrd[14925]: notice: attrd_perform_update: Sent update 34: readable=1
Oct 25 13:55:49 node2 mysql(p_mysql)[15438]: INFO: MySQL started
Oct 25 13:55:49 node2 lrmd[14924]: notice: operation_finished: p_mysql_start_0:15438:stderr [ Error performing operation: No such device or address ]
Oct 25 13:55:49 node2 lrmd[14924]: notice: operation_finished: p_mysql_start_0:15438:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_mysql_start_0 (call=68, rc=0, cib-update=30, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_mysql_notify_0 (call=104, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_perl-fpm_monitor_10000 (call=111, rc=0, cib-update=31, confirmed=false) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_radiusd_monitor_10000 (call=113, rc=0, cib-update=32, confirmed=false) ok
Oct 25 13:55:49 node2 pacemakerd[16380]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_pgsql_monitor_7000 (call=109, rc=0, cib-update=33, confirmed=false) ok
Oct 25 13:55:49 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Adding inet address 192.168.253.63/24 with broadcast address 192.168.253.255 to device br0
Oct 25 13:55:49 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Adding inet address 192.168.253.31/24 with broadcast address 192.168.253.255 to device br0
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record for 192.168.253.63 on br0.IPv4.
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record for 192.168.253.31 on br0.IPv4.
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Bringing device br0 up
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Bringing device br0 up
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.253.63 br0 192.168.253.63 auto not_used not_used
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.253.31 br0 192.168.253.31 auto not_used not_used
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_reader_vip_start_0 (call=117, rc=0, cib-update=34, confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation pgsql_reader_vip_start_0 (call=119, rc=0, cib-update=35, confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_reader_vip_monitor_10000 (call=124, rc=0, cib-update=36, confirmed=false) ok
Oct 25 13:55:50 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (52)
Oct 25 13:55:50 node2 attrd[14925]: notice: attrd_perform_update: Sent update 36: master-p_mysql=52
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation pgsql_reader_vip_monitor_10000 (call=126, rc=0, cib-update=37, confirmed=false) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM operation p_mysql_monitor_2000 (call=107, rc=0, cib-update=38, confirmed=false) ok
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #176 br0, 192.168.253.63#123 Enabled
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #177 br0, 192.168.253.31#123 Enabled
Oct 25 13:55:52 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (50)
Oct 25 13:55:52 node2 attrd[14925]: notice: attrd_perform_update: Sent update 38: master-p_mysql=50
Oct 25 13:55:54 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (51)
Oct 25 13:55:54 node2 attrd[14925]: notice: attrd_perform_update: Sent update 40: master-p_mysql=51
Oct 25 13:55:56 node2 mysql(p_mysql)[17074]: ERROR: MySQL instance configured for replication, but replication has failed.
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (0)
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_perform_update: Sent update 42: readable=0
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_perform_update: Sent update 44: master-p_mysql=0
Oct 25 13:55:56 node2 IPaddr2(mysql_reader_vip)[17141]: INFO: IP status = ok, IP_CIP=
Oct 25 13:55:56 node2 avahi-daemon[6932]: Withdrawing address record for 192.168.253.63 on br0.
Oct 25 13:55:56 node2 crmd[14927]: notice: process_lrm_event: LRM operation mysql_reader_vip_stop_0 (call=132, rc=0, cib-update=39, confirmed=true) ok
Oct 25 13:55:57 node2 pacemakerd[17206]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:58 node2 ntpd[7193]: Deleting interface #176 br0, 192.168.253.63#123, interface stats: received=0, sent=0, dropped=0, active_time=7 secs
Oct 25 13:55:58 node2 mysql(p_mysql)[17267]: ERROR: MySQL instance configured for replication, but replication has failed.


Prevent:

Fencing (aka stonith). This is why fencing is required.
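
For the archives, a minimal crmsh sketch of what enabling fencing looks
like, assuming IPMI-capable management boards; the agent choice,
addresses and credentials below are placeholders to adapt:

  # one fencing device per peer, kept off the node it is meant to kill
  crm configure primitive fence_node1 stonith:fence_ipmilan \
      params pcmk_host_list="node1.cluster" ipaddr="10.0.0.1" \
             login="admin" passwd="secret" lanplus="1" \
      op monitor interval="60s"
  crm configure location l_fence_node1 fence_node1 -inf: node1.cluster
  # repeat for node2, then turn fencing on cluster-wide:
  crm configure property stonith-enabled="true"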
There was no node failure; just the daemon was restarted.


2) How do I resolve the split-brain state? Is it enough just to wait
for the failure, then restart MySQL by hand, delete the row with the
duplicate index from the slave DB, and then start the resource again?
Or is there some automation for such cases?
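
For reference, the by-hand path usually looks something like the sketch
below, run on the broken slave (the table and key value are
placeholders). Note that skipping events papers over divergence;
re-cloning the slave from the master's data is the only really safe fix:

  STOP SLAVE;
  -- either remove the conflicting row so the event can re-apply ...
  DELETE FROM <db>.<table> WHERE <unique_key> = <duplicate_value>;
  -- ... or skip the single offending replication event:
  SET GLOBAL sql_slave_skip_counter = 1;
  START SLAVE;
  SHOW SLAVE STATUS\G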

How are you sharing data? Can you give us a better understanding of your setup?

Semi-synchronous MySQL replication, if you mean sharing the DB log
between the nodes.
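
For context, a sketch of the stock MySQL 5.5 semi-sync settings this
implies; note that semi-sync falls back to plain asynchronous
replication when the timeout expires or the slave disconnects, which is
one window for the two sides to diverge:

  # master side of my.cnf
  plugin-load = rpl_semi_sync_master=semisync_master.so
  rpl_semi_sync_master_enabled = 1
  rpl_semi_sync_master_timeout = 1000   # ms before falling back to async
  # slave side
  plugin-load = rpl_semi_sync_slave=semisync_slave.so
  rpl_semi_sync_slave_enabled = 1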

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
