On 25.10.2014 22:34, Digimer writes:
On 25/10/14 03:32 PM, Andrew wrote:
Hi all.
I use the Percona resource agent (RA) on a cluster (nothing
mission-critical currently, just Zabbix data); today, after restarting
the MySQL resource (crm resource restart p_mysql), I got a split-brain
state - MySQL for some reason started first on the ex-slave node, and
the ex-master started later (possibly I set the shutdown timeout too
low - only 120s - but I'm not sure).
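For reference, this is roughly how the stop timeout can be checked and
raised with the crm shell (a sketch only; the 300s value is just an
illustration, not a recommendation):

  # Show the current resource definition, including op timeouts:
  crm configure show p_mysql
  # Open the definition and raise the stop timeout, e.g. change the
  # op line to something like:
  #   op stop interval="0" timeout="300s"
  crm configure edit p_mysql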
After restarting the resource on both nodes, MySQL replication seemed to
be OK - but then, after ~50 min, it fell into split brain again for an
unknown reason (no resource restart was noticed).
In the 'SHOW SLAVE STATUS' output there is an error on a table, caused
by a duplicate entry on a unique index.
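For illustration, the error can be inspected like this
(Slave_SQL_Running, Last_SQL_Errno and Last_SQL_Error are standard
'SHOW SLAVE STATUS' fields; 1062 is the usual duplicate-entry code):

  # On the slave, check the replication SQL thread and its last error:
  mysql -e 'SHOW SLAVE STATUS\G' | \
      grep -E 'Slave_SQL_Running|Last_SQL_Errno|Last_SQL_Error'
  # A unique-index duplicate typically shows up as:
  #   Last_SQL_Errno: 1062
  #   Last_SQL_Error: Error 'Duplicate entry ...' on query ...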
So I have two questions:
1) What causes the split brain, and how can I avoid it in the future?
Cause:
Logs?
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3459: /var/lib/pacemaker/pengine/pe-input-254.bz2
Oct 25 13:54:13 node2 crmd[29248]: notice: run_graph: Transition 3459
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-254.bz2): Complete
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:14 node2 pacemakerd[11626]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:16 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:16 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3460: /var/lib/pacemaker/pengine/pe-input-255.bz2
Oct 25 13:54:16 node2 crmd[29248]: notice: run_graph: Transition 3460
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-255.bz2): Complete
Oct 25 13:54:16 node2 crmd[29248]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:18 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:18 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3461: /var/lib/pacemaker/pengine/pe-input-256.bz2
Oct 25 13:54:18 node2 crmd[29248]: notice: run_graph: Transition 3461
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-256.bz2): Complete
Oct 25 13:54:18 node2 crmd[29248]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:19 node2 pacemaker: Signaling Pacemaker Cluster Manager to
terminate
Oct 25 13:54:19 node2 pacemakerd[29237]: notice: pcmk_shutdown_worker:
Shuting down Pacemaker
Oct 25 13:54:19 node2 pacemakerd[29237]: notice: stop_child: Stopping
crmd: Sent -15 to process 29248
Oct 25 13:54:19 node2 crmd[29248]: notice: crm_shutdown: Requesting
shutdown, upper limit is 1200000ms
Oct 25 13:54:19 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN
origin=crm_shutdown ]
Oct 25 13:54:19 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: shutdown (1414234459)
Oct 25 13:54:19 node2 pacemaker: Waiting for cluster services to unload
Oct 25 13:54:19 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7041: shutdown=1414234459
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:19 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move
ClusterIP#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move
mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move
pgsql_reader_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Promote
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Demote
p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop
p_pgsql:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop
p_nginx:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop
p_perl-fpm:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop
p_radiusd:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3462: /var/lib/pacemaker/pengine/pe-input-257.bz2
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 24: stop ClusterIP_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 32: stop pgsql_reader_vip_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 2: cancel p_mysql_cancel_2000 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 143: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 145: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 149: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 151: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 105: stop p_perl-fpm_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 120: stop p_radiusd_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 lrmd[29245]: notice: operation_finished:
p_perl-fpm_stop_0:12107:stderr [ grep: /var/run/perl-fpm/perl-fpm.pid:
No such file or directory ]
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_perl-fpm_stop_0 (call=2537, rc=0, cib-update=4282,
confirmed=true) ok
Oct 25 13:54:19 node2 pacemakerd[12173]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node2 mysql(p_mysql)[12101]: INFO: post-demote
notification for node2.cluster
Oct 25 13:54:19 node2 IPaddr2(pgsql_reader_vip)[12100]: INFO: IP status
= ok, IP_CIP=
Oct 25 13:54:19 node2 IPaddr(ClusterIP)[12099]: INFO: IP status = ok,
IP_CIP=
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for
192.168.253.31 on br0.
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2534, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for
192.168.253.254 on br0.
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_stop_0 (call=2528, rc=0, cib-update=4283,
confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation ClusterIP_stop_0 (call=2525, rc=0, cib-update=4284,
confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2532, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_radiusd_stop_0 (call=2540, rc=0, cib-update=4285,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: run_graph: Transition 3462
(Complete=15, Pending=0, Fired=0, Skipped=29, Incomplete=18,
Source=/var/lib/pacemaker/pengine/pe-input-257.bz2): Stopped
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:20 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Start
ClusterIP#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Move
mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Start
pgsql_reader_vip#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Promote
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Demote
p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Stop
p_pgsql:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Stop
p_nginx:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3463: /var/lib/pacemaker/pengine/pe-input-258.bz2
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 19: start ClusterIP_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 26: start pgsql_reader_vip_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 134: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 136: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 140: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 142: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 97: stop p_nginx_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 pacemakerd[12402]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 mysql(p_mysql)[12371]: INFO: post-demote
notification for node2.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2552, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 64: stop p_pgsql_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2550, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 34: demote p_mysql_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 20: monitor ClusterIP_monitor_2000 on node1.cluster
Oct 25 13:54:20 node2 pacemakerd[12542]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 27: monitor pgsql_reader_vip_monitor_10000 on node1.cluster
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7043: master-p_mysql=1
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_demote_0 (call=2567, rc=0, cib-update=4287,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 135: notify p_mysql_post_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 137: notify p_mysql_post_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7045: master-p_pgsql=-INFINITY
Oct 25 13:54:20 node2 mysql(p_mysql)[12605]: INFO: Ignoring post-demote
notification for my own demotion.
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2571, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #110 br0:0,
192.168.253.254#123, interface stats: received=0, sent=0, dropped=0,
active_time=47343 secs
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #172 br0,
192.168.253.31#123, interface stats: received=0, sent=0, dropped=0,
active_time=45246 secs
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: Diff: --- 0.320.2419
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: Diff: +++ 0.321.1
17dd8c556d2c592e066c52d67b1a5b87
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: -- <nvpair
value="STREAMING|SYNC" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: ++ <nvpair
id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status"
value="DISCONNECT"/>
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: Killing nginx PID 29997
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: nginx stopped.
Oct 25 13:54:21 node2 lrmd[29245]: notice: operation_finished:
p_nginx_stop_0:12373:stderr [ /usr/lib/ocf/resource.d/heartbeat/nginx:
line 467: kill: (29997) - No such process ]
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_nginx_stop_0 (call=2556, rc=0, cib-update=4288,
confirmed=true) ok
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: waiting for server to
shut down.... done server stopped
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: PostgreSQL is down
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Changing
p_pgsql-status on node2.cluster : HS:sync->STOP.
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7047: p_pgsql-status=STOP
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Set all nodes into
async mode.
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_pgsql_stop_0 (call=2562, rc=0, cib-update=4289,
confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 141: notify p_pgsql_post_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]: notice: run_graph: Transition 3463
(Complete=28, Pending=0, Fired=0, Skipped=18, Incomplete=9,
Source=/var/lib/pacemaker/pengine/pe-input-258.bz2): Stopped
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:21 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Stop
mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Promote
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Stop
p_mysql:1#011(node2.cluster)
Oct 25 13:54:21 node2 pengine[29247]: crit: stage8: Cannot shut down
node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:54:21 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3464: /var/lib/pacemaker/pengine/pe-input-259.bz2
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 125: notify p_mysql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 127: notify p_mysql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2576, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 32: stop p_mysql_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (<null>)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent
delete 7049: node=node2.cluster, attr=master-p_mysql, id=<n/a>,
set=(null), section=status
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (0)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7052: readable=0
Oct 25 13:54:25 node2 nginx(p_nginx)[12778]: INFO: nginx not running
Oct 25 13:54:25 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_nginx_monitor_10000 (call=2442, rc=7, cib-update=4291,
confirmed=false) not running
Oct 25 13:54:25 node2 crmd[29248]: warning: update_failcount: Updating
failcount for p_nginx on node2.cluster after failed monitor: rc=7
(update=value++, time=1414234465)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-p_nginx (1)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7055: fail-count-p_nginx=1
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-p_nginx (1414234465)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7057: last-failure-p_nginx=1414234465
Oct 25 13:54:26 node2 nginx(p_nginx)[12844]: INFO: nginx not running
Oct 25 13:54:26 node2 nginx(p_nginx)[12845]: INFO: nginx not running
Oct 25 13:54:26 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_nginx_monitor_10000 (call=333, rc=7, cib-update=4292,
confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_nginx_monitor_10000 (call=2469, rc=7, cib-update=4293,
confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]: warning: update_failcount: Updating
failcount for p_nginx on node2.cluster after failed monitor: rc=7
(update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-p_nginx (2)
Oct 25 13:54:26 node2 crmd[29248]: warning: update_failcount: Updating
failcount for p_nginx on node2.cluster after failed monitor: rc=7
(update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7059: fail-count-p_nginx=2
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7061: last-failure-p_nginx=1414234466
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-p_nginx (3)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7063: fail-count-p_nginx=3
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7065: last-failure-p_nginx=1414234466
Oct 25 13:54:35 node2 nginx(p_nginx)[13037]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13099]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13098]: INFO: nginx not running
Oct 25 13:54:45 node2 nginx(p_nginx)[13291]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13353]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13352]: INFO: nginx not running
Oct 25 13:54:55 node2 nginx(p_nginx)[13549]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13628]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13627]: INFO: nginx not running
Oct 25 13:55:05 node2 nginx(p_nginx)[13870]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13931]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13932]: INFO: nginx not running
Oct 25 13:55:15 node2 nginx(p_nginx)[14151]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14214]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14213]: INFO: nginx not running
Oct 25 13:55:25 node2 nginx(p_nginx)[14405]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14469]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14468]: INFO: nginx not running
Oct 25 13:55:35 node2 nginx(p_nginx)[14659]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14723]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14724]: INFO: nginx not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL stopped
Oct 25 13:55:38 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_stop_0 (call=2579, rc=0, cib-update=4294,
confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 126: notify p_mysql_post_notify_stop_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3464
(Complete=10, Pending=0, Fired=0, Skipped=6, Incomplete=4,
Source=/var/lib/pacemaker/pengine/pe-input-259.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]: warning: unpack_rsc_op:
Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Stop
mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Promote
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: crit: stage8: Cannot shut down
node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:55:38 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3465: /var/lib/pacemaker/pengine/pe-input-260.bz2
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 126: notify p_mysql_pre_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 30: promote p_mysql_promote_0 on node1.cluster
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: Diff: --- 0.321.15
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: Diff: +++ 0.322.1
92b30c7922072633f9dbc336e4f1a066
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: -- <nvpair
value="192.168.253.4|mysqld-bin.000409|106"
id="mysql_replication-p_mysql_REPL_INFO"/>
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: ++ <nvpair
id="mysql_replication-p_mysql_REPL_INFO" name="p_mysql_REPL_INFO"
value="192.168.253.5|mysqld-bin.000471|106"/>
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 127: notify p_mysql_post_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3465
(Complete=9, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-260.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]: warning: unpack_rsc_op:
Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Move
mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3466: /var/lib/pacemaker/pengine/pe-input-261.bz2
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 21: stop mysql_writer_vip_stop_0 on node2.cluster (local)
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 32: monitor p_mysql_monitor_5000 on node1.cluster
Oct 25 13:55:38 node2 IPaddr2(mysql_writer_vip)[14851]: INFO: IP status
= ok, IP_CIP=
Oct 25 13:55:38 node2 avahi-daemon[6932]: Withdrawing address record for
192.168.253.64 on br0.
Oct 25 13:55:38 node2 crmd[29248]: notice: process_lrm_event: LRM
operation mysql_writer_vip_stop_0 (call=2586, rc=0, cib-update=4300,
confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 22: start mysql_writer_vip_start_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 23: monitor mysql_writer_vip_monitor_10000 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3466
(Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-261.bz2): Complete
Oct 25 13:55:38 node2 crmd[29248]: notice: lrm_state_verify_stopped:
Stopped 1 recurring operations at shutdown... waiting (0 ops remaining)
Oct 25 13:55:38 node2 lrmd[29245]: warning: qb_ipcs_event_sendv:
new_event_notification (29245-29248-6): Bad file descriptor (9)
Oct 25 13:55:38 node2 lrmd[29245]: warning: send_client_notify:
Notification of client crmd/d068e1e6-7c6e-48b0-bb86-49d35fe11673 failed
Oct 25 13:55:38 node2 crmd[29248]: notice: do_lrm_control:
Disconnected from the LRM
Oct 25 13:55:38 node2 crmd[29248]: notice: terminate_cs_connection:
Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit:
Client crmd (conn=0x1afff00, async-conn=0x1afff00) left
Oct 25 13:55:38 node2 cib[29243]: warning: qb_ipcs_event_sendv:
new_event_notification (29243-29248-13): Broken pipe (32)
Oct 25 13:55:38 node2 cib[29243]: warning: do_local_notify: A-Sync
reply to crmd failed: No message of desired type
Oct 25 13:55:38 node2 pacemakerd[29237]: warning: qb_ipcs_event_sendv:
new_event_notification (29237-29248-14): Broken pipe (32)
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
pengine: Sent -15 to process 29247
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.crmd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
attrd: Sent -15 to process 29246
Oct 25 13:55:38 node2 attrd[29246]: notice: main: Exiting...
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit:
Client attrd (conn=0x1af3480, async-conn=0x1af3480) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
lrmd: Sent -15 to process 29245
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
stonith-ng: Sent -15 to process 29244
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit:
Client stonith-ng (conn=0x1af7800, async-conn=0x1af7800) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
cib: Sent -15 to process 29243
Oct 25 13:55:38 node2 cib[29243]: notice: terminate_cs_connection:
Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit:
Client cib (conn=0x1afbb80, async-conn=0x1afbb80) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: pcmk_shutdown_worker:
Shutdown complete
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 pacemaker: Starting Pacemaker Cluster Manager
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: main: Starting
Pacemaker 1.1.10-14.el6_5.3 (Build: 368c726): generated-manpages
agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc
nagios corosync-plugin cman
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 cib[14922]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 attrd[14925]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 cib[14922]: notice: main: Using legacy config
location: /var/lib/heartbeat/crm
Oct 25 13:55:39 node2 attrd[14925]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 pengine[14926]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 lrmd[14924]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]: notice: main: CRM Git Version: 368c726
Oct 25 13:55:39 node2 attrd[14925]: notice: get_node_name: Defaulting
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Recorded connection 0x1af31c0 for attrd/0
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Recorded connection 0x1af7540 for stonith-ng/0
Oct 25 13:55:39 node2 attrd[14925]: notice: get_node_name: Defaulting
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 attrd[14925]: notice: main: Starting mainloop...
Oct 25 13:55:39 node2 cib[14922]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Recorded connection 0x1afb8c0 for cib/0
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Sending membership update 1608 to cib
Oct 25 13:55:39 node2 cib[14922]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]: notice: plugin_handle_membership:
Membership 1608: quorum acquired
Oct 25 13:55:39 node2 cib[14922]: notice: crm_update_peer_state:
plugin_handle_membership: Node node2.cluster[83732672] - state is now
member (was (null))
Oct 25 13:55:39 node2 cib[14922]: notice: crm_update_peer_state:
plugin_handle_membership: Node node1.cluster[100509888] - state is now
member (was (null))
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.crmd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:40 node2 cib[14922]: notice: crm_ipc_prepare: Message
exceeds the configured ipc limit (51200 bytes), consider configuring
PCMK_ipc_buffer to 161796 or higher to avoid compression overheads
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:40 node2 crmd[14927]: notice: get_node_name: Defaulting
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Recorded connection 0x1affc40 for crmd/0
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Sending membership update 1608 to crmd
Oct 25 13:55:40 node2 stonith-ng[14923]: notice: setup_cib: Watching
for stonith topology changes
Oct 25 13:55:40 node2 crmd[14927]: notice: get_node_name: Defaulting
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 stonith-ng[14923]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Oct 25 13:55:40 node2 crmd[14927]: notice: plugin_handle_membership:
Membership 1608: quorum acquired
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_update_peer_state:
plugin_handle_membership: Node node2.cluster[83732672] - state is now
member (was (null))
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_update_peer_state:
plugin_handle_membership: Node node1.cluster[100509888] - state is now
member (was (null))
Oct 25 13:55:40 node2 crmd[14927]: notice: do_started: The local CRM
is operational
Oct 25 13:55:40 node2 crmd[14927]: notice: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Oct 25 13:55:40 node2 ntpd[7193]: Deleting interface #76 br0,
192.168.253.64#123, interface stats: received=0, sent=0, dropped=0,
active_time=3555898 secs
Oct 25 13:55:42 node2 crmd[14927]: notice: do_state_transition: State
transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
origin=do_cl_join_finalize_respond ]
Oct 25 13:55:42 node2 attrd[14925]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_perl-fpm_monitor_0 (call=41, rc=7, cib-update=8,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radiusd_monitor_0 (call=54, rc=7, cib-update=9,
confirmed=true) not running
Oct 25 13:55:43 node2 pacemakerd[15105]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 mysql(p_mysql)[14949]: INFO: MySQL is not running
Oct 25 13:55:43 node2 crmd[14927]: error: generic_get_metadata:
Failed to retrieve meta-data for ocf:percona:mysql
Oct 25 13:55:43 node2 crmd[14927]: warning: get_rsc_metadata: No
metadata found for mysql::ocf:percona: Input/output error (-5)
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_mysql_monitor_0 (call=26, rc=7, cib-update=10,
confirmed=true) not running
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: Don't check
/var/lib/pgsql/9.1/data/ during probe
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: PostgreSQL is down
Oct 25 13:55:43 node2 nginx(p_nginx)[14956]: INFO: nginx not running
Oct 25 13:55:43 node2 pacemakerd[15368]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_pgsql_monitor_0 (call=31, rc=7, cib-update=11,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_nginx_monitor_0 (call=36, rc=7, cib-update=12,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_web_ip_monitor_0 (call=45, rc=7, cib-update=13,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation pgsql_writer_vip_monitor_0 (call=21, rc=7, cib-update=14,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation ClusterIP_monitor_0 (call=5, rc=7, cib-update=15,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radius_ip_monitor_0 (call=49, rc=7, cib-update=16,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_writer_vip_monitor_0 (call=13, rc=7, cib-update=17,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_monitor_0 (call=17, rc=7, cib-update=18,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_reader_vip_monitor_0 (call=9, rc=7, cib-update=19,
confirmed=true) not running
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:43 node2 pacemakerd[15481]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (0)
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:43 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: nginx: the
configuration file /etc/nginx/nginx.conf syntax is ok nginx:
configuration file /etc/nginx/nginx.conf test is successful
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: Starting
/usr/sbin/nginx - nginx version: nginx/1.0.15
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: /usr/sbin/nginx build
configuration: configure arguments: --prefix=/usr/share/nginx
--sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log
--http-client-body-temp-path=/var/lib/nginx/tmp/client_body
--http-proxy-temp-path=/var/lib/nginx/tmp/proxy
--http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi
--http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi
--http-scgi-temp-path=/var/lib/nginx/tmp/scgi
--pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx
--user=nginx --group=nginx --with-file-aio --with-ipv6
--with-http_ssl_module --with-http_realip_module
--with-http_addition_module --with-http_xslt_module
--with-http_image_filter_module --with-http_geoip_module
--with-http_sub_module --with-http_dav_module --with-http_flv_module
--with-http_mp4_module --with-http_gzip_static_module
--with-http_random_index_module --with-http_secure_link_module
--with-http_degradation_module --with-http_stub_status_module
--with-http_perl_module --with-mail --with-mail_ssl_module
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'
--with-ld-opt=-Wl,-E
Oct 25 13:55:43 node2 pgsql(p_pgsql)[15439]: INFO: Changing
p_pgsql-status on node2.cluster : ->STOP.
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_nginx_start_0 (call=72, rc=0, cib-update=20, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Adding inet
address 10.255.0.33/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Adding inet
address 10.255.0.32/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Bringing device
lo up
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Bringing device lo up
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radius_ip_start_0 (call=76, rc=0, cib-update=21,
confirmed=true) ok
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_web_ip_start_0 (call=78, rc=0, cib-update=22, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_nginx_monitor_30000 (call=81, rc=0, cib-update=23,
confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: Set all nodes into
async mode.
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radius_ip_monitor_10000 (call=89, rc=0, cib-update=24,
confirmed=false) ok
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_web_ip_monitor_10000 (call=91, rc=0, cib-update=25,
confirmed=false) ok
Oct 25 13:55:44 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_nginx_monitor_10000 (call=83, rc=0, cib-update=26,
confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: server starting
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL start
command sent.
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Can't get
PostgreSQL recovery status. rc=2
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Connection error
(connection to the server went bad and the session was not interactive)
occurred while executing the psql command.
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 4: master-p_mysql=0
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 7: master-p_pgsql=-INFINITY
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 11: readable=0
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 17: p_pgsql-status=STOP
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 20: probe_complete=true
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_perl-fpm_start_0 (call=85, rc=0, cib-update=27,
confirmed=true) ok
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radiusd_start_0 (call=74, rc=0, cib-update=28,
confirmed=true) ok
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL is started.
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: Changing
p_pgsql-status on node2.cluster : STOP->HS:alone.
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (HS:alone)
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 22: p_pgsql-status=HS:alone
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished:
p_pgsql_start_0:15439:stderr [ psql: could not connect to server: No
such file or directory ]
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished:
p_pgsql_start_0:15439:stderr [ #011Is the server running locally and
accepting ]
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished:
p_pgsql_start_0:15439:stderr [ #011connections on Unix domain socket
"/tmp/.s.PGSQL.5432"? ]
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_pgsql_start_0 (call=70, rc=0, cib-update=29, confirmed=true) ok
Oct 25 13:55:45 node2 pacemakerd[16158]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=100, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #174 lo,
10.255.0.33#123 Enabled
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #175 lo,
10.255.0.32#123 Enabled
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_cs_dispatch: Update
relayed from node1.cluster
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (HS:async)
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 24: p_pgsql-status=HS:async
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_cs_dispatch: Update
relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (100)
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 26: master-p_pgsql=100
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_cs_dispatch: Update
relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (HS:sync)
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 28: p_pgsql-status=HS:sync
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Changing MySQL
configuration to replicate from node1.cluster.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: ERROR: check_slave invoked
on an instance that is not a replication slave.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Restored master pos
for 192.168.253.5 : mysqld-bin.000471:106
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 30: master-p_mysql=1
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (60)
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 32: master-p_mysql=60
Oct 25 13:55:49 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (1)
Oct 25 13:55:49 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 34: readable=1
Oct 25 13:55:49 node2 mysql(p_mysql)[15438]: INFO: MySQL started
Oct 25 13:55:49 node2 lrmd[14924]: notice: operation_finished:
p_mysql_start_0:15438:stderr [ Error performing operation: No such
device or address ]
Oct 25 13:55:49 node2 lrmd[14924]: notice: operation_finished:
p_mysql_start_0:15438:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_mysql_start_0 (call=68, rc=0, cib-update=30, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=104, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_perl-fpm_monitor_10000 (call=111, rc=0, cib-update=31,
confirmed=false) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radiusd_monitor_10000 (call=113, rc=0, cib-update=32,
confirmed=false) ok
Oct 25 13:55:49 node2 pacemakerd[16380]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_pgsql_monitor_7000 (call=109, rc=0, cib-update=33,
confirmed=false) ok
Oct 25 13:55:49 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Adding
inet address 192.168.253.63/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:49 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Adding
inet address 192.168.253.31/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record
for 192.168.253.63 on br0.IPv4.
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record
for 192.168.253.31 on br0.IPv4.
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Bringing
device br0 up
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Bringing
device br0 up
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.63 br0 192.168.253.63 auto
not_used not_used
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.31 br0 192.168.253.31 auto
not_used not_used
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_reader_vip_start_0 (call=117, rc=0, cib-update=34,
confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_start_0 (call=119, rc=0, cib-update=35,
confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_reader_vip_monitor_10000 (call=124, rc=0, cib-update=36,
confirmed=false) ok
Oct 25 13:55:50 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (52)
Oct 25 13:55:50 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 36: master-p_mysql=52
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_monitor_10000 (call=126, rc=0, cib-update=37,
confirmed=false) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_mysql_monitor_2000 (call=107, rc=0, cib-update=38,
confirmed=false) ok
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #176 br0,
192.168.253.63#123 Enabled
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #177 br0,
192.168.253.31#123 Enabled
Oct 25 13:55:52 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (50)
Oct 25 13:55:52 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 38: master-p_mysql=50
Oct 25 13:55:54 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (51)
Oct 25 13:55:54 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 40: master-p_mysql=51
Oct 25 13:55:56 node2 mysql(p_mysql)[17074]: ERROR: MySQL instance
configured for replication, but replication has failed.
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (0)
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 42: readable=0
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 44: master-p_mysql=0
Oct 25 13:55:56 node2 IPaddr2(mysql_reader_vip)[17141]: INFO: IP status
= ok, IP_CIP=
Oct 25 13:55:56 node2 avahi-daemon[6932]: Withdrawing address record for
192.168.253.63 on br0.
Oct 25 13:55:56 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_reader_vip_stop_0 (call=132, rc=0, cib-update=39,
confirmed=true) ok
Oct 25 13:55:57 node2 pacemakerd[17206]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:58 node2 ntpd[7193]: Deleting interface #176 br0,
192.168.253.63#123, interface stats: received=0, sent=0, dropped=0,
active_time=7 secs
Oct 25 13:55:58 node2 mysql(p_mysql)[17267]: ERROR: MySQL instance
configured for replication, but replication has failed.
Prevent:
Fencing (aka stonith). This is why fencing is required.
There was no node failure - just the daemon was restarted.
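For reference, a stonith device is declared with the crm shell roughly
like this - a sketch only, with a placeholder agent, address and
credentials, not values from this cluster:

  # Hypothetical IPMI-based fencing for node1; every parameter below
  # is a placeholder.
  crm configure primitive fence-node1 stonith:fence_ipmilan \
      params pcmk_host_list="node1.cluster" ipaddr="192.0.2.10" \
             login="admin" passwd="secret" lanplus="true" \
      op monitor interval="60s"
  crm configure property stonith-enabled="true"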
2) How do I resolve the split-brain state? Is it enough just to wait for
the failure, then restart MySQL by hand, clean the row with the
duplicate index out of the slave DB, and then run the resource again? Or
is there some automation for such cases?
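A rough sketch of that manual procedure, assuming the conflict is a
single duplicate row (database, table and key below are placeholders):

  # On the out-of-sync slave: stop the SQL thread, remove the
  # conflicting row, and resume replication.
  mysql -e "STOP SLAVE;"
  mysql -e "DELETE FROM somedb.sometable WHERE id = 12345;"
  mysql -e "START SLAVE;"
  # Alternatively, skip just the one offending replication event:
  #   mysql -e "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;"
  # Then clear the resource's fail counts in Pacemaker:
  crm resource cleanup p_mysql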
How are you sharing data? Can you give us a better understanding of
your setup?
Semi-synchronous MySQL replication, if you mean sharing the DB log
between nodes.
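For context, semi-sync replication is enabled roughly like this (a
sketch; note that the master silently falls back to asynchronous
replication if no slave acknowledges within the timeout, which is one
way the two sides can diverge):

  # On the master (plugin names are the stock MySQL/Percona ones):
  mysql -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';"
  mysql -e "SET GLOBAL rpl_semi_sync_master_enabled = 1;"
  mysql -e "SET GLOBAL rpl_semi_sync_master_timeout = 1000;"   # milliseconds
  # On the slave:
  mysql -e "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';"
  mysql -e "SET GLOBAL rpl_semi_sync_slave_enabled = 1;"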
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org