Another cluster, another set of logs. This is more interesting: all the clusters are set up the same.
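For context on the "Timed Out" errors in the logs that follow: lrmd forks the resource agent's monitor action as a child process and kills it if it runs past the operation timeout (20000ms here); crmd then records the op as failed with rc=1. A minimal sketch of that pattern, purely illustrative and not Pacemaker's actual code:

```python
import subprocess

OCF_SUCCESS, OCF_ERR_GENERIC = 0, 1

def run_monitor(cmd, timeout_s):
    """Run a monitor action with a hard timeout, lrmd-style."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # lrmd logs "child_timeout_callback ... timed out" and the op
        # is reported to crmd as a failure
        return OCF_ERR_GENERIC

# A monitor that blocks longer than its timeout is reported as failed,
# even if the service it checks is actually healthy.
rc = run_monitor(["sleep", "5"], timeout_s=0.2)
print(rc)  # 1
```

The point being: a monitor that merely *hangs* (on I/O, DNS, a cluster-wide check, etc.) is indistinguishable from a dead resource once the timeout fires.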
On this cluster none of the nodes failed, but I still got:

Oct 30 01:20:40 uatrp2 crmd[2595]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (17) Timed Out (timeout=20000ms)
Oct 30 01:22:20 uatrp1 crmd[2523]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (17) Timed Out (timeout=20000ms)

This has me really confused now!

Oct 30 01:19:30 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Oct 30 01:19:30 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:19:30 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 691: /var/lib/pacemaker/pengine/pe-input-816.bz2
Oct 30 01:19:30 uatrp1 crmd[2523]: notice: run_graph: Transition 691 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-816.bz2): Complete
Oct 30 01:19:30 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 30 01:20:40 uatrp2 lrmd[2592]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 15638) timed out
Oct 30 01:20:40 uatrp2 lrmd[2592]: warning: operation_finished: ybrpstat_monitor_5000:15638 - timed out after 20000ms
Oct 30 01:20:40 uatrp2 crmd[2595]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (17) Timed Out (timeout=20000ms)
Oct 30 01:20:40 uatrp1 crmd[2523]: warning: update_failcount: Updating failcount for ybrpstat on uatrp2 after failed monitor: rc=1 (update=value++, time=1414592440)
Oct 30 01:20:40 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 30 01:20:40 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:20:41 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:1 on uatrp2: unknown error (1)
Oct 30 01:20:41 uatrp1 pengine[2522]: notice: LogActions: Recover ybrpstat:1#011(Started uatrp2)
Oct 30 01:20:41 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 692: /var/lib/pacemaker/pengine/pe-input-817.bz2
Oct 30 01:20:41 uatrp1 crmd[2523]: notice: te_rsc_command: Initiating action 4: stop ybrpstat_stop_0 on uatrp2
Oct 30 01:20:41 uatrp2 proxy(ybrpstat)[15696]: INFO: proxy is not stopped by pacemaker return success
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (1)
Oct 30 01:20:41 uatrp2 crmd[2595]: notice: process_lrm_event: LRM operation ybrpstat_stop_0 (call=22, rc=0, cib-update=15, confirmed=true) ok
Oct 30 01:20:41 uatrp1 crmd[2523]: notice: te_rsc_command: Initiating action 13: start ybrpstat_start_0 on uatrp2
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 9: fail-count-ybrpstat=1
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592440)
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 12: last-failure-ybrpstat=1414592440
Oct 30 01:21:01 uatrp2 lrmd[2592]: warning: child_timeout_callback: ybrpstat_start_0 process (PID 15705) timed out
Oct 30 01:21:01 uatrp2 lrmd[2592]: warning: operation_finished: ybrpstat_start_0:15705 - timed out after 20000ms
Oct 30 01:21:01 uatrp2 crmd[2595]: error: process_lrm_event: LRM operation ybrpstat_start_0 (26) Timed Out (timeout=20000ms)
Oct 30 01:21:01 uatrp1 crmd[2523]: warning: status_from_rc: Action 13 (ybrpstat_start_0) on uatrp2 failed (target: 0 vs. rc: 1): Error
Oct 30 01:21:01 uatrp1 crmd[2523]: warning: update_failcount: Updating failcount for ybrpstat on uatrp2 after failed start: rc=1 (update=INFINITY, time=1414592461)
Oct 30 01:21:01 uatrp1 crmd[2523]: warning: update_failcount: Updating failcount for ybrpstat on uatrp2 after failed start: rc=1 (update=INFINITY, time=1414592461)
Oct 30 01:21:01 uatrp1 crmd[2523]: notice: run_graph: Transition 692 (Complete=7, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-817.bz2): Stopped
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (INFINITY)
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 14: fail-count-ybrpstat=INFINITY
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592461)
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 16: last-failure-ybrpstat=1414592461
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (INFINITY)
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 18: fail-count-ybrpstat=INFINITY
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592461)
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 20: last-failure-ybrpstat=1414592461
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:21:01 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on uatrp2: unknown error (1)
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: LogActions: Recover ybrpstat:1#011(Started uatrp2)
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 693: /var/lib/pacemaker/pengine/pe-input-818.bz2
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:21:01 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on uatrp2: unknown error (1)
Oct 30 01:21:01 uatrp1 pengine[2522]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from uatrp2 after 1000000 failures (max=1000000)
Oct 30 01:21:01 uatrp1 pengine[2522]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from uatrp2 after 1000000 failures (max=1000000)
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: LogActions: Stop ybrpstat:1#011(uatrp2)
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 694: /var/lib/pacemaker/pengine/pe-input-819.bz2
Oct 30 01:21:01 uatrp1 crmd[2523]: notice: te_rsc_command: Initiating action 3: stop ybrpstat_stop_0 on uatrp2
Oct 30 01:21:01 uatrp2 proxy(ybrpstat)[15755]: INFO: proxy is not stopped by pacemaker return success
Oct 30 01:21:01 uatrp2 crmd[2595]: notice: process_lrm_event: LRM operation ybrpstat_stop_0 (call=29, rc=0, cib-update=17, confirmed=true) ok
Oct 30 01:21:01 uatrp1 crmd[2523]: notice: run_graph: Transition 694 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-819.bz2): Complete
Oct 30 01:21:01 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 30 01:22:20 uatrp1 lrmd[2520]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 23676) timed out
Oct 30 01:22:20 uatrp1 lrmd[2520]: warning: operation_finished: ybrpstat_monitor_5000:23676 - timed out after 20000ms
Oct 30 01:22:20 uatrp1 crmd[2523]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (17) Timed Out (timeout=20000ms)
Oct 30 01:22:20 uatrp1 crmd[2523]: warning: update_failcount: Updating failcount for ybrpstat on uatrp1 after failed monitor: rc=1 (update=value++, time=1414592540)
Oct 30 01:22:20 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 30 01:22:20 uatrp1 attrd[2521]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (1)
Oct 30 01:22:20 uatrp1 attrd[2521]: notice: attrd_perform_update: Sent update 17: fail-count-ybrpstat=1
Oct 30 01:22:20 uatrp1 attrd[2521]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592540)
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on uatrp1: unknown error (1)
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on uatrp2: unknown error (1)
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from uatrp2 after 1000000 failures (max=1000000)
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from uatrp2 after 1000000 failures (max=1000000)
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: LogActions: Restart ybrpip#011(Started uatrp1)
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: LogActions: Recover ybrpstat:0#011(Started uatrp1)
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 695: /var/lib/pacemaker/pengine/pe-input-820.bz2
Oct 30 01:22:20 uatrp1 attrd[2521]: notice: attrd_perform_update: Sent update 20: last-failure-ybrpstat=1414592540
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on uatrp1: unknown error (1)
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on uatrp2: unknown error (1)

> -----Original Message-----
> From: Alex Samad - Yieldbroker
> Sent: Thursday, 30 October 2014 12:06 PM
> To: pacemaker@oss.clusterlabs.org
> Subject: RE: Help understanding what went wrong
>
> Hi
>
> Seems like I had the same problem on my prod cluster
>
> Oct 30 01:21:53 prodrp1 lrmd[2536]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 17918) timed out
> Oct 30 01:21:53 prodrp1 lrmd[2536]: warning: operation_finished: ybrpstat_monitor_5000:17918 - timed out after 20000ms
> Oct 30 01:21:53 prodrp1 crmd[2539]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (4384) Timed Out (timeout=20000ms)
> Oct 30 01:21:53 prodrp1 attrd[2537]: notice: attrd_cs_dispatch: Update relayed from prodrp2
>
> This is the first recording on prodrp1.
>
> The issue is that prodrp1 was the master for the IP.
>
> I am wondering if it has something to do with it being a cloned resource ?
>
> So the cluster realises there is a problem, in the middle of it realising that the
> cluster check monitor fails because (assumption here) it does some sort of
> cluster wide check!
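The escalation visible in the logs above follows Pacemaker's failcount rules: a failed monitor does update=value++, while a failed start does update=INFINITY, which immediately reaches the default migration-threshold of 1000000 and produces the "Forcing ybrpstat-clone away from uatrp2" messages. A toy model of that decision, my reading of the logs rather than pengine source:

```python
# Hypothetical sketch of the failcount escalation seen in the logs:
# monitor failure -> value++, start failure -> INFINITY, node banned
# once fail-count >= migration-threshold (default 1000000).
INFINITY = 1000000

def update_failcount(failcount, failed_op):
    return INFINITY if failed_op == "start" else failcount + 1

def forced_away(failcount, migration_threshold=INFINITY):
    return failcount >= migration_threshold

fc = update_failcount(0, "monitor")   # the monitor timeout on uatrp2
print(fc, forced_away(fc))            # 1 False -> "Recover ybrpstat:1"
fc = update_failcount(fc, "start")    # then the restart's start times out
print(fc, forced_away(fc))            # 1000000 True -> "Forcing ... away"
```

Which matches the sequence above: one monitor timeout is recoverable, but the follow-up start timeout bans the node outright.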
>
> A
>
> > -----Original Message-----
> > From: Alex Samad - Yieldbroker [mailto:alex.sa...@yieldbroker.com]
> > Sent: Thursday, 30 October 2014 11:43 AM
> > To: pacemaker@oss.clusterlabs.org
> > Subject: [Pacemaker] Help understanding what went wrong
> >
> > Hi
> >
> > rpm -qa | grep pace
> > pacemaker-libs-1.1.10-14.el6_5.3.x86_64
> > pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64
> > pacemaker-cli-1.1.10-14.el6_5.3.x86_64
> > pacemaker-1.1.10-14.el6_5.3.x86_64
> >
> > centos 6.5
> >
> > I have a 2 node cluster
> > pcs config
> > Cluster Name: ybrp
> > Corosync Nodes:
> >
> > Pacemaker Nodes:
> >  wwwrp1 wwwrp2
> >
> > Resources:
> >  Resource: ybrpip (class=ocf provider=heartbeat type=IPaddr2)
> >   Attributes: ip=10.32.43.50 cidr_netmask=24 nic=eth0 clusterip_hash=sourceip-sourceport
> >   Meta Attrs: stickiness=0,migration-threshold=3,failure-timeout=600s
> >   Operations: start on-fail=restart interval=0s timeout=60s (ybrpip-start-interval-0s)
> >               monitor on-fail=restart interval=5s timeout=20s (ybrpip-monitor-interval-5s)
> >               stop interval=0s timeout=60s (ybrpip-stop-interval-0s)
> >  Clone: ybrpstat-clone
> >   Meta Attrs: globally-unique=false clone-max=2 clone-node-max=1
> >   Resource: ybrpstat (class=ocf provider=yb type=proxy)
> >    Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpstat-monitor-interval-5s)
> >
> > Stonith Devices:
> > Fencing Levels:
> >
> > Location Constraints:
> > Ordering Constraints:
> >   start ybrpstat-clone then start ybrpip (Mandatory) (id:order-ybrpstat-clone-ybrpip-mandatory)
> > Colocation Constraints:
> >   ybrpip with ybrpstat-clone (INFINITY) (id:colocation-ybrpip-ybrpstat-clone-INFINITY)
> >
> > Cluster Properties:
> >  cluster-infrastructure: cman
> >  dc-version: 1.1.10-14.el6_5.3-368c726
> >  last-lrm-refresh: 1414629002
> >  no-quorum-policy: ignore
> >  stonith-enabled: false
> >
> > Basically 1 node died (wwwrp1) hardware failure
> >
> > I can see in the logs the cluster wants to bring the IP address over and it seems to do it but
> >
> > ## this seems to be the ip address moving
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: Adding inet address 10.32.43.50/24 with broadcast address 10.32.43.255 to device eth0
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: Bringing device eth0 up
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.32.43.50 eth0 10.32.43.50 auto not_used not_used
> >
> > ## this seems to be where it checks other stuff
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: process_lrm_event: LRM operation ybrpip_start_0 (call=13476, rc=0, cib-update=11762, confirmed=true) ok
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: te_rsc_command: Initiating action 8: monitor ybrpip_monitor_5000 on wwwrp2 (local)
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: process_lrm_event: LRM operation ybrpip_monitor_5000 (call=13479, rc=0, cib-update=11763, confirmed=false) ok
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: run_graph: Transition 6828 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3.bz2): Complete
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >
> > # this is where monitor times out, but it doesn't look like 20000ms .. the initial try was at Oct 30 01:19:43
> > Oct 30 01:19:44 wwwrp2 lrmd[2459]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 25712) timed out
> > Oct 30 01:19:44 wwwrp2 lrmd[2459]: warning: operation_finished: ybrpstat_monitor_5000:25712 - timed out after 20000ms
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (13473) Timed Out (timeout=20000ms)
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: warning: update_failcount: Updating failcount for ybrpstat on wwwrp2 after failed monitor: rc=1 (update=value++, time=1414592384)
> >
> > I'm guessing because it timed out it went into failed mode. I need to know why it timed out. The script has never timed out before or in testing...
> >
> > Am I reading this right. The reason the resource didn't fail over (ip address) was because there was no ybrpstat running on wwwrp2 and the reason for that was the monitor action failed/timed out
> >
> > Thanks
> > Alex
> >
> > === logs ==
> >
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:0 on wwwrp1: unknown error (1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: LogActions: Start ybrpip#011(wwwrp1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: LogActions: Recover ybrpstat:0#011(Started wwwrp1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: process_pe_message: Calculated Transition 6827: /var/lib/pacemaker/pengine/pe-input-2.bz2
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:0 on wwwrp1: unknown error (1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: LogActions: Start ybrpip#011(wwwrp2)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: LogActions: Stop ybrpstat:0#011(wwwrp1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: process_pe_message: Calculated Transition 6828: /var/lib/pacemaker/pengine/pe-input-3.bz2
> > Oct 30 01:19:42 wwwrp2 crmd[2462]: notice: te_rsc_command: Initiating action 7: start ybrpip_start_0 on wwwrp2 (local)
> > Oct 30 01:19:42 wwwrp2 crmd[2462]: notice: te_rsc_command: Initiating action 1: stop ybrpstat_stop_0 on wwwrp1
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: Adding inet address 10.32.43.50/24 with broadcast address 10.32.43.255 to device eth0
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: Bringing device eth0 up
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.32.43.50 eth0 10.32.43.50 auto not_used not_used
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: process_lrm_event: LRM operation ybrpip_start_0 (call=13476, rc=0, cib-update=11762, confirmed=true) ok
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: te_rsc_command: Initiating action 8: monitor ybrpip_monitor_5000 on wwwrp2 (local)
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: process_lrm_event: LRM operation ybrpip_monitor_5000 (call=13479, rc=0, cib-update=11763, confirmed=false) ok
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: run_graph: Transition 6828 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3.bz2): Complete
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Oct 30 01:19:44 wwwrp2 lrmd[2459]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 25712) timed out
> > Oct 30 01:19:44 wwwrp2 lrmd[2459]: warning: operation_finished: ybrpstat_monitor_5000:25712 - timed out after 20000ms
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (13473) Timed Out (timeout=20000ms)
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: warning: update_failcount: Updating failcount for ybrpstat on wwwrp2 after failed monitor: rc=1 (update=value++, time=1414592384)
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > Oct 30 01:19:44 wwwrp2 attrd[2460]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (1)
> > Oct 30 01:19:44 wwwrp2 attrd[2460]: notice: attrd_perform_update: Sent update 12543: fail-count-ybrpstat=1
> > Oct 30 01:19:44 wwwrp2 attrd[2460]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592384)
> > Oct 30 01:19:44 wwwrp2 attrd[2460]: notice: attrd_perform_update: Sent update 12545: last-failure-ybrpstat=1414592384
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:0 on wwwrp1: unknown error (1)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on wwwrp2: unknown error (1)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: LogActions: Restart ybrpip#011(Started wwwrp2)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: LogActions: Recover ybrpstat:0#011(Started wwwrp2)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: process_pe_message: Calculated Transition 6829: /var/lib/pacemaker/pengine/pe-input-4.bz2
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:0 on wwwrp1: unknown error (1)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on wwwrp2: unknown error (1)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: LogActions: Restart ybrpip#011(Started wwwrp2)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: LogActions: Recover ybrpstat:0#011(Started wwwrp2)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: process_pe_message: Calculated Transition 6830: /var/lib/pacemaker/pengine/pe-input-5.bz2
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
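A note on the failover question asked in the quoted message: ybrpip is colocated with ybrpstat-clone at INFINITY, so the IP can only run on a node where a clone instance can run; with wwwrp1 banned after its failed start and wwwrp2's monitor merely incrementing the failcount, the IP had exactly one eligible node left. A toy placement model of that constraint (my reading, not pengine code):

```python
# Hypothetical sketch: a node with a -INFINITY score for the clone
# (e.g. after "Forcing ybrpstat-clone away ...") is excluded for the
# colocated IP as well; other nodes remain candidates.
INFINITY = 1000000

def ip_candidate_nodes(nodes, clone_node_score):
    return [n for n in nodes if clone_node_score.get(n, 0) > -INFINITY]

# wwwrp1 banned by its failed start; wwwrp2 still eligible (fail-count=1)
scores = {"wwwrp1": -INFINITY, "wwwrp2": 0}
print(ip_candidate_nodes(["wwwrp1", "wwwrp2"], scores))  # ['wwwrp2']
```

That matches the log outcome: "Restart ybrpip (Started wwwrp2)" rather than a move, since wwwrp2 was the only node where the clone, and therefore the IP, could still run.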