Another cluster, another set of logs. This is more interesting: all the clusters are set up the same.
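For context on the "Timed Out" errors in the logs that follow: lrmd forks the resource agent's monitor action as a child process and kills it if it runs past the operation timeout (20000ms here); crmd then records the op as failed with rc=1. A minimal sketch of that pattern, purely illustrative and not Pacemaker's actual code:

```python
import subprocess

OCF_SUCCESS, OCF_ERR_GENERIC = 0, 1

def run_monitor(cmd, timeout_s):
    """Run a monitor action with a hard timeout, lrmd-style."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # lrmd logs "child_timeout_callback ... timed out" and the op
        # is reported to crmd as a failure
        return OCF_ERR_GENERIC

# A monitor that blocks longer than its timeout is reported as failed,
# even if the service it checks is actually healthy.
rc = run_monitor(["sleep", "5"], timeout_s=0.2)
print(rc)  # 1
```

The point being: a monitor that merely *hangs* (on I/O, DNS, a cluster-wide check, etc.) is indistinguishable from a dead resource once the timeout fires.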
On this cluster none of the nodes failed, but I still got:

Oct 30 01:20:40 uatrp2 crmd[2595]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (17) Timed Out (timeout=20000ms)
Oct 30 01:22:20 uatrp1 crmd[2523]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (17) Timed Out (timeout=20000ms)

This has me really confused now!

Oct 30 01:19:30 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Oct 30 01:19:30 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:19:30 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 691: /var/lib/pacemaker/pengine/pe-input-816.bz2
Oct 30 01:19:30 uatrp1 crmd[2523]: notice: run_graph: Transition 691 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-816.bz2): Complete
Oct 30 01:19:30 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 30 01:20:40 uatrp2 lrmd[2592]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 15638) timed out
Oct 30 01:20:40 uatrp2 lrmd[2592]: warning: operation_finished: ybrpstat_monitor_5000:15638 - timed out after 20000ms
Oct 30 01:20:40 uatrp2 crmd[2595]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (17) Timed Out (timeout=20000ms)
Oct 30 01:20:40 uatrp1 crmd[2523]: warning: update_failcount: Updating failcount for ybrpstat on uatrp2 after failed monitor: rc=1 (update=value++, time=1414592440)
Oct 30 01:20:40 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 30 01:20:40 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:20:41 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:1 on uatrp2: unknown error (1)
Oct 30 01:20:41 uatrp1 pengine[2522]: notice: LogActions: Recover ybrpstat:1#011(Started uatrp2)
Oct 30 01:20:41 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 692: /var/lib/pacemaker/pengine/pe-input-817.bz2
Oct 30 01:20:41 uatrp1 crmd[2523]: notice: te_rsc_command: Initiating action 4: stop ybrpstat_stop_0 on uatrp2
Oct 30 01:20:41 uatrp2 proxy(ybrpstat)[15696]: INFO: proxy is not stopped by pacemaker return success
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (1)
Oct 30 01:20:41 uatrp2 crmd[2595]: notice: process_lrm_event: LRM operation ybrpstat_stop_0 (call=22, rc=0, cib-update=15, confirmed=true) ok
Oct 30 01:20:41 uatrp1 crmd[2523]: notice: te_rsc_command: Initiating action 13: start ybrpstat_start_0 on uatrp2
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 9: fail-count-ybrpstat=1
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592440)
Oct 30 01:20:41 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 12: last-failure-ybrpstat=1414592440
Oct 30 01:21:01 uatrp2 lrmd[2592]: warning: child_timeout_callback: ybrpstat_start_0 process (PID 15705) timed out
Oct 30 01:21:01 uatrp2 lrmd[2592]: warning: operation_finished: ybrpstat_start_0:15705 - timed out after 20000ms
Oct 30 01:21:01 uatrp2 crmd[2595]: error: process_lrm_event: LRM operation ybrpstat_start_0 (26) Timed Out (timeout=20000ms)
Oct 30 01:21:01 uatrp1 crmd[2523]: warning: status_from_rc: Action 13 (ybrpstat_start_0) on uatrp2 failed (target: 0 vs. rc: 1): Error
Oct 30 01:21:01 uatrp1 crmd[2523]: warning: update_failcount: Updating failcount for ybrpstat on uatrp2 after failed start: rc=1 (update=INFINITY, time=1414592461)
Oct 30 01:21:01 uatrp1 crmd[2523]: warning: update_failcount: Updating failcount for ybrpstat on uatrp2 after failed start: rc=1 (update=INFINITY, time=1414592461)
Oct 30 01:21:01 uatrp1 crmd[2523]: notice: run_graph: Transition 692 (Complete=7, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-817.bz2): Stopped
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (INFINITY)
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 14: fail-count-ybrpstat=INFINITY
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592461)
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 16: last-failure-ybrpstat=1414592461
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (INFINITY)
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 18: fail-count-ybrpstat=INFINITY
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_cs_dispatch: Update relayed from uatrp1
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592461)
Oct 30 01:21:01 uatrp2 attrd[2593]: notice: attrd_perform_update: Sent update 20: last-failure-ybrpstat=1414592461
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:21:01 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on uatrp2: unknown error (1)
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: LogActions: Recover ybrpstat:1#011(Started uatrp2)
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 693: /var/lib/pacemaker/pengine/pe-input-818.bz2
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:21:01 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on uatrp2: unknown error (1)
Oct 30 01:21:01 uatrp1 pengine[2522]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from uatrp2 after 1000000 failures (max=1000000)
Oct 30 01:21:01 uatrp1 pengine[2522]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from uatrp2 after 1000000 failures (max=1000000)
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: LogActions: Stop ybrpstat:1#011(uatrp2)
Oct 30 01:21:01 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 694: /var/lib/pacemaker/pengine/pe-input-819.bz2
Oct 30 01:21:01 uatrp1 crmd[2523]: notice: te_rsc_command: Initiating action 3: stop ybrpstat_stop_0 on uatrp2
Oct 30 01:21:01 uatrp2 proxy(ybrpstat)[15755]: INFO: proxy is not stopped by pacemaker return success
Oct 30 01:21:01 uatrp2 crmd[2595]: notice: process_lrm_event: LRM operation ybrpstat_stop_0 (call=29, rc=0, cib-update=17, confirmed=true) ok
Oct 30 01:21:01 uatrp1 crmd[2523]: notice: run_graph: Transition 694 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-819.bz2): Complete
Oct 30 01:21:01 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 30 01:22:20 uatrp1 lrmd[2520]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 23676) timed out
Oct 30 01:22:20 uatrp1 lrmd[2520]: warning: operation_finished: ybrpstat_monitor_5000:23676 - timed out after 20000ms
Oct 30 01:22:20 uatrp1 crmd[2523]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (17) Timed Out (timeout=20000ms)
Oct 30 01:22:20 uatrp1 crmd[2523]: warning: update_failcount: Updating failcount for ybrpstat on uatrp1 after failed monitor: rc=1 (update=value++, time=1414592540)
Oct 30 01:22:20 uatrp1 crmd[2523]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 30 01:22:20 uatrp1 attrd[2521]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (1)
Oct 30 01:22:20 uatrp1 attrd[2521]: notice: attrd_perform_update: Sent update 17: fail-count-ybrpstat=1
Oct 30 01:22:20 uatrp1 attrd[2521]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592540)
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on uatrp1: unknown error (1)
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on uatrp2: unknown error (1)
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from uatrp2 after 1000000 failures (max=1000000)
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from uatrp2 after 1000000 failures (max=1000000)
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: LogActions: Restart ybrpip#011(Started uatrp1)
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: LogActions: Recover ybrpstat:0#011(Started uatrp1)
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: process_pe_message: Calculated Transition 695: /var/lib/pacemaker/pengine/pe-input-820.bz2
Oct 30 01:22:20 uatrp1 attrd[2521]: notice: attrd_perform_update: Sent update 20: last-failure-ybrpstat=1414592540
Oct 30 01:22:20 uatrp1 pengine[2522]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on uatrp1: unknown error (1)
Oct 30 01:22:20 uatrp1 pengine[2522]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on uatrp2: unknown error (1)

> -----Original Message-----
> From: Alex Samad - Yieldbroker
> Sent: Thursday, 30 October 2014 12:06 PM
> To: pacemaker@oss.clusterlabs.org
> Subject: RE: Help understanding what went wrong
>
> Hi
>
> Seems like I had the same problem on my prod cluster
>
> Oct 30 01:21:53 prodrp1 lrmd[2536]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 17918) timed out
> Oct 30 01:21:53 prodrp1 lrmd[2536]: warning: operation_finished: ybrpstat_monitor_5000:17918 - timed out after 20000ms
> Oct 30 01:21:53 prodrp1 crmd[2539]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (4384) Timed Out (timeout=20000ms)
> Oct 30 01:21:53 prodrp1 attrd[2537]: notice: attrd_cs_dispatch: Update relayed from prodrp2
>
> This is the first recording on prodrp1.
>
> The issue is that prodrp1 was the master for the IP.
>
> I am wondering if it has something to do with it being a cloned resource ?
>
> So the cluster realises there is a problem, in the middle of it realising that the
> cluster check monitor fails because (assumption here) it does some sort of
> cluster wide check!
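The escalation visible in the logs above follows Pacemaker's failcount rules: a failed monitor does update=value++, while a failed start does update=INFINITY, which immediately reaches the default migration-threshold of 1000000 and produces the "Forcing ybrpstat-clone away from uatrp2" messages. A toy model of that decision, my reading of the logs rather than pengine source:

```python
# Hypothetical sketch of the failcount escalation seen in the logs:
# monitor failure -> value++, start failure -> INFINITY, node banned
# once fail-count >= migration-threshold (default 1000000).
INFINITY = 1000000

def update_failcount(failcount, failed_op):
    return INFINITY if failed_op == "start" else failcount + 1

def forced_away(failcount, migration_threshold=INFINITY):
    return failcount >= migration_threshold

fc = update_failcount(0, "monitor")   # the monitor timeout on uatrp2
print(fc, forced_away(fc))            # 1 False -> "Recover ybrpstat:1"
fc = update_failcount(fc, "start")    # then the restart's start times out
print(fc, forced_away(fc))            # 1000000 True -> "Forcing ... away"
```

Which matches the sequence above: one monitor timeout is recoverable, but the follow-up start timeout bans the node outright.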
>
> A
>
> > -----Original Message-----
> > From: Alex Samad - Yieldbroker [mailto:alex.sa...@yieldbroker.com]
> > Sent: Thursday, 30 October 2014 11:43 AM
> > To: pacemaker@oss.clusterlabs.org
> > Subject: [Pacemaker] Help understanding what went wrong
> >
> > Hi
> >
> > rpm -qa | grep pace
> > pacemaker-libs-1.1.10-14.el6_5.3.x86_64
> > pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64
> > pacemaker-cli-1.1.10-14.el6_5.3.x86_64
> > pacemaker-1.1.10-14.el6_5.3.x86_64
> >
> > centos 6.5
> >
> > I have a 2 node cluster
> > pcs config
> > Cluster Name: ybrp
> > Corosync Nodes:
> >
> > Pacemaker Nodes:
> >  wwwrp1 wwwrp2
> >
> > Resources:
> >  Resource: ybrpip (class=ocf provider=heartbeat type=IPaddr2)
> >   Attributes: ip=10.32.43.50 cidr_netmask=24 nic=eth0 clusterip_hash=sourceip-sourceport
> >   Meta Attrs: stickiness=0,migration-threshold=3,failure-timeout=600s
> >   Operations: start on-fail=restart interval=0s timeout=60s (ybrpip-start-interval-0s)
> >               monitor on-fail=restart interval=5s timeout=20s (ybrpip-monitor-interval-5s)
> >               stop interval=0s timeout=60s (ybrpip-stop-interval-0s)
> >  Clone: ybrpstat-clone
> >   Meta Attrs: globally-unique=false clone-max=2 clone-node-max=1
> >   Resource: ybrpstat (class=ocf provider=yb type=proxy)
> >    Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpstat-monitor-interval-5s)
> >
> > Stonith Devices:
> > Fencing Levels:
> >
> > Location Constraints:
> > Ordering Constraints:
> >   start ybrpstat-clone then start ybrpip (Mandatory) (id:order-ybrpstat-clone-ybrpip-mandatory)
> > Colocation Constraints:
> >   ybrpip with ybrpstat-clone (INFINITY) (id:colocation-ybrpip-ybrpstat-clone-INFINITY)
> >
> > Cluster Properties:
> >  cluster-infrastructure: cman
> >  dc-version: 1.1.10-14.el6_5.3-368c726
> >  last-lrm-refresh: 1414629002
> >  no-quorum-policy: ignore
> >  stonith-enabled: false
> >
> > Basically 1 node died (wwwrp1) hardware failure
> >
> > I can see in the logs the cluster wants to bring the IP address over and it seems to do it but
> >
> > ## this seems to be the ip address moving
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: Adding inet address 10.32.43.50/24 with broadcast address 10.32.43.255 to device eth0
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: Bringing device eth0 up
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.32.43.50 eth0 10.32.43.50 auto not_used not_used
> >
> > ## this seems to be where it checks other stuff
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: process_lrm_event: LRM operation ybrpip_start_0 (call=13476, rc=0, cib-update=11762, confirmed=true) ok
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: te_rsc_command: Initiating action 8: monitor ybrpip_monitor_5000 on wwwrp2 (local)
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: process_lrm_event: LRM operation ybrpip_monitor_5000 (call=13479, rc=0, cib-update=11763, confirmed=false) ok
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: run_graph: Transition 6828 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3.bz2): Complete
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >
> > # this is where monitor times out, but it doesn't look like 20000ms .. the initial try was at Oct 30 01:19:43
> > Oct 30 01:19:44 wwwrp2 lrmd[2459]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 25712) timed out
> > Oct 30 01:19:44 wwwrp2 lrmd[2459]: warning: operation_finished: ybrpstat_monitor_5000:25712 - timed out after 20000ms
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (13473) Timed Out (timeout=20000ms)
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: warning: update_failcount: Updating failcount for ybrpstat on wwwrp2 after failed monitor: rc=1 (update=value++, time=1414592384)
> >
> > I'm guessing because it timed out it went into failed mode. I need to know why it timed out. The script has never timed out before or in testing...
> >
> > Am I reading this right. The reason the resource didn't fail over (ip address) was because there was no ybrpstat running on wwwrp2 and the reason for that was the monitor action failed/timed out
> >
> > Thanks
> > Alex
> >
> > === logs ==
> >
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:0 on wwwrp1: unknown error (1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: LogActions: Start ybrpip#011(wwwrp1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: LogActions: Recover ybrpstat:0#011(Started wwwrp1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: process_pe_message: Calculated Transition 6827: /var/lib/pacemaker/pengine/pe-input-2.bz2
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:0 on wwwrp1: unknown error (1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: LogActions: Start ybrpip#011(wwwrp2)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: LogActions: Stop ybrpstat:0#011(wwwrp1)
> > Oct 30 01:19:42 wwwrp2 pengine[2461]: notice: process_pe_message: Calculated Transition 6828: /var/lib/pacemaker/pengine/pe-input-3.bz2
> > Oct 30 01:19:42 wwwrp2 crmd[2462]: notice: te_rsc_command: Initiating action 7: start ybrpip_start_0 on wwwrp2 (local)
> > Oct 30 01:19:42 wwwrp2 crmd[2462]: notice: te_rsc_command: Initiating action 1: stop ybrpstat_stop_0 on wwwrp1
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: Adding inet address 10.32.43.50/24 with broadcast address 10.32.43.255 to device eth0
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: Bringing device eth0 up
> > Oct 30 01:19:43 wwwrp2 IPaddr2(ybrpip)[25778]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.32.43.50 eth0 10.32.43.50 auto not_used not_used
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: process_lrm_event: LRM operation ybrpip_start_0 (call=13476, rc=0, cib-update=11762, confirmed=true) ok
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: te_rsc_command: Initiating action 8: monitor ybrpip_monitor_5000 on wwwrp2 (local)
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: process_lrm_event: LRM operation ybrpip_monitor_5000 (call=13479, rc=0, cib-update=11763, confirmed=false) ok
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: run_graph: Transition 6828 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3.bz2): Complete
> > Oct 30 01:19:43 wwwrp2 crmd[2462]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Oct 30 01:19:44 wwwrp2 lrmd[2459]: warning: child_timeout_callback: ybrpstat_monitor_5000 process (PID 25712) timed out
> > Oct 30 01:19:44 wwwrp2 lrmd[2459]: warning: operation_finished: ybrpstat_monitor_5000:25712 - timed out after 20000ms
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: error: process_lrm_event: LRM operation ybrpstat_monitor_5000 (13473) Timed Out (timeout=20000ms)
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: warning: update_failcount: Updating failcount for ybrpstat on wwwrp2 after failed monitor: rc=1 (update=value++, time=1414592384)
> > Oct 30 01:19:44 wwwrp2 crmd[2462]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > Oct 30 01:19:44 wwwrp2 attrd[2460]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat (1)
> > Oct 30 01:19:44 wwwrp2 attrd[2460]: notice: attrd_perform_update: Sent update 12543: fail-count-ybrpstat=1
> > Oct 30 01:19:44 wwwrp2 attrd[2460]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1414592384)
> > Oct 30 01:19:44 wwwrp2 attrd[2460]: notice: attrd_perform_update: Sent update 12545: last-failure-ybrpstat=1414592384
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:0 on wwwrp1: unknown error (1)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on wwwrp2: unknown error (1)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: LogActions: Restart ybrpip#011(Started wwwrp2)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: LogActions: Recover ybrpstat:0#011(Started wwwrp2)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: process_pe_message: Calculated Transition 6829: /var/lib/pacemaker/pengine/pe-input-4.bz2
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:0 on wwwrp1: unknown error (1)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on wwwrp2: unknown error (1)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from wwwrp1 after 1000000 failures (max=1000000)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: LogActions: Restart ybrpip#011(Started wwwrp2)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: LogActions: Recover ybrpstat:0#011(Started wwwrp2)
> > Oct 30 01:19:44 wwwrp2 pengine[2461]: notice: process_pe_message: Calculated Transition 6830: /var/lib/pacemaker/pengine/pe-input-5.bz2
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
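A note on the failover question asked in the quoted message: ybrpip is colocated with ybrpstat-clone at INFINITY, so the IP can only run on a node where a clone instance can run; with wwwrp1 banned after its failed start and wwwrp2's monitor merely incrementing the failcount, the IP had exactly one eligible node left. A toy placement model of that constraint (my reading, not pengine code):

```python
# Hypothetical sketch: a node with a -INFINITY score for the clone
# (e.g. after "Forcing ybrpstat-clone away ...") is excluded for the
# colocated IP as well; other nodes remain candidates.
INFINITY = 1000000

def ip_candidate_nodes(nodes, clone_node_score):
    return [n for n in nodes if clone_node_score.get(n, 0) > -INFINITY]

# wwwrp1 banned by its failed start; wwwrp2 still eligible (fail-count=1)
scores = {"wwwrp1": -INFINITY, "wwwrp2": 0}
print(ip_candidate_nodes(["wwwrp1", "wwwrp2"], scores))  # ['wwwrp2']
```

That matches the log outcome: "Restart ybrpip (Started wwwrp2)" rather than a move, since wwwrp2 was the only node where the clone, and therefore the IP, could still run.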