Re: [Pacemaker] [Fwd: Re: IPaddr2 not failing-over]

Ron Kerry Thu, 02 Sep 2010 05:29:31 -0700

Andrew Beekhof wrote:

On Wed, Sep 1, 2010 at 2:51 PM, Ron Kerry <rke...@sgi.com> wrote:
 > I have taken over working this issue from Vince. The ping clone resource and
 > constraints were set up as described in the prior attached link. Things were
 > still not working correctly and the resources were not failing over as
 > expected when we ifconfig'd one of the monitored interfaces down. I
 > discovered a bug in the pacemaker/ping script (from the SLE11 HAE
 > distribution) where a "*" in an expr statement had not been quoted and was
 > thus being interpreted by the shell.
Also fixed upstream.
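
For readers unfamiliar with this class of bug, here is a minimal sketch of an unquoted "*" in an expr statement. The variable names and values are hypothetical and are not taken from the actual ping resource agent; the sketch only illustrates the quoting issue described above.

```shell
#!/bin/sh
# Hypothetical values for illustration only (not from the real ping RA).
active=2          # number of ping targets that answered
multiplier=1000   # score awarded per reachable target

# Buggy form (left commented out): an unquoted * is subject to pathname
# expansion, so in a non-empty directory expr receives file names instead
# of the multiplication operator and typically fails with a syntax error.
# score=`expr $active * $multiplier`

# Fixed form: escaping the * passes it to expr literally.
score=$(expr "$active" \* "$multiplier")
echo "pingd score: $score"
```

Writing the operator as "*" in quotes works as well; the point is only that the shell must not see a bare asterisk.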

 > I fixed this problem and I was able to
 > get a single failover to occur, but after that failover the ping monitor was
 > canceled on the node that had the downed interface. Even after configuring
 > the interface back up, the monitor task never ran again to notice that fact.
 > This essentially leaves that node with a lower score and improper interface
 > monitoring. I can clear the problem by stopping and then starting the ping
 > clone resource. Note that I have tried pulling up the full ping resource
 > agent script from the SLE11 HAE SP1 distribution and that does not improve
 > this particular problem (though it fixes a few others).
 >
 > I have attached the full hb_report output, but here is a log snip of what is
 > occurring.
 >
 > Sep  1 06:43:50 hpcnas2 root: ifconfig eth3 down
 > Sep  1 06:43:59 hpcnas2 ntpd[10303]: Deleting interface #13 eth3,
 > 10.10.20.32#123, interface stats: received=0, sent=0, dropped=0,
 > active_time=42600 secs
 > Sep  1 06:43:59 hpcnas2 ntpd[10303]: Deleting interface #15 eth3,
 > 10.10.20.33#123, interface stats: received=0, sent=0, dropped=0,
 > active_time=41100 secs
 > Sep  1 06:44:01 hpcnas2 ping[28882]: [28887]: INFO: ping monitor invoked
 > Sep  1 06:44:05 hpcnas2 ping[28882]: [28895]: ERROR: Unexpected result for
 > 'ping -n -q -W 5 -c 5 10.10.20.30' 2: connect: Network is unreachable
 > Sep  1 06:44:14 hpcnas2 attrd: [13676]: info: attrd_trigger_update: Sending
 > flush op to all hosts for: pingd (2000)
 > Sep  1 06:44:14 hpcnas2 attrd: [13676]: info: attrd_perform_update: Sent
 > update 56: pingd=2000
 > Sep  1 06:44:14 hpcnas2 crmd: [13678]: info: do_lrm_rsc_op: Performing
 > key=34:686:0:bbe666a5-2b9f-4419-9728-803197b6e643 op=NFS_stop_0 )
 > Sep  1 06:44:14 hpcnas2 lrmd: [13675]: info: rsc:NFS:83: stop
 > ...
 > resources failover
 > ...
 > Sep  1 06:45:09 hpcnas2 ping[29241]: [29246]: INFO: ping monitor invoked
 > Sep  1 06:45:13 hpcnas2 ping[29241]: [29254]: ERROR: Unexpected result for
 > 'ping -n -q -W 5 -c 5 10.10.20.30' 2: connect: Network is unreachable
 > Sep  1 06:45:17 hpcnas2 crmd: [13678]: info: process_lrm_event: LRM
 > operation ping:1_monitor_60000 (call=82, status=1, cib-update=0,
 > confirmed=true) Cancelled
 > Sep  1 06:45:32 hpcnas2 kernel: bnx2: eth3: using MSIX
 > Sep  1 06:45:35 hpcnas2 kernel: bnx2: eth3 NIC Copper Link is Up, 1000 Mbps
 > full duplex
 > Sep  1 06:45:38 hpcnas2 root: ifconfig eth3 up
 > Sep  1 06:48:08 hpcnas2 root: ping monitor appears to be no longer running
 >
 >
 > The concern is the "process_lrm_event: LRM operation ping:1_monitor_60000 ()
 > Cancelled" event.

Was the resource stopped?  That's the only time I could imagine a
recurring operation being cancelled.

No, it was not stopped. In fact, from the "crm_mon" output that is included with the hb_report output you can see that the resource still shows as running on both HA cluster nodes. How can I dig further to figure out what is canceling the monitor operation, and why?

 > NOTE: The "ping monitor invoked" messages are a debug statement I added to
 > the RA script so I know when the ping_monitor() routine is called.
 >
 > Thanks for any assistance you can provide -- Ron
 >




--

Ron Kerry         rke...@sgi.com
Field Technical Support - SGI Federal
Home Office: 248 375-5671  Cell: 248 761-7204


