On Fri, Jan 14, 2011 at 12:45 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> Hi,
>
> On Wed, Jan 12, 2011 at 02:41:31PM -0700, Patrick H. wrote:
>>
>> >>Oh, and it's not waiting for the resource to stop on the other
>> >>node before it starts it up either.
>> >>Here's the lrmd log for resource vip_55.63 from the 'ha02' node
>> >>(the node I put into standby)
>> >>Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop
>> >>Jan 12 16:10:24 ha02 lrmd: [5180]: info: Managed vip_55.63:stop
>> >>process 19063 exited with return code 0.
>> >>
>> >>And here's the lrmd log for the same resource on 'ha01'
>> >>Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start
>> >>Jan 12 16:10:50 ha01 lrmd: [4707]: info: Managed vip_55.63:start
>> >>process 8826 exited with return code 0.
>> >>
>> >>Notice that it stopped it a full 26 seconds before it tried to
>> >>start it on the other node. The times on both boxes are in
>> >>sync, so it's not that either.
>> >
>> >Is this the case when you wanted to fail-over a single resource
>> >or was it part of the node standby process?
>> >
>> >Thanks,
>> >
>> >Dejan
>> In that case I put the node in standby.
>>
>> While digging around a bit more, I noticed this:
>> Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating
>> action 966: stop vip_55.236_stop_0 on ha01 (local)
>> Jan 12 17:24:56 ha01 crmd: [4710]: info: do_lrm_rsc_op: Performing
>> key=966:14345:0:0e860f83-8611-4873-829f-2a0c6fcf6667
>> op=vip_55.236_stop_0 )
>> Jan 12 17:24:56 ha01 lrmd: [4707]: info: rsc:vip_55.236:1714: stop
>> Jan 12 17:24:56 ha01 lrmd: [4707]: info: Managed vip_55.236:stop
>> process 11414 exited with return code 0.
>> Jan 12 17:24:56 ha01 crmd: [4710]: info: process_lrm_event: LRM
>> operation vip_55.236_stop_0 (call=1714, rc=0, cib-update=19621,
>> confirmed=true) ok
>> Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action
>> vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
>> Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating
>> action 967: start vip_55.236_start_0 on ha02
>> Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action
>> vip_55.236_start_0 (967) confirmed on ha02 (rc=0)
>>
>> Notice the huge delays before the match_graph_event on both stop and
>> start. So it seems everything is waiting on match_graph_event. What
>> is this?
>
> Can't say, but perhaps Andrew would know, though I'm not sure if
> there's enough information here. Best to open a bugzilla and
> attach hb_report.

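
In case it helps whoever files the report, here is a rough Python sketch for putting numbers on the gaps in the quoted crmd log. It pairs each "Initiating action" line with the matching match_graph_event "confirmed" line by action id and prints the seconds in between. The function names, the regular expressions, and the assumed year (2011, taken from the thread date, since syslog timestamps carry no year) are my own choices for illustration and are not part of Pacemaker or hb_report. The wrapped log lines are joined back onto one line each. By the same arithmetic, the lrmd stop and start timestamps quoted above are 26 seconds apart.

```python
import re
from datetime import datetime

# crmd lines copied from the quoted message, unwrapped to one line each.
LOG = """\
Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 966: stop vip_55.236_stop_0 on ha01 (local)
Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 967: start vip_55.236_start_0 on ha02
Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_start_0 (967) confirmed on ha02 (rc=0)
"""

# Hypothetical patterns written against the lines above only.
STAMP = r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d)"
INITIATED = re.compile(
    STAMP + r" .*te_rsc_command: Initiating action (?P<id>\d+): \w+ (?P<op>\S+)"
)
CONFIRMED = re.compile(
    STAMP + r" .*match_graph_event: Action (?P<op>\S+) \((?P<id>\d+)\) confirmed"
)


def parse_ts(text, year=2011):
    """Parse a syslog timestamp; the year is an assumption (not in the log)."""
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def gap_seconds(first, second):
    """Seconds elapsed between two syslog timestamps."""
    return int((parse_ts(second) - parse_ts(first)).total_seconds())


def confirmation_delays(log_text):
    """Map each operation to the seconds between 'Initiating' and 'confirmed'."""
    started = {}
    delays = {}
    for line in log_text.splitlines():
        m = INITIATED.search(line)
        if m:
            started[m["id"]] = m["ts"]
            continue
        m = CONFIRMED.search(line)
        if m and m["id"] in started:
            delays[m["op"]] = gap_seconds(started[m["id"]], m["ts"])
    return delays


if __name__ == "__main__":
    for op, seconds in confirmation_delays(LOG).items():
        print(f"{op}: {seconds}s between initiation and confirmation")
    # lrmd stop on ha02 vs. start on ha01, from the earlier quoted log
    print("vip_55.63 stop -> start:",
          gap_seconds("Jan 12 16:10:24", "Jan 12 16:10:50"), "s")
```

On the lines above this reports 8 seconds for the stop and 24 seconds for the start. It only measures the gaps; it says nothing about why match_graph_event is late, so the hb_report is still needed.
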
Did a bug get created for this?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker