----- Original Message -----
> From: "Parshvi" <parshvi...@gmail.com>
> To: pacema...@clusterlabs.org
> Sent: Monday, September 10, 2012 4:06:51 AM
> Subject: Re: [Pacemaker] Upgrading to Pacemaker 1.1.7. Issue: sticky resources failing back after reboot
>
> David Vossel <dvossel@...> writes:
>
> > > Hi,
> > > We have upgraded Pacemaker version 1.0.12 to 1.1.7.
> > > The upgrade was done since resources failed to recover after a timeout (monitor|stop[unmanaged]), and the logs observed are:
> > >
> > > WARN: print_graph: Synapse 6 is pending (priority: 0)
> > > Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_elem: [Action 103]: Pending (id: SnmpAgent_monitor_5000, loc: CSS-FU-2, priority: 0)
> > > Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_elem: * [Input 102]: Pending (id: SnmpAgent_start_0, loc: CSS-FU-2, priority: 0)
> > > Sep 03 16:55:18 CSS-FU-2 crmd: [25200]: WARN: print_graph: Synapse 7 is pending (priority: 0)
> > >
> > > Reading through the forum mails, it was inferred that this issue is fixed in 1.1.7.
> > >
> > > Platform OS: OEL 5.8
> > > Pacemaker version: 1.1.7
> > > Corosync version: 1.4.3
> > >
> > > Pacemaker and all its dependent packages were built from source (tarball: github).
> > > glib version used for the build: 2.32.2
> > >
> > > The following issue is observed in Pacemaker 1.1.7:
> > > 1) There is a two-node cluster.
> > > 2) When the primary node is rebooted, or Pacemaker is restarted, the resources fail over to the secondary node.
> > > 3) There are 4 groups of services:
> > >    2 groups are not sticky,
> > >    1 group is a master/slave multi-state resource,
> > >    1 group is STICKY.
> > > 4) When the primary node comes back online, even the sticky resources fail back to the primary node (the issue).
> > > 5) Now, if the secondary node is rebooted, the resources fail over to the primary node.
> > > 6) Once the secondary node is up, only the non-sticky resources fail back; sticky resources remain on the primary node.
> > > 7) Even if the location preference of the sticky resources is set for Node-2 (the secondary node), the sticky resources still fail back to Node-1.
> > >
> > > We're using Pacemaker 1.0.12 in production. We're facing issues of the IPaddr and other resources' monitor operations timing out and Pacemaker not recovering from them (shared above).
> > >
> > > Any help is welcome.
> > >
> > > PS: Please mention if any logs or configuration need to be shared.
> >
> > My guess is that this is an issue with node scores for the resources in question. Stickiness and location constraints work in a similar way. You could really think of resource stickiness as a temporary location constraint on a resource that changes depending on what node it is on.
> >
> > If you have a resource with stickiness enabled and you want the resource to stay put, the stickiness score has to outweigh all the location constraints for that resource on other nodes. If you are using colocation constraints, this becomes increasingly complicated, as a resource's per-node location score could change based on the location of another resource.
> >
> > For specific advice on your scenario, there is little we can offer without seeing your exact configuration.
>
> Hi David,
> Thanks for a quick response.
>
> I have shared the configuration at the following path:
> https://dl.dropbox.com/u/20096935/cib.txt
>
> The issue has been observed for the following group of resources:
> 1) Rsc_Ms1
> 2) Rsc_S
> 3) Rsc_T
> 4) Rsc_TGroupClusterIP
>
> Colocation: resources 1), 2) and 3) have been colocated with resource 4).
> Location preference: resource 4) prefers one of the nodes in the cluster.
> Ordering: resources 1), 2) and 3) are started (with no sequential ordering among them) once resource 4) is started.
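For reference, the constraints described above would presumably look something like this in crm shell syntax (roughly what crm configure show would print); the constraint IDs and the scores are assumptions on my part, since the actual cib.txt is not reproduced in this thread:

    # hypothetical reconstruction -- constraint IDs and scores are guesses
    location loc-ClusterIP Rsc_TGroupClusterIP 1: Node-2
    colocation col-Ms1 inf: Rsc_Ms1 Rsc_TGroupClusterIP
    colocation col-S inf: Rsc_S Rsc_TGroupClusterIP
    colocation col-T inf: Rsc_T Rsc_TGroupClusterIP
    order ord-Ms1 inf: Rsc_TGroupClusterIP Rsc_Ms1
    order ord-S inf: Rsc_TGroupClusterIP Rsc_S
    order ord-T inf: Rsc_TGroupClusterIP Rsc_T

With a layout like this, the score pulling the whole set toward the preferred node is effectively the sum of the location scores of everything colocated there, which is the summing behavior described below.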
I'm not an expert when it comes to scoring, but if you want a resource to prefer staying on its current node instead of failing back to its preferred node, then I would definitely set the resource-stickiness value much higher to ensure that behavior. Test with 20, 50 or 200 as the stickiness value, and at least the fail-back problem should go away.

I believe the issue is that you have multiple colocated resources that each have a node location preference with a score of 1. The effective score becomes the sum of the colocated resources' location scores, and that sum is higher than your stickiness score of 1, which causes the movement.

HTH
Jake
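As a minimal sketch of that suggestion in crm shell terms (the value 200 is only an example, not something taken from the configuration in this thread):

    # raise the cluster-wide default stickiness
    crm configure rsc_defaults resource-stickiness=200

Alternatively, resource-stickiness can be set as a meta attribute on just the resources or group that must stay put; either way, the stickiness needs to end up higher than the summed location scores pulling those resources back to the other node.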