On Wed, Jun 25, 2014 at 1:28 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> > SO it seems at midnight the resource already was with a failcount of 2
> > (perhaps caused by problems happened weeks ago..?) and then at 03:38 got a
> > timeout on monitoring its state and was relocated...
> >
> > pacemaker is at 1.1.6-1.27.26
>
> I don't think the automatic reset was part of 1.1.6.
> The documentation you're referring to is probably SLES12 specific.
>
> > and I see this list message that seems related:
> > http://oss.clusterlabs.org/pipermail/pacemaker/2012-August/015076.html
> >
> > Is it perhaps only a matter of setting meta parameter
> > failure-timeout
> > as explained in High Availability Guide:
> > https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha/book_sleha.html#sec.ha.config.hawk.rsc
> >
> > in particular
> > 5.3.6. Specifying Resource Failover Nodes
> > ...
> > 4. If you want to automatically expire the failcount for a resource, add
> > the failure-timeout meta attribute to the resource as described in
> > Procedure 5.4: Adding Primitive Resources, Step 7 and enter a Value for the
> > failure-timeout.
> > ..
> > ?

Yes, you are right. It seems that starting from here:
https://www.suse.com/it-it/documentation/sles11/
or here:
https://www.suse.com/documentation/sles11/
the SLES 11 HTML links for "SUSE Linux Enterprise High Availability Extension Guide" erroneously point to SLES 12 anyway...
I tried the "feedback" button at the bottom of the page, but it doesn't work (at least in my Chrome browser on Fedora 20) for either the Italian page or the English one...

Going through the PDF documents I had already downloaded, I still have this one for SLES 11 SP2, which matches the system in question:

"
5.3.5 Specifying Resource Failover Nodes
...
A resource will be automatically restarted if it fails. If that cannot be achieved on the current node, or it fails N times on the current node, it will try to fail over to another node. You can define a number of failures for resources (a migration-threshold), after which they will migrate to a new node.

If you have more than two nodes in your cluster, the node a particular resource fails over to is chosen by the High Availability software. However, you can specify the node a resource will fail over to by proceeding as follows:

1 Configure a location constraint for that resource as described in Procedure 5.6, “Adding or Modifying Locational Constraints” (page 86).
2 Add the migration-threshold meta attribute to that resource as described in Procedure 5.3, “Adding or Modifying Meta and Instance Attributes” (page 82) and enter a Value for the migration-threshold. The value should be positive and less than INFINITY.
3 If you want to automatically expire the failcount for a resource, add the failure-timeout meta attribute to that resource as described in Procedure 5.3, “Adding or Modifying Meta and Instance Attributes” (page 82) and enter a Value for the failure-timeout.
4 If you want to specify additional failover nodes with preferences for a resource, create additional location constraints.
"

So the question remains about the "failure-timeout" parameter and/or other methods to solve/mitigate what I described in my first message.

Thanks,
Gianluca
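
P.S. To make the question concrete, here is a rough sketch of how I understand migration-threshold and failure-timeout to interact, based only on the guide text quoted above. It is a toy model in Python, not Pacemaker code: the class and function names are made up for illustration, the threshold of 3 and the one-day timeout are assumed values, and it ignores when the cluster actually re-evaluates expired failures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FailureTracker:
    """Toy model of one resource's failcount on one node (hypothetical, not Pacemaker code)."""
    migration_threshold: int
    failure_timeout: Optional[float] = None  # seconds; None means failures never expire
    failcount: int = 0
    last_failure: Optional[float] = None

    def _expire(self, now: float) -> None:
        # Forget old failures once failure_timeout has elapsed since the last one.
        if (self.failure_timeout is not None
                and self.last_failure is not None
                and now - self.last_failure >= self.failure_timeout):
            self.failcount = 0
            self.last_failure = None

    def record_failure(self, now: float) -> None:
        # Expire stale failures first, then count the new one.
        self._expire(now)
        self.failcount += 1
        self.last_failure = now

    def node_allowed(self, now: float) -> bool:
        # The node stays eligible while failcount is below the threshold.
        self._expire(now)
        return self.failcount < self.migration_threshold


def demo() -> Tuple[bool, bool]:
    day = 24 * 3600.0
    # Assumed values: threshold 3, and a one-day failure-timeout in the second case.
    sticky = FailureTracker(migration_threshold=3)
    expiring = FailureTracker(migration_threshold=3, failure_timeout=day)
    for tracker in (sticky, expiring):
        tracker.record_failure(now=0.0)        # two old failures...
        tracker.record_failure(now=60.0)
        tracker.record_failure(now=21 * day)   # ...and one monitor timeout weeks later
    return sticky.node_allowed(21 * day), expiring.node_allowed(21 * day)


if __name__ == "__main__":
    print(demo())
```

In this model the tracker without a timeout reaches the threshold on the third failure and the node is no longer allowed, which looks like what happened to me at 03:38. With the timeout set, the two old failures are forgotten and the single new failure leaves the resource where it is.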
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org