On 2013-04-24T10:37:24, Johan Huysmans <johan.huysm...@inuits.be> wrote:
> --> start situation
> * scope=status name=fail-count-d_tomcat value=0
> * depending resource group running on node
> * crm_mon shows everything ok
>
> --> a failure occurs
> * scope=status name=fail-count-d_tomcat value=1
> * depending resource group stopping on node
> * crm_mon shows failure
>
> --> After 30s (= failure-timeout)
> * scope=status name=fail-count-d_tomcat value=1
> * depending resource group not running on node
> * crm_mon shows NO failure !!!!!

This, by itself, is not necessarily surprising. The property
"cluster-recheck-interval" defines how often the PE gets re-run, and
defaults to 15 minutes. It is not dynamically adjusted based on
failure-timeouts. If this feature becomes more widely used, there
probably should be a better way to handle/trigger these expirations
while still avoiding swamping the cluster with empty transitions.

In short: right now, if you want a failure-timeout of 30s to be
meaningful, you need to set cluster-recheck-interval to something
shorter than the default.

> --> After something changes in the cluster or the recheck interval
> * scope=status name=fail-count-d_tomcat value=0
> * depending resource group can run on node
> * crm_mon shows no failure
> * BUT my resource is still monitored and failing!

I'm not sure I fully understand what you mean by the last point. Did
the cluster try to restart the resource, and it failed again, yet the
failure was ignored this time around?

> I find it disturbing that a resource with a failing monitor has a 0
> failcount, shows ok in crm_mon and allows to run the depending
> resources.

Yes, if I got that right, that would be a problem. Please create an
hb_report/crm_report and open a bug.


Regards,
    Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes."
-- Oscar Wilde
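
To make the point about failure-timeout versus cluster-recheck-interval concrete, here is a rough sketch of the timing. It is a simplified model, not Pacemaker code. It assumes the PE runs only at multiples of the recheck interval (no other cluster events trigger a transition), and that an expired failure is acted on at the first PE run at or after its expiry. The function name and parameters are made up for illustration.

```python
import math


def seconds_until_failure_expires(failure_timeout, recheck_interval,
                                  failure_offset=0):
    """Seconds from the failure until the PE run that notices the expiry.

    Simplified model (assumption, not Pacemaker's actual implementation):
    the PE runs at t = recheck_interval, 2 * recheck_interval, ... and no
    other event triggers it in between.

    failure_timeout  -- the resource's failure-timeout, in seconds
    recheck_interval -- cluster-recheck-interval, in seconds
    failure_offset   -- how long after the previous PE run the failure hit
    """
    if recheck_interval <= 0:
        raise ValueError("recheck_interval must be positive")
    # Absolute time (since the previous PE run) at which the failure expires.
    expiry = failure_offset + failure_timeout
    # Index of the first scheduled PE run at or after the expiry.
    runs_needed = math.ceil(expiry / recheck_interval)
    return runs_needed * recheck_interval - failure_offset


# Default recheck interval of 15 minutes vs. a 30s failure-timeout.
print(seconds_until_failure_expires(30, 15 * 60))
# Recheck interval shortened to 20s.
print(seconds_until_failure_expires(30, 20))
```

Under these assumptions, a 30s failure-timeout takes effect only after 900s with the default interval, and after 40s with a 20s interval. The effective delay is bounded by the recheck interval rather than by the failure-timeout alone.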