Hi,

When a promote or demote action returns an error code, the failcount does not seem to be incremented, so in some cases the promote/demote action is retried in a loop. The default settings for promote/demote are defined implicitly (on_fail="restart" and interval=0).

Would it be possible to handle them the same way as start/stop operations? That is, when a promote or demote fails, Pacemaker would temporarily treat its interval as 1, so that the failure is counted. See the attached patch.
Thanks,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION
diff -r 239d0779f3b4 crmd/te_events.c
--- a/crmd/te_events.c	Wed Oct 20 18:27:24 2010 +0200
+++ b/crmd/te_events.c	Thu Oct 21 17:12:54 2010 +0900
@@ -127,6 +127,9 @@ update_failcount(xmlNode *event, const c
     } else if(safe_str_eq(task, CRMD_ACTION_STOP)) {
         interval = 1;
         value = failed_stop_offset;
+
+    } else if(safe_str_eq(task, CRMD_ACTION_PROMOTE) || safe_str_eq(task, CRMD_ACTION_DEMOTE)) {
+        interval = 1;
     }
 
     if(value == NULL || safe_str_neq(value, INFINITY_S)) {
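
To make the intended behaviour easier to discuss, here is a rough standalone sketch of the decision the patch changes. It is not the actual crmd code. The helper names `effective_interval` and `counts_toward_failcount` are made up for illustration, the task strings are assumed to match the CRMD_ACTION_* values, and the rule that only a non-zero interval leads to a failcount update is an assumption based on how start/stop are handled.

```c
#include <string.h>

/* Hypothetical helper: interval used when deciding whether a failed
 * operation should bump the failcount. Start/stop are forced to 1;
 * the patch proposes the same for promote/demote. */
static int effective_interval(const char *task, int interval)
{
    if (strcmp(task, "start") == 0 || strcmp(task, "stop") == 0) {
        return 1;               /* existing special case */
    }
    if (strcmp(task, "promote") == 0 || strcmp(task, "demote") == 0) {
        return 1;               /* proposed special case */
    }
    return interval;            /* all other tasks keep their configured interval */
}

/* Assumption: a failure is counted only when the effective interval is non-zero. */
static int counts_toward_failcount(const char *task, int interval)
{
    return effective_interval(task, interval) > 0;
}
```

With this logic, a failed promote with the implicit interval=0 would be counted, while a failed operation of any other type with interval=0 would still be ignored.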