On 03/18/2014 09:04 PM, Andrew Beekhof wrote:
> Riiiight, so this is the story:
>
> Mar 08 08:43:22 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
> Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Peer gandalf was terminated (st_notify_fence) by mordor for gandalf: OK (ref=10d27664-33ed-43e0-a5bd-7d0ef850eb05) by client crmd.31561
> Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Notified CMAN that 'gandalf' is now fenced
> Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Target may have been our leader gandalf (recorded: <unset>)
> Mar 08 09:13:52 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
> Mar 08 09:13:52 [9934] lorien crmd: notice: do_dc_takeover: Marking gandalf, target of a previous stonith action, as clean
>
> In tengine_stonith_notify() we potentially add things to stonith_cleanup_list
> and then in do_dc_takeover() we check the stonith_cleanup_list and mark any
> nodes in it as clean.
>
> As you can see above, the stonith notification comes just after the call to
> do_dc_takeover().
> In the version you have there is some dodgy code in tengine_stonith_notify()
> which incorrectly adds gandalf to stonith_cleanup_list, causing Pacemaker to
> (incorrectly) erase its status section at 9:13:52 when another election
> occurs.
>
> This was fixed during the RC-phase of Pacemaker-1.1.10:
>
> https://github.com/beekhof/pacemaker/commit/f30e1e43
>
> I don't believe I quite understood the severity of that fix at the time
> (otherwise I'd have made more noise about it).
>
> Since you're on CentOS 6.4, there should already be updated packages that
> include this fix.
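
For anyone following the thread, the ordering described above can be replayed with a toy Python model. This is only a sketch and not Pacemaker code: the class ToyCrmd, the node_status dict, the queue_for_cleanup flag and the replay() helper are all made up for illustration, and the step where gandalf rejoins and gets a status section again is an assumption about what happened between the two takeovers. The flag is not a description of what the actual fix changes; it only switches the queuing on and off to show why the stale entry matters.

```python
class ToyCrmd:
    """Toy stand-in for one crmd instance; not Pacemaker code."""

    def __init__(self):
        self.stonith_cleanup_list = []  # fencing targets to mark clean at next takeover
        self.node_status = {}           # node name -> status section (plain dict here)

    def do_dc_takeover(self):
        # Mark every queued fencing target as clean by erasing its status section.
        for target in self.stonith_cleanup_list:
            self.node_status.pop(target, None)
        self.stonith_cleanup_list.clear()

    def tengine_stonith_notify(self, target, queue_for_cleanup):
        # queue_for_cleanup=True models the behaviour described in the mail;
        # the flag itself is hypothetical.
        if queue_for_cleanup:
            self.stonith_cleanup_list.append(target)


def replay(queue_for_cleanup):
    lorien = ToyCrmd()
    lorien.do_dc_takeover()                                      # 08:43:22, list is empty
    lorien.tengine_stonith_notify("gandalf", queue_for_cleanup)  # 08:43:22, arrives after
    lorien.node_status["gandalf"] = {"state": "online"}          # assumed: gandalf rejoins
    lorien.do_dc_takeover()                                      # 09:13:52, next election
    return lorien.node_status


print(replay(queue_for_cleanup=True))   # gandalf's status section has been erased
print(replay(queue_for_cleanup=False))  # gandalf's status section survives
```

Because the notification lands after the first takeover, the queued entry sits in the list until the next election, which is why the damage shows up half an hour later.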
Andrew: thanks again for taking the time to check this case. We will be updating to 1.1.10 as soon as possible. Hugs!
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org