On Tue, Jan 17, 2012 at 3:04 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
> 17.01.2012 04:01, Andrew Beekhof wrote:
>> On Mon, Jan 16, 2012 at 5:45 PM, Vladislav Bogdanov
>> <bub...@hoster-ok.com> wrote:
>>> 16.01.2012 09:20, Andrew Beekhof wrote:
>>> [snip]
>>>>>> At the same time, stonith_admin -B succeeds.
>>>>>> The main difference I see is st_opt_sync_call in the latter case.
>>>>>> I will experiment with it.
>>>>>
>>>>> Yeeeesssss!!!
>>>>>
>>>>> Now I see the following:
>>>>> Dec 19 11:53:34 vd01-a cluster-dlm: [2474]: info: pacemaker_terminate_member: Requesting that node 1090782474/vd01-b be fenced
>>>>
>>>> So the important question... what did you change?
>>>
>>> Nice to see you back ;)
>>>
>>> + rc = st->cmds->fence(st, *st_opt_sync_call*, node_uname, "reboot", 120);
>>
>> Really struggling to see how changing anything here can impact whether
>> the log message /before/ it gets printed.
>
> Did I say that? ;)

Sorry, I pattern-matched on pacemaker_terminate_member and thought the log
line came from my original function. I'm at a loss to explain why your code
logs but pacemaker's doesn't.

>
> The line of interest here is not
>
> Dec 19 11:53:34 vd01-a cluster-dlm: [2474]: info: pacemaker_terminate_member: Requesting that node 1090782474/vd01-b be fenced
>
> which I added to that function, but the next one:
>
> Dec 19 11:53:34 vd01-a stonith-ng: [1905]: info: initiate_remote_stonith_op: Initiating remote operation reboot for vd01-b: 21425fc0-4311-40fa-9647-525c3f258471
>
> which shows that fencing is actually initiated (and so do the lines that follow).
>
>>
>>>
>>> I'm attaching my resulting version of pacemaker.c (it is still messy
>>> from the different approaches I tried to get this working, and needs
>>> a cleanup). The function to look at is pacemaker_terminate_member(),
>>> which is an almost one-to-one copy of crm_terminate_member_no_mainloop()
>>> except for a variable rename (so it compiles without warnings) and the
>>> change to the ->fence() arguments.
>>>
>>>>
>>>>> Dec 19 11:53:34 vd01-a stonith-ng: [1905]: info: initiate_remote_stonith_op: Initiating remote operation reboot for vd01-b: 21425fc0-4311-40fa-9647-525c3f258471
>>>>> Dec 19 11:53:34 vd01-a stonith-ng: [1905]: info: crm_get_peer: Node vd01-c now has id: 1107559690
>>>>> Dec 19 11:53:34 vd01-a stonith-ng: [1905]: info: stonith_command: Processed st_query from vd01-c: rc=0
>>>>> Dec 19 11:53:34 vd01-a stonith-ng: [1905]: info: crm_get_peer: Node vd01-d now has id: 1124336906
>>>>> Dec 19 11:53:34 vd01-a stonith-ng: [1905]: info: stonith_command: Processed st_query from vd01-d: rc=0
>>>>> Dec 19 11:53:34 vd01-a stonith-ng: [1905]: info: stonith_command: Processed st_query from vd01-a: rc=0
>>>>> Dec 19 11:53:34 vd01-a stonith-ng: [1905]: info: call_remote_stonith: Requesting that vd01-c perform op reboot vd01-b
>>>>> Dec 19 11:53:34 vd01-a stonith-ng: [1905]: info: crm_get_peer: Node vd01-b now has id: 1090782474
>>>>> ...
>>>>> Dec 19 11:53:40 vd01-a stonith-ng: [1905]: info: stonith_command: Processed st_fence_history from cluster-dlm: rc=0
>>>>> Dec 19 11:53:40 vd01-a crmd: [1910]: info: tengine_stonith_notify: Peer vd01-b was terminated (reboot) by vd01-c for vd01-a (ref=21425fc0-4311-40fa-9647-525c3f258471): OK
>>>>>
>>>>> But then I see a minor issue: the node is marked to be fenced again:
>>>>> Dec 19 11:53:40 vd01-a pengine: [1909]: WARN: pe_fence_node: Node vd01-b will be fenced because it is un-expectedly down
>>>>
>>>> Do you have logs for that?
>>>> tengine_stonith_notify() got called; that should have been enough to
>>>> get the node cleaned up in the CIB.
>>>
>>> Ugh, it seems so, but they are already archived. I'll restore them to
>>> the nodes and try to compose an hb_report from them (but the PE inputs
>>> are already lost; do you still need the logs without them?)
>>>
>>>>
>>>>> ...
>>>>>
>>>>> Dec 19 11:53:40 vd01-a pengine: [1909]: WARN: stage6: Scheduling Node vd01-b for STONITH
>>>>> ...
>>>>> Dec 19 11:53:40 vd01-a crmd: [1910]: info: te_fence_node: Executing reboot fencing operation (249) on vd01-b (timeout=60000)
>>>>> ...
>>>>> Dec 19 11:53:40 vd01-a stonith-ng: [1905]: info: call_remote_stonith: Requesting that vd01-c perform op reboot vd01-b
>>>>>
>>>>> And so on.
>>>>>
>>>>> I couldn't investigate this one in more depth, because I use fence_xvm
>>>>> in this test cluster, and it has issues when more than one stonith
>>>>> resource runs on a node. Also, my RA (in the cluster that hosts this
>>>>> test cluster) undefines the VM after a failure, so fence_xvm does not
>>>>> see the fencing victim in qpid and is unable to fence it again.
>>>>>
>>>>> Maybe it is possible to check whether the node was just fenced and
>>>>> skip the redundant fencing?
>>>>
>>>> If the callbacks are being used correctly, it shouldn't be required.
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
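The fix discussed above was adding st_opt_sync_call to the ->fence() call. The self-contained C model below shows what a synchronous-call option generally changes for the caller.

It is a sketch under stated assumptions, not the stonith-ng client API. The following names are all hypothetical:

- toy_conn
- toy_fence()
- toy_dispatch()
- record_rc()
- the TOY_OPT_* flags

With the sync flag, the call returns the daemon's result directly. Without it, the call returns a call id at once. The result then only reaches the caller when a main-loop iteration (toy_dispatch() here) hands it to a registered callback.

The model does not explain why initiate_remote_stonith_op was never logged with the original call. That question is left open in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; NOT the stonith-ng client API or its st_opt_* values. */
enum toy_call_opts {
    TOY_OPT_NONE      = 0,
    TOY_OPT_SYNC_CALL = 1 << 0,
};

typedef void (*toy_callback)(int call_id, int rc, void *user_data);

/* Toy client connection that holds at most one undelivered reply. */
struct toy_conn {
    int          next_call_id;
    bool         reply_pending;   /* reply received but not yet dispatched */
    int          pending_call_id;
    int          pending_rc;
    toy_callback cb;              /* where asynchronous results are delivered */
    void        *cb_data;
};

/* Stand-in for the daemon: every request "succeeds" with rc 0. */
static int toy_daemon_execute(const char *target, const char *action)
{
    (void)target;
    (void)action;
    return 0;
}

/* With TOY_OPT_SYNC_CALL, return the daemon's rc directly.
 * Otherwise park the reply on the connection and return a positive call id. */
static int toy_fence(struct toy_conn *c, int options,
                     const char *target, const char *action)
{
    assert(c != NULL);
    int call_id = ++c->next_call_id;
    int rc = toy_daemon_execute(target, action);

    if (options & TOY_OPT_SYNC_CALL) {
        return rc;
    }
    c->reply_pending = true;
    c->pending_call_id = call_id;
    c->pending_rc = rc;
    return call_id;
}

/* One main-loop iteration: hand any parked reply to the callback.
 * Returns true if a callback ran. */
static bool toy_dispatch(struct toy_conn *c)
{
    if (!c->reply_pending || c->cb == NULL) {
        return false;
    }
    c->reply_pending = false;
    c->cb(c->pending_call_id, c->pending_rc, c->cb_data);
    return true;
}

/* Example callback: store the rc it receives into an int. */
static void record_rc(int call_id, int rc, void *user_data)
{
    (void)call_id;
    *(int *)user_data = rc;
}
```

A caller that never runs anything like toy_dispatch() never sees an asynchronous result at all. That is worth keeping in mind for crm_terminate_member_no_mainloop(), whose name suggests it is meant for callers without a main loop.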
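Vladislav's suggestion of skipping fencing for a node that was just fenced could be sketched roughly as follows. Andrew's reply is that, with the callbacks used correctly, this should not be needed: the successful-fencing notification should already get the node cleaned up in the CIB. Treat this only as an illustration of the idea.

Everything in it is hypothetical, and none of it is pacemaker code:

- fence_history
- note_fenced()
- note_rejoined()
- should_skip_fencing()
- the fixed-size table and the time window

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Hypothetical sketch only; none of these names exist in pacemaker. */
#define MAX_NODES     16
#define NODE_NAME_LEN 64

struct fence_record {
    char   node[NODE_NAME_LEN];
    time_t fenced_at;            /* time of the last successful fencing */
};

struct fence_history {
    struct fence_record entries[MAX_NODES];
    size_t              count;
};

/* Record that `node` was fenced successfully at `when`, e.g. on a
 * "Peer ... was terminated ...: OK" style notification. */
static void note_fenced(struct fence_history *h, const char *node, time_t when)
{
    assert(h != NULL && node != NULL);
    for (size_t i = 0; i < h->count; i++) {
        if (strcmp(h->entries[i].node, node) == 0) {
            h->entries[i].fenced_at = when;
            return;
        }
    }
    if (h->count < MAX_NODES) {    /* silently drop when the table is full */
        struct fence_record *r = &h->entries[h->count++];
        strncpy(r->node, node, NODE_NAME_LEN - 1);
        r->node[NODE_NAME_LEN - 1] = '\0';
        r->fenced_at = when;
    }
}

/* Forget a node once it rejoins, so a later real failure is never skipped. */
static void note_rejoined(struct fence_history *h, const char *node)
{
    for (size_t i = 0; i < h->count; i++) {
        if (strcmp(h->entries[i].node, node) == 0) {
            h->count--;
            h->entries[i] = h->entries[h->count];  /* swap-remove */
            return;
        }
    }
}

/* True if `node` was fenced less than `window_s` seconds before `now`. */
static bool should_skip_fencing(const struct fence_history *h,
                                const char *node, time_t now, double window_s)
{
    for (size_t i = 0; i < h->count; i++) {
        if (strcmp(h->entries[i].node, node) == 0) {
            return difftime(now, h->entries[i].fenced_at) < window_s;
        }
    }
    return false;
}
```

The note_rejoined() step matters. Without it, a node that rejoins and then genuinely fails again inside the window would be skipped, which is exactly the kind of missed fencing a cluster must avoid.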