09.12.2011 08:44, Andrew Beekhof wrote:
> On Fri, Dec 9, 2011 at 3:16 PM, Vladislav Bogdanov <bub...@hoster-ok.com>
> wrote:
>> 09.12.2011 03:11, Andrew Beekhof wrote:
>>> On Fri, Dec 2, 2011 at 1:32 AM, Vladislav Bogdanov <bub...@hoster-ok.com>
>>> wrote:
>>>> Hi Andrew,
>>>>
>>>> I investigated on my test cluster what actually happens with dlm and
>>>> fencing.
>>>>
>>>> I added more debug messages to dlm dump, and also did a re-kick of nodes
>>>> after some time.
>>>>
>>>> Results are that stonith history actually doesn't contain any
>>>> information until pacemaker decides to fence node itself.
>>>
>>> ...
>>>
>>>> From my PoV that means that the call to
>>>> crm_terminate_member_no_mainloop() does not actually schedule fencing
>>>> operation.
>>>
>>> You're going to have to remind me... what does your copy of
>>> crm_terminate_member_no_mainloop() look like?
>>> This is with the non-cman editions of the controlds too right?
>>
>> Just latest github's version. You changed some dlm_controld.pcmk
>> functionality, so it asks stonithd for fencing results instead of XML
>> magic. But call to crm_terminate_member_no_mainloop() remains the same
>> there. But yes, that version communicates stonithd directly too.
>>
>> SO, the problem here is just with crm_terminate_member_no_mainloop()
>> which for some reason skips actual fencing request.
>
> There should be some logs, either indicating that it tried, or that it failed.

Nothing about fencing. Only messages about history requests:

stonith-ng: [1905]: info: stonith_command: Processed st_fence_history from cluster-dlm: rc=0

I even moved all the fencing code into dlm_controld to have better control
over what it does (and to avoid rebuilding pacemaker just to play with that
code). dlm_tool dump prints the same line every second, and stonith-ng logs
the history requests.

It is a little odd, but I once saw a fencing request from cluster-dlm
succeed, though only right after the node had been fenced by pacemaker. As a
result, the node was switched off instead of being rebooted.

That raises one more question: is it correct to call st->cmds->fence() with
the third parameter set to "off"? I think "reboot" is more consistent with
the rest of the fencing subsystem.

At the same time, stonith_admin -B succeeds. The main difference I see is
st_opt_sync_call in the latter case. I will try to experiment with it.

Best,
Vladislav

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org