SOLVED!

[more below]

On 01/08/14 06:30, Andrew Beekhof wrote:
> On 1 Aug 2014, at 2:04 pm, Andrew Beekhof <and...@beekhof.net> wrote:
>
>> On 1 Aug 2014, at 7:47 am, Andrew Beekhof <and...@beekhof.net> wrote:
>>
>>> On 31 Jul 2014, at 4:46 pm, Cédric Dufour - Idiap Research Institute 
>>> <cedric.duf...@idiap.ch> wrote:
>>>
>>>> On 31/07/14 00:17, Andrew Beekhof wrote:
>>>>> On 31 Jul 2014, at 2:48 am, Cédric Dufour - Idiap Research Institute 
>>>>> <cedric.duf...@idiap.ch> wrote:
>>>>>
>>>>>> After packaging pacemaker 1.1.12 for Debian/Wheezy (along with corosync
>>>>>> 1.4.6 and libqb 0.17.0), I have successfully initialized a new cluster.
>>>>>>
>>>>>> Back to a very simple test cluster, the only problem I have is with 
>>>>>> fencing, which fails altogether with "route_ais_message: Sending message 
>>>>>> to local.stonith-ng failed: ipc delivery failed (rc=-2)" messages:
>>>>>>
>>>>>> root@bc1hs22a01:~ # tail /var/log/corosync.rsyslog
>>>>>> Jul 30 18:41:41 bc1hs22a01 stonith_admin[5411]:   notice: crm_log_args: 
>>>>>> Invoked: stonith_admin -F bc1hs22a02
>>>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]:   notice: handle_request: 
>>>>>> Client stonith_admin.5411.fe1388ed wants to fence (off) 'bc1hs22a02' 
>>>>>> with device '(any)'
>>>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]:   notice: 
>>>>>> initiate_remote_stonith_op: Initiating remote operation off for 
>>>>>> bc1hs22a02: 48b69f82-29ad-4c9a-af57-0e60ae5242e4 (0)
>>>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]:   [pcmk  ] WARN: 
>>>>>> route_ais_message: Sending message to local.stonith-ng failed: ipc 
>>>>>> delivery failed (rc=-2)
>>>>> rc=-2 is coming from send_client_ipc(void *conn, const AIS_Message * 
>>>>> ais_msg)
>>>>>
>>>>> specifically:
>>>>>
>>>>>  if (conn == NULL) {
>>>>>      rc = -2;
>>>>>
>>>>> So the plugin thinks that stonith-ng isn't connected.
>>>>> More logs?
>>>>>
>>>> I have completed a full restart of the cluster in order to provide the 
>>>> logs at each step; see attached log files:
>>>> (from node_1/DC)
>>>> - node_1-corosync-start.log
>>>> - node_1-pacemaker-start.log
>>>> - node_1-corosync-node_2_join.log
>>>> - node_1-pacemaker-node_2_join.log
>>>> (from node_2)
>>>> - node_2-corosync-start.log
>>>> - node_2-pacemaker-start.log
>>>>
>>>> The problem already manifests itself in the DC start log - because of the
>>>> previous fencing attempt - at 08:19:21 and 08:19:42:
>>>>
>>>> root@bc1hs22a01:~ # fgrep 'ipc delivery failed' node_1-corosync-start.log
>>>> Jul 31 08:19:21 bc1hs22a01 corosync[31057]:   [pcmk  ] WARN: 
>>>> route_ais_message: Sending message to local.stonith-ng failed: ipc 
>>>> delivery failed (rc=-2)
>>>> Jul 31 08:19:42 bc1hs22a01 corosync[31057]:   [pcmk  ] WARN: 
>>>> route_ais_message: Sending message to local.stonith-ng failed: ipc 
>>>> delivery failed (rc=-2)
>>>>
>>>> While it would seem (to me) that the stonith plugin successfully connected 
>>>> to the CIB:
>>> It's not the CIB that's the issue:
>>>
>>>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]:   [pcmk  ] WARN: 
>>>>>> route_ais_message: Sending message to local.stonith-ng failed: ipc 
>>>>>> delivery failed (rc=-2)
>>> That's the pacemaker plugin inside corosync (which uses a completely
>>> different IPC mechanism).
>> It looks like there is a name mismatch:
>>
>> Jul 31 08:19:20 bc1hs22a01 corosync[31057]:   [pcmk  ] info: pcmk_ipc: 
>> Recorded connection 0x2543e30 for stonithd/0
>> Jul 31 08:19:20 bc1hs22a01 corosync[31057]:   [pcmk  ] debug: 
>> process_ais_message: Msg[1] (dest=local:ais, from=bc1hs22a01:stonithd.31092, 
>> remote=true, size=6): 31092
>> ...
>> Jul 31 08:19:21 bc1hs22a01 corosync[31057]:   [pcmk  ] WARN: 
>> route_ais_message: Sending message to local.stonith-ng failed: ipc delivery 
>> failed (rc=-2)
>> Jul 31 08:19:42 bc1hs22a01 corosync[31057]:   [pcmk  ] WARN: 
>> route_ais_message: Sending message to local.stonith-ng failed: ipc delivery 
>> failed (rc=-2)
>>
>> Could you try the following patch?
> Actually, try this one instead:
>    https://github.com/beekhof/pacemaker/commit/21830a0

This one-line patch did it:

Aug  1 09:48:26 bc1hs22a01 corosync[15681]:   [pcmk  ] info: pcmk_ipc: Recorded 
connection 0x1a926c0 for stonith-ng/0
Aug  1 09:48:26 bc1hs22a01 corosync[15681]:   [pcmk  ] info: pcmk_ipc: Sending 
membership update 120 to stonith-ng

And the (previously attempted/recorded) fencing command worked as soon as the
DC started.
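
For anyone hitting the same symptom, here is a minimal sketch (Python, not part
of pacemaker) of how the name mismatch can be spotted in the corosync log: it
collects the client names the plugin recorded via pcmk_ipc and compares them
with the destinations that route_ais_message failed to deliver to. The function
name and the regular expressions are my own assumptions, derived only from the
log lines quoted in this thread, and it assumes one log event per line (as in
the syslog file, not as wrapped in this mail).

```python
import re

# Patterns derived from the log lines quoted in this thread (assumption:
# other pacemaker versions may format these messages differently).
RECORDED = re.compile(r"pcmk_ipc: Recorded connection \S+ for ([\w-]+)/\d+")
FAILED = re.compile(r"Sending message to local\.([\w-]+) failed")


def find_unregistered_destinations(lines):
    """Return destinations that failed delivery and were never recorded.

    Hypothetical helper for illustration only.
    """
    recorded, failed = set(), set()
    for line in lines:
        match = RECORDED.search(line)
        if match:
            recorded.add(match.group(1))
        match = FAILED.search(line)
        if match:
            failed.add(match.group(1))
    return sorted(failed - recorded)


# Two lines from the unpatched DC start log, unwrapped.
sample = [
    "Jul 31 08:19:20 bc1hs22a01 corosync[31057]:   [pcmk  ] info: pcmk_ipc: "
    "Recorded connection 0x2543e30 for stonithd/0",
    "Jul 31 08:19:21 bc1hs22a01 corosync[31057]:   [pcmk  ] WARN: "
    "route_ais_message: Sending message to local.stonith-ng failed: "
    "ipc delivery failed (rc=-2)",
]
print(find_unregistered_destinations(sample))
```

On the unpatched log this reports "stonith-ng" (recorded as "stonithd"); on the
patched log, where the connection is recorded as "stonith-ng", it reports
nothing.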

Thank you very much for your quick response!
(I can now enjoy Swiss National Day with total peace of mind :-) )

PS: I'll carry out further cluster/fencing tests next week (should you want a
more thorough confirmation before pushing your patch to master).

>>> FWIW, the plugin is extremely deprecated, you're encouraged to use 
>>> pacemaker+cman or begin working towards corosync2 + pacemakerd.
>>>
>>>

I'll keep this in mind (though it is not so easy to achieve when one wants to
avoid straying too far from Debian "stable").

Best and thanks again,

Cédric



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
