Re: [Pacemaker] very slow pacemaker/corosync shutdown

Vladislav Bogdanov Thu, 19 Sep 2013 22:18:54 -0700

20.09.2013 02:52, Andrew Beekhof wrote:
> 
> On 19/09/2013, at 7:45 PM, David Lang <da...@lang.hm> wrote:
> 
>> On Thu, 19 Sep 2013, Florian Crouzat wrote:
>>
>>> On 19/09/2013 00:25, David Lang wrote:
>>>> I'm frequently running into a problem that shutting down
>>>> pacemaker/corosync takes a very long time (several minutes)
>>>
>>> Just to be 100% sure, do you always respect the stop order? Pacemaker
>>> *then* CMAN/corosync?
>>
>> 'service pacemaker stop' seems to take down cman as well, but frequently 
>> stalls before that.
> 
> logs?
> 
>>
>> we are definitely not taking down cman ahead of time.
>>
>> But we are seeing problems on some systems where we start everything up,
>> verify both nodes are seen, and then a day or so later notice that the two
>> boxes are not communicating (one of the reasons we are looking at disabling
>> multicast: the local networking people have 'interesting' ideas about
>> multicast, and they may be causing problems)
> 
> this is quite likely the problem.
> multicast support in various parts of the hardware and software stacks seems 
> to be getting worse and worse over time :(


+1
With a modern EL6 kernel I now see cluster nodes advertising themselves
as multicast routers, for some reason, in *some* bridged VLANs, and the
switch then forwards all multicast packets to them instead of consulting
the IGMP snooping table. For some reason the switch forwards mcast in
*all* VLANs to those "mrouters".
It seems that nothing perfect exists in the multicast world. :(
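
For anyone chasing something similar, here is a minimal sketch (not a
diagnosis) of how one might dump the multicast settings of the Linux
bridges on a node. It assumes the bridge driver exposes the
multicast_snooping, multicast_router and multicast_querier files under
/sys/class/net/<bridge>/bridge/; whether all of them are present depends
on the kernel build, so missing files are simply skipped. The function
name read_bridge_multicast is made up for this example, and nothing here
confirms that these settings are the cause of the behaviour described
above.

```python
from pathlib import Path

# Multicast-related files of the Linux bridge driver in sysfs.
# Assumption: not every kernel build provides all three.
BRIDGE_KNOBS = ("multicast_snooping", "multicast_router", "multicast_querier")


def read_bridge_multicast(sys_net="/sys/class/net"):
    """Return {bridge_name: {knob: value}} for each bridge under sys_net."""
    base = Path(sys_net)
    if not base.is_dir():
        return {}  # no sysfs here (e.g. not Linux)

    result = {}
    for dev in sorted(base.iterdir()):
        bridge_dir = dev / "bridge"
        if not bridge_dir.is_dir():
            continue  # not a bridge device
        knobs = {}
        for name in BRIDGE_KNOBS:
            knob_file = bridge_dir / name
            try:
                knobs[name] = knob_file.read_text().strip()
            except OSError:
                pass  # file missing or unreadable on this kernel
        result[dev.name] = knobs
    return result


if __name__ == "__main__":
    for bridge, knobs in read_bridge_multicast().items():
        print(bridge, knobs)
```

Run on each cluster node, this prints one line per bridge, which makes
it easy to compare nodes that the switch treats as "mrouters" against
nodes that it does not.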


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
