20.09.2013 02:52, Andrew Beekhof wrote:
>
> On 19/09/2013, at 7:45 PM, David Lang <da...@lang.hm> wrote:
>
>> On Thu, 19 Sep 2013, Florian Crouzat wrote:
>>
>>> On 19/09/2013 00:25, David Lang wrote:
>>>> I'm frequently running into a problem that shutting down
>>>> pacemaker/corosync takes a very long time (several minutes)
>>>
>>> Just to be 100% sure, you always respect the stop order? Pacemaker *then*
>>> CMAN/corosync?
>>
>> 'service pacemaker stop' seems to take down cman as well, but frequently
>> stalls before that.
>
> logs?
>
>>
>> we are definitely not taking down cman ahead of time.
>>
>> But we are seeing problems on some systems where we start everything up,
>> verify both nodes are seen, and then a day or so later notice that the
>> two boxes are not communicating (one of the reasons we are looking at
>> disabling multicast, the local networking people have 'interesting'
>> ideas about multicast, and they may be causing problems)
>
> this is quite likely the problem.
> multicast support in various parts of the hardware and software stacks seems
> to be getting worse and worse over time :(

+1

With a modern EL6 kernel I now see cluster nodes advertising themselves as
multicast routers, for some reason, in *some* bridged VLANs, and the switch
forwards all multicast packets to them instead of consulting the IGMP
snooping table. For some reason the switch forwards mcast in *all* VLANs to
those "mrouters".

It seems that nothing perfect exists in the multicast world. :(

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org