Frankly, I agree with Allen's comments.

I think that discovering ZooKeeper should be done with a well-known DNS
address (e.g. zookeeper.$cluster.prod.example.com). It would be pretty rare
for something like the address of ZooKeeper to change in a stable
infrastructure. When it does, DNS can be updated as part of the change
procedure.
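
As a rough sketch of what I mean (not anything Hadoop ships today), a client
could build the well-known name from the cluster name and resolve it to get a
ZooKeeper connect string. The function names, the prod.example.com domain
default, and the injectable resolver are all made up for illustration; 2181 is
ZooKeeper's usual client port.

```python
import socket


def zk_hostname(cluster, domain="prod.example.com"):
    """Build the well-known name, e.g. zookeeper.alpha.prod.example.com.

    The naming scheme and default domain are assumptions for this sketch.
    """
    return f"zookeeper.{cluster}.{domain}"


def zk_connect_string(cluster, port=2181, resolve=socket.getaddrinfo):
    """Resolve the well-known name and return "ip:port,ip:port,...".

    `resolve` defaults to socket.getaddrinfo; it is a parameter only so the
    lookup can be stubbed out without touching real DNS.
    """
    host = zk_hostname(cluster)
    # IPv4 only, to keep the "addr:port" formatting simple in this sketch.
    infos = resolve(host, port, family=socket.AF_INET,
                    proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    addrs = sorted({info[4][0] for info in infos})
    return ",".join(f"{addr}:{port}" for addr in addrs)
```

If the ensemble moves, only the DNS records change; clients keep using the
same name and pick up the new addresses on their next lookup.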

Using multicast, on the other hand, raises the barrier to getting a
Hadoop cluster running, as one must then troubleshoot any multicast issues
that come up.

wt

On Wed, Jul 6, 2011 at 5:49 PM, Allen Wittenauer <a...@apache.org> wrote:

>
> On Jul 6, 2011, at 5:05 PM, Eric Yang wrote:
>
> > Did you know that almost all Linux desktop systems come with Avahi
> > pre-installed and turned on by default?
>
>        ... which is why most admins turn those services off by default. :)
>
> >  What is more interesting is
> > that there are thousands of those machines broadcasting in large
> > corporations without anyone noticing them.
>
>        That's because many network teams turn off multicast past the subnet
> boundary and many corporate desktops are in class C subnets.  This
> automatically limits the host count down to 200-ish per network.  Usually
> just the unicast traffic is bad enough.  Throwing multicast into the mix
> just makes it worse.
>
> > I have recently built a
> > multicast DNS browser and looked into the number of machines running in
> > a large company environment.  The number of desktop, laptop and
> > printer machines running multicast DNS far exceeds 1000 machines
> > in the local subnet.
>
>        From my understanding of Y!'s network, the few /22's they have
> (which would get you 1022 potential hosts on a subnet) have multicast
> traffic dropped at the router and switch levels.  Additionally, DNS-SD (the
> service discovery portion of mDNS) offers unicast support as well.  So there
> is a very good chance that the traffic you are seeing is from unicast, not
> multicast.
>
>        The 1000 number, BTW, comes from Apple.  I'm sure they'd be
> interested in your findings given their role in ZC.
>
>        BTW, I'd much rather hear that you set up a /22 with many many
> machines running VMs trying to actually use mDNS for something useful.  A
> service browser really isn't that interesting.
>
> > They are all happily working fine without causing any issues.
>
>        ... that you know of.  Again, I'm 99% certain that Y! is dropping
> multicast packets into the bit bucket at the switch boundaries.  [I remember
> having this conversation with them when we set up the new data centers.]
>
> >  Printer works fine,
>
>        Most admins turn SLP and other broadcast services on printers off.
> For large networks, one usually sees print services enabled via AD or master
> print servers broadcasting the information on the local subnet.  This allows
> a central point of control rather than randomness.   Snow Leopard (I don't
> think Leopard did this) actually tells you where the printer is coming from
> now, so that's handy to see if they are ZC or AD or whatever.
>
> > iTunes sharing from someone
> > else works fine.
>
>        iTunes specifically limits its reach so that it can't extend beyond
> the local subnet and definitely does unicast in addition to ZC, so that
> doesn't really say much of anything, other than potentially invalidating
> your results.
>
> >  For some reason, things tend to work better on my
> > side of universe. :)
>
>        I'm sure they do, but not for the reasons you think.
>
> > Allen, if you want to get stuck on stone age
> > tools, I won't stop you.
> >
>
>        Multicast has a time and place (mainly for small, non-busy
> networks).  Using it without understanding the network impact is never a
> good idea.
>
>        FWIW, I've seen multicast traffic bring down an entire campus of
> tens of thousands of machines due to routers and switches having bugs where
> they didn't subtract from the packet's TTL.  I'm not the only one with these
> types of experiences.  Anything multicast is going to have a very large
> uphill battle for adoption because of these widespread problems.  Many
> network vendors really don't get this one right, for some reason.
