Our current version, 3.3.4, is from 2011. There have been lots of
improvements since.

Things I'm mostly interested in:

1. Since both Hadoop distributions (CDH + HDP) ship with ZK 3.4.5 (and
have for over two years), there is a huge number of installations
running Kafka with 3.4.X ZK. The 3.3.X client with a 3.4.X server was
never adequately tested, so we are concerned about the stability and
supportability of this combination.

2. Hundreds of bug fixes introduced in 3 years...

3. The Zookeeper GUI is nice too :)
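
To make point 1 concrete, here is a minimal sketch of how one could flag
a client/server release-line mismatch (e.g. a 3.3.X client against a
3.4.X server). The class and method names are hypothetical and are not
part of Kafka or Zookeeper; the sketch also assumes version strings of
the form "major.minor.patch" with an optional "-suffix".

```java
// Hypothetical helper for illustration only; not a Kafka or Zookeeper API.
final class ZkVersionCheck {

    // Parses "3.4.5" (or "3.4.5-somebuild") into {major, minor, patch}.
    static int[] parse(String version) {
        // Drop anything after the first '-' (assumed build suffix).
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected major.minor.patch: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < 3; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // True when client and server share the same major.minor release line.
    static boolean sameReleaseLine(String client, String server) {
        int[] c = parse(client);
        int[] s = parse(server);
        return c[0] == s[0] && c[1] == s[1];
    }
}
```

With this, sameReleaseLine("3.3.4", "3.4.5") is false, which is exactly
the combination I am worried about.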

Thanks for the pointers around the separate ZK cluster. Our goal is
for ZK to never go down, but I agree that introducing dependencies
between services is never a good idea.
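
For anyone following along, the difference between a dedicated ensemble
and a chroot into a shared one shows up in the Zookeeper connect string
(hosts, optionally followed by a "/path" suffix). Below is a rough
sketch that splits such a string; the class name and the example host
names are made up for illustration.

```java
// Hypothetical helper for illustration only; not a Kafka or Zookeeper API.
final class ZkConnectString {
    final String servers; // comma-separated host:port list
    final String chroot;  // "" when no chroot suffix is present

    private ZkConnectString(String servers, String chroot) {
        this.servers = servers;
        this.chroot = chroot;
    }

    // Splits "host1:2181,host2:2181/kafka" into servers and chroot path.
    static ZkConnectString parse(String connect) {
        int slash = connect.indexOf('/');
        if (slash < 0) {
            return new ZkConnectString(connect, "");
        }
        return new ZkConnectString(connect.substring(0, slash), connect.substring(slash));
    }

    // True when the string points below the root of the ensemble.
    boolean usesChroot() {
        return !chroot.isEmpty() && !chroot.equals("/");
    }
}
```

A string such as "zk1.example.com:2181,zk2.example.com:2181/kafka"
parses to chroot "/kafka", i.e. Kafka sharing an ensemble with other
services; the same string without the suffix has no chroot.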

Gwen


On Mon, Aug 4, 2014 at 1:00 PM, Joe Stein <joe.st...@stealth.ly> wrote:
> If Kafka installations are missing something by not using the latest
> Zookeeper, from a feature or stability perspective, that would be good
> to understand. Maybe you could help with that, Gwen?
>
> I know one of the implementations used this Hadoop version
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.3-product.html
> which appears to be using ZK 3.4.5. I will have to check on the other two
> (someone reminded me we saw this more than twice after I sent the email).
> I think one of them may have been CDH, but I don't recall off the top of
> my head; it was a while ago.
>
> The reason for a separate Zookeeper cluster for Kafka, apart from other
> software systems (Hadoop, Mesos, etc.), is to separate the risk of
> dependent services. A shared Zookeeper cluster takes down more systems
> when it goes down (for whatever reason: rogue server/code, upgrade,
> whatever) and becomes one big single point of failure for everything. If
> you aren't using Zookeeper for anything else that is mission critical it
> might not matter; it is relative (and I have seen this too, of course).
>
> We have also found deploying Zookeeper to Mesos very (very (very))
> fruitful for dealing with and managing multiple Zookeeper ensembles without
> any headaches. Of course you can't do that with the Zookeeper ensemble
> for Mesos itself, but that goes back to my point about separation.
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Mon, Aug 4, 2014 at 12:36 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> Also, knowing the specific Zookeeper 3.4.X version where the loss of
>> quorum occurred would help.
>> 3.4.5 fixed some pretty serious issues around hanging.
>>
>> Gwen
>>
>> On Mon, Aug 4, 2014 at 9:29 AM, Gwen Shapira <gshap...@cloudera.com>
>> wrote:
>> > Thanks for the heads-up, Joe.
>> >
>> > We've been shipping Zookeeper 3.4.X for over two years now (since
>> > CDH4.0) and have many production customers. I'll check if there are
>> > any known issues with breaking quorum. In any case I will take your
>> > comments into account and see if I can arrange for extra testing.
>> >
>> > Can you share more information about the 3.4.X issues you were seeing?
>> > Were there especially large clusters involved? A large number of
>> > consumers?
>> >
>> > Also, I'm curious to hear more about the reasons for separate ZK
>> > cluster. I can see why you'll want it if you have thousands of
>> > consumers, but are there other reasons? Multiple zookeeper installs
>> > can be a pain to manage.
>> >
>> > Gwen
>> >
>> >
>> >
>> > On Mon, Aug 4, 2014 at 7:52 AM, Joe Stein <joe.st...@stealth.ly> wrote:
>> >> I have heard issues from installations running 3.4.X that I have not
>> >> heard from installations running 3.3.X (i.e. zk breaking quorum and
>> >> cluster going down).
>> >>
>> >> In none of these cases did I have an opportunity to isolate, reproduce,
>> >> and confirm that the issue was caused by 3.4.X. Moving to 3.3.X was
>> >> agreed to be a lower risk/cost solution to the problem. Once on 3.3.X
>> >> the issues didn't happen again.
>> >>
>> >> So I can't say for sure if there are issues with running 3.4.X, but I
>> >> would suggest some due diligence in testing and production operation to
>> >> validate that every case that Kafka requires operates correctly (and
>> >> over some time). There is a cost to this, so some company(s) will have
>> >> to make that investment and weigh the cost against the benefit of
>> >> moving to 3.4.X.
>> >>
>> >> I currently recommend running a separate ZK cluster for Kafka production
>> >> and not chroot into an existing one except for test/qa/dev.
>> >>
>> >> I don't know what others' experience is with 3.4.X; as I said, the
>> >> issues I have seen could have been coincidence.
>> >>
>> >> /*******************************************
>> >>  Joe Stein
>> >>  Founder, Principal Consultant
>> >>  Big Data Open Source Security LLC
>> >>  http://www.stealth.ly
>> >>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> >> ********************************************/
>> >>
>> >>
>> >> On Mon, Aug 4, 2014 at 12:56 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Kafka currently builds against Zookeeper 3.3.4, which is quite old.
>> >>>
>> >>> Perhaps we should move to the more recent 3.4.x branch?
>> >>>
>> >>> I tested the change on my system and the only impact is to
>> >>> EmbeddedZookeeper used in tests (it uses NIOServerCnxn.Factory, which
>> >>> was refactored into its own class, NIOServerCnxnFactory, in 3.4).
>> >>>
>> >>> Here's what the change looks like:
>> >>> https://gist.github.com/gwenshap/d95b36e0bced53cab5bb
>> >>>
>> >>> Gwen
>> >>>
>>
