Thanks, Joel, for looking into it. I will try to reproduce it. I don't think
shutting down the second ZooKeeper is needed, because I ran into it the first
time just by shutting down the topic leaders.

Cal


On Tue, Jul 16, 2013 at 2:38 AM, Joel Koshy <jjkosh...@gmail.com> wrote:

> Hey Calvin,
>
> I apologize for not being able to get to this sooner. I don't think I
> can reproduce the full scenario exactly as I don't have exclusive
> access to so many machines, but I tried it locally and couldn't
> reproduce it. Any chance you can reproduce it with a smaller
> deployment? Is step 6 required? Would you mind pasting the full stack
> trace that you saw?
>
> Thanks,
>
> Joel
>
>
>
>
> On Wed, Jul 10, 2013 at 11:10 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > Ok thanks - I'll go through this tomorrow.
> >
> > Joel
> >
> > On Wed, Jul 10, 2013 at 9:14 PM, Calvin Lei <ckp...@gmail.com> wrote:
> >> Joel,
> >>    So I was able to reproduce the issue that I experienced. Please see
> >> the steps below.
> >> 1. Set up a 3-ZooKeeper and 6-broker cluster. Set up one topic with 2
> >> partitions, with the replication factor set to 3.
> >> 2. Set up and run the console consumer, consuming messages from that
> >> topic.
> >> 3. Produce a few messages to confirm the consumer is working.
> >> 4. Stop the consumer.
> >> 5. Shut down (uncontrolled) the lead broker of one of the partitions.
> >> 6. Shut down one of the ZooKeepers.
> >> 7. Run the list-topic script to confirm a new leader has been elected.
> >> 8. Bring up the console consumer again.
> >> 9. The console consumer won't start because of an error in rebalancing
> >> (when fetching topic metadata).
> >>      Error: java.util.NoSuchElementException: key not found: 5
> >>      Trace: ClientUtils.scala:67
> >>
> >> Where broker 5 was the lead broker I shut down. I am using 0.8 beta.
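[Editor's note: the setup and verification steps above can be sketched as shell
commands. This is a hypothetical sketch: the script names and flags follow
later Kafka releases (`kafka-topics.sh` etc.), not the differently named
0.8-beta admin scripts, and the commands are printed rather than executed
since they need a live ZooKeeper/broker cluster.]

```shell
# Hypothetical sketch of the repro setup. Flags follow later Kafka releases,
# not the 0.8-beta scripts; commands are printed, not run, since they need
# a live ZooKeeper/broker cluster.
ZK="localhost:2181"      # assumed ZooKeeper connect string
TOPIC="repro-topic"      # assumed topic name

# Step 1: one topic, 2 partitions, replication factor 3
CREATE_CMD="bin/kafka-topics.sh --zookeeper $ZK --create --topic $TOPIC \
--partitions 2 --replication-factor 3"

# Step 2: console consumer reading from that topic
CONSUME_CMD="bin/kafka-console-consumer.sh --zookeeper $ZK --topic $TOPIC"

# Step 7: describe the topic to check which broker is now the leader
DESCRIBE_CMD="bin/kafka-topics.sh --zookeeper $ZK --describe --topic $TOPIC"

printf '%s\n' "$CREATE_CMD" "$CONSUME_CMD" "$DESCRIBE_CMD"
```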
> >>
> >> thanks,
> >> Cal
> >>
> >>
> >> On Tue, Jul 9, 2013 at 11:20 PM, Calvin Lei <ckp...@gmail.com> wrote:
> >>
> >>> I will try to reproduce it. It was sporadic. My setup was a topic with
> >>> 1 partition and replication factor = 3.
> >>> If I kill the console producer and then shut down the leader broker, a
> >>> new leader is elected. If I then kill the new leader, I don't see the
> >>> last broker elected as leader. When I then started the console
> >>> producer, I started seeing errors.
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jul 9, 2013 at 6:14 PM, Joel Koshy <jjkosh...@gmail.com>
> wrote:
> >>>
> >>>> Not really - if you shutdown a leader broker (and assuming your
> >>>> replication factor is > 1) then the other assigned replica will be
> >>>> elected as the new leader. The producer would then look up metadata,
> >>>> find the new leader and send requests to it. What do you see in the
> >>>> logs?
> >>>>
> >>>> Joel
> >>>>
> >>>> On Tue, Jul 9, 2013 at 1:44 PM, Calvin Lei <ckp...@gmail.com> wrote:
> >>>> > Thanks, you gave me enough pointers to dig deeper. And I tested the
> >>>> > fault tolerance by shutting down brokers randomly.
> >>>> >
> >>>> > What I noticed is that if I shut down brokers while my producer and
> >>>> > consumer are still running, they recover fine. However, if I shut
> >>>> > down a lead broker without a running producer, I can't seem to start
> >>>> > the producer afterwards without restarting the previous lead broker.
> >>>> > Is this expected?
> >>>> > On Jul 9, 2013 10:28 AM, "Joel Koshy" <jjkosh...@gmail.com> wrote:
> >>>> >
> >>>> >> For 1, I forgot to add - there is an admin tool to reassign
> >>>> >> replicas, but it would take longer than leader failover.
> >>>> >>
> >>>> >> Joel
> >>>> >>
> >>>> >> On Tuesday, July 9, 2013, Joel Koshy wrote:
> >>>> >>
> >>>> >> > 1 - No, unless broker4 is not the preferred leader. (The
> >>>> >> > preferred leader is the first broker in the assigned replica
> >>>> >> > list.) If a non-preferred replica is the current leader, you can
> >>>> >> > run the PreferredReplicaLeaderElection admin command to move the
> >>>> >> > leader.
> >>>> >> > 2 - The latency of the actual leader movement (on leader
> >>>> >> > failover) is fairly low - probably on the order of tens of ms.
> >>>> >> > However, clients (producers, consumers) may take longer to detect
> >>>> >> > it: the client needs to get back an error response, handle an
> >>>> >> > exception, issue a metadata request, and get the response to find
> >>>> >> > the new leader. All that can add up, but it should not be
> >>>> >> > terribly high - I'm guessing on the order of a few hundred ms to
> >>>> >> > a second or so.
> >>>> >> > 3 - That should work, although the admin command for adding more
> >>>> >> > partitions to a topic is currently being developed.
> >>>> >> >
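[Editor's note: in later Kafka releases the PreferredReplicaLeaderElection
admin command mentioned in point 1 is exposed as a wrapper script. A
hypothetical invocation, printed rather than executed since it needs a live
cluster, looks like this:]

```shell
# Hypothetical sketch: in later Kafka releases the PreferredReplicaLeaderElection
# admin command is wrapped by kafka-preferred-replica-election.sh. Printed, not
# executed, since it needs a live ZooKeeper/broker cluster.
ZK="localhost:2181"      # assumed ZooKeeper connect string

# Trigger election of the preferred replica (the first broker in each
# partition's assigned replica list) as leader:
ELECT_CMD="bin/kafka-preferred-replica-election.sh --zookeeper $ZK"
echo "$ELECT_CMD"
```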
> >>>> >> >
> >>>> >> > On Mon, Jul 8, 2013 at 11:02 PM, Calvin Lei <ckp...@gmail.com>
> >>>> >> > wrote:
> >>>> >> > > Hi,
> >>>> >> > >     I have three questions regarding the Kafka broker setup.
> >>>> >> > >
> >>>> >> > > 1. Assuming I have a 4-broker and 2-ZooKeeper (running in
> >>>> >> > > quorum mode) setup, if topicA-partition0 has its leader set to
> >>>> >> > > broker4, can I change the leader to another broker without
> >>>> >> > > killing the current leader?
> >>>> >> > >
> >>>> >> > > 2. What is the latency of switching to a different leader when
> >>>> >> > > the current leader is down? Do we configure it using the
> >>>> >> > > consumer property refresh.leader.backoff.ms?
> >>>> >> > >
> >>>> >> > > 3. What is the best practice for dynamically adding a new node
> >>>> >> > > to a Kafka cluster? Should I bring up the node, and then
> >>>> >> > > increase the replication factor for the existing topic(s)?
> >>>> >> > >
> >>>> >> > >
> >>>> >> > > thanks in advance,
> >>>> >> > > Cal
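[Editor's note: the refresh.leader.backoff.ms property from question 2 is a
consumer-side setting. A hypothetical consumer.properties fragment is shown
below; the 200 ms value is illustrative, not necessarily the default for
your version, and localhost:2181 is an assumed ZooKeeper address. Note it
tunes how long the consumer backs off before re-checking for a new leader,
not the broker-side leader election itself.]

```properties
# Hypothetical consumer.properties fragment (values are illustrative).
# Backoff before the consumer retries the leader lookup for a partition
# whose leader is currently unavailable:
refresh.leader.backoff.ms=200
# Assumed ZooKeeper connect string for the consumer:
zookeeper.connect=localhost:2181
```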
> >>>> >> >
> >>>> >>
> >>>>
> >>>
> >>>
>
