If broker 1 is down in step 4, the producer will get a broken socket error
immediately. If broker 1 is up in step 4 and just the leader is moved
(e.g., due to preferred leader balancing), the producer will get an error
after the timeout specified in the producer request.
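
For reference, a minimal sketch of the 0.8 producer settings that control
when that error shows up (the broker list and values below are only
illustrative):

import java.util.Properties;

// Sketch only: 0.8 producer settings that control when the error above
// surfaces. Broker list and values are illustrative.
public class ProducerSettingsSketch {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "-1");
        // How long the leader waits trying to satisfy request.required.acks
        // before returning an error (the timeout mentioned above):
        props.put("request.timeout.ms", "10000");
        // On a broken socket the producer sees the failure right away and,
        // if retries are enabled, refreshes metadata and retries before
        // giving up with FailedToSendMessageException:
        props.put("message.send.max.retries", "3");
        props.put("retry.backoff.ms", "100");
        return props;
    }
}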

Thanks,

Jun


On Mon, Jun 30, 2014 at 7:40 AM, Yury Ruchin <yuri.ruc...@gmail.com> wrote:

> Hi Michal,
>
> Thanks for the perfect links. They really help. Now it looks like with
> request.required.acks=1 (let alone 0) messages can be lost in the case I
> described. Aphyr's article seems to describe a trickier case than mine.
>
> I'm still not sure about Kafka's behavior with request.required.acks=-1.
> With this setting in effect, the scenario turns into this:
>
> 1. Producer sends message A to the leader.
> 2. Leader stores the message, followers fetch it. Everyone's in sync.
> 3. Producer sends message B to the leader. The followers haven't fetched
> it yet and lag by one message: B exists only on broker 1. The produce
> request for message B sits in purgatory waiting for acknowledgements
> from the followers.
> 4. The leader goes down.
> 5. Followers cannot fetch B anymore, since its only owner is down. Yet one
> of the remaining replicas needs to take over leadership. Say broker 2 now
> becomes the leader and 3 is the follower.
> 6. Producer sends message C to the leader (broker 2). Follower fetches it.
> 7. Broker 1 goes back online and starts following broker 2.
>
> What happens to message B, which was sitting in broker 1's purgatory when
> the broker went down? Will an "unable to send message B" error be returned
> to the producer immediately after broker 1 goes down?
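>
> For illustration, a minimal sketch of the producer side with
> request.required.acks=-1 (assuming the 0.8 Java producer API; the topic
> name and broker list are made up):
>
> import java.util.Properties;
>
> import kafka.common.FailedToSendMessageException;
> import kafka.javaapi.producer.Producer;
> import kafka.producer.KeyedMessage;
> import kafka.producer.ProducerConfig;
>
> public class AcksAllSendSketch {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("metadata.broker.list", "broker1:9092,broker2:9092");
>         props.put("serializer.class", "kafka.serializer.StringEncoder");
>         // -1 = wait until all in-sync replicas have the message
>         props.put("request.required.acks", "-1");
>
>         Producer<String, String> producer =
>             new Producer<String, String>(new ProducerConfig(props));
>         try {
>             producer.send(new KeyedMessage<String, String>("my-topic", "B"));
>         } catch (FailedToSendMessageException e) {
>             // With acks=-1 the request is not acknowledged until the ISR
>             // has the message, so if the leader dies first the producer
>             // learns about it here (after its internal retries) and can
>             // re-send B, now routed to the new leader.
>             producer.send(new KeyedMessage<String, String>("my-topic", "B"));
>         } finally {
>             producer.close();
>         }
>     }
> }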
>
>
> 2014-06-30 17:41 GMT+04:00 Michal Michalski <michal.michal...@boxever.com>:
>
> > Hi Yury,
> >
> > If I understand correctly, the case you're describing is equivalent to a
> > leader re-election (in terms of data consistency). In that case messages
> > can be lost depending on your "acks" setting:
> >
> > https://kafka.apache.org/documentation.html
> > see: request.required.acks.
> > E.g. for acks=1, "only messages that were written to the now-dead leader
> > but not yet replicated will be lost".
> >
> > More info on that:
> > http://aphyr.com/posts/293-call-me-maybe-kafka
> >
> > However, I'd be happy if someone with more Kafka experience confirmed my
> > understanding of that issue.
> >
> >
> > Kind regards,
> > Michał Michalski,
> > michal.michal...@boxever.com
> >
> >
> > On 30 June 2014 14:34, Yury Ruchin <yuri.ruc...@gmail.com> wrote:
> >
> > > Thanks!
> > >
> > > I'm also trying to understand how replicas will catch up once the
> > > leader goes down. Say, we have 3 brokers with IDs 1, 2, 3. The leader
> > > is broker 1. Followers are 2 and 3. Consider the following scenario
> > > assuming that all messages fall into the same partition:
> > >
> > > 1. Producer sends message A to the leader.
> > > 2. Leader stores the message, followers fetch it. Everyone's in sync.
> > > 3. Producer sends message B to the leader. The followers haven't
> > > fetched it yet and lag by one message: B exists only on broker 1.
> > > 4. I bring the leader down.
> > > 5. Followers cannot fetch B anymore, since its only owner is down. Yet
> > > one of the remaining replicas needs to take over leadership. Say
> > > broker 2 now becomes the leader and 3 is the follower.
> > > 6. Producer sends message C to the leader (broker 2). The follower
> > > fetches it.
> > >
> > > I don't quite understand the state of the log on replicas 2 and 3 after
> > > step #6. It looks like the log will have a gap in it. The expected log
> > > state is ["A", "B", "C"], but brokers 2 and 3 didn't have a chance to
> > > fetch "B", so their log looks like ["A", "C"]. Will Kafka try to fill
> > > the gap in the background once broker 1 comes back up?
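> > >
> > > For what it's worth, a rough way to watch this from the outside is to
> > > ask the current leader for its latest offset (a sketch against the 0.8
> > > SimpleConsumer API; host, topic and partition are made up). If I
> > > understand offset assignment right, offsets within a partition have no
> > > holes, so whatever the new leader appends next simply takes the next
> > > offset:
> > >
> > > import java.util.HashMap;
> > > import java.util.Map;
> > >
> > > import kafka.api.PartitionOffsetRequestInfo;
> > > import kafka.common.TopicAndPartition;
> > > import kafka.javaapi.OffsetResponse;
> > > import kafka.javaapi.consumer.SimpleConsumer;
> > >
> > > public class LogEndOffsetSketch {
> > >     public static void main(String[] args) {
> > >         // Ask the current leader (broker 2 after the failover) for the
> > >         // latest offset of the partition.
> > >         SimpleConsumer consumer = new SimpleConsumer(
> > >             "broker2", 9092, 100000, 64 * 1024, "log-end-check");
> > >         TopicAndPartition tp = new TopicAndPartition("my-topic", 0);
> > >         Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
> > >             new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
> > >         requestInfo.put(tp, new PartitionOffsetRequestInfo(
> > >             kafka.api.OffsetRequest.LatestTime(), 1));
> > >         OffsetResponse response = consumer.getOffsetsBefore(
> > >             new kafka.javaapi.OffsetRequest(requestInfo,
> > >                 kafka.api.OffsetRequest.CurrentVersion(), "log-end-check"));
> > >         long logEndOffset = response.offsets("my-topic", 0)[0];
> > >         System.out.println("leader log end offset: " + logEndOffset);
> > >         consumer.close();
> > >     }
> > > }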
> > >
> > >
> > > 2014-06-18 19:59 GMT+04:00 Neha Narkhede <neha.narkh...@gmail.com>:
> > >
> > > > You don't gain much by running #4 between broker bounces. Running it
> > > > after the cluster is upgraded will be sufficient.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Wed, Jun 18, 2014 at 8:33 AM, Yury Ruchin <yuri.ruc...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > In my project, we want to update our active Kafka 0.8 cluster to
> > > > > Kafka 0.8.1.1 without downtime or losing any data. The process
> > > > > (after reading http://kafka.apache.org/documentation.html#upgrade)
> > > > > looks to me like this. For each broker in turn:
> > > > >
> > > > > 1. Bring the broker down.
> > > > > 2. Update Kafka to 0.8.1.1 on the broker node.
> > > > > 3. Start the broker.
> > > > > 4. Run the preferred-replica-election script to restore the
> > > > > broker's leadership for its partitions.
> > > > > 5. Wait for the preferred replica election to complete.
> > > > >
> > > > > I deem step #5 necessary since preferred replica election is an
> > > > > asynchronous process. There is a slim chance that bringing other
> > > > > brokers down before the election is complete would result in all
> > > > > replicas being down for some partitions, so a portion of the
> > > > > incoming data stream would be lost. Is my understanding of the
> > > > > process correct?
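> > > > >
> > > > > For step 5, I was thinking of checking that the election has taken
> > > > > effect roughly like this, by comparing each partition's current
> > > > > leader against its first assigned replica, i.e. the preferred one
> > > > > (a sketch against the 0.8 SimpleConsumer metadata API; host, topic
> > > > > and client id are made up):
> > > > >
> > > > > import java.util.Collections;
> > > > > import java.util.List;
> > > > >
> > > > > import kafka.javaapi.PartitionMetadata;
> > > > > import kafka.javaapi.TopicMetadata;
> > > > > import kafka.javaapi.TopicMetadataRequest;
> > > > > import kafka.javaapi.consumer.SimpleConsumer;
> > > > >
> > > > > public class PreferredLeaderCheckSketch {
> > > > >     public static void main(String[] args) {
> > > > >         SimpleConsumer consumer = new SimpleConsumer(
> > > > >             "broker1", 9092, 100000, 64 * 1024, "upgrade-check");
> > > > >         TopicMetadataRequest request =
> > > > >             new TopicMetadataRequest(Collections.singletonList("my-topic"));
> > > > >         List<TopicMetadata> topics = consumer.send(request).topicsMetadata();
> > > > >         for (TopicMetadata topic : topics) {
> > > > >             for (PartitionMetadata p : topic.partitionsMetadata()) {
> > > > >                 // The preferred replica is the first in the assignment.
> > > > >                 boolean preferredIsLeader = p.leader() != null
> > > > >                     && p.leader().id() == p.replicas().get(0).id();
> > > > >                 System.out.println("partition " + p.partitionId()
> > > > >                     + " preferred leader elected: " + preferredIsLeader);
> > > > >             }
> > > > >         }
> > > > >         consumer.close();
> > > > >     }
> > > > > }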
> > > > >
> > > >
> > >
> >
>
