Thanks!

I'm also trying to understand how replicas will catch up once the leader
goes down. Say we have 3 brokers with IDs 1, 2 and 3. The leader is
broker 1, and the followers are 2 and 3. Consider the following scenario,
assuming all messages fall into the same partition:

1. Producer sends message A to the leader.
2. Leader stores the message, followers fetch it. Everyone's in sync.
3. Producer sends message B to the leader. The followers haven't fetched
it yet and lag behind by one message: B exists only on broker 1.
4. I bring the leader down.
5. The followers cannot fetch B anymore, since the only broker holding it
is down. Yet one of the replicas needs to take over leadership. Say broker
2 now becomes the leader and 3 remains a follower.
6. Producer sends message C to the leader (broker 2). Follower fetches it.

I don't quite understand the state of the log on replicas 2 and 3 after
step#6. It looks like the log will have a gap in it. The expected log state
is ["A", "B", "C"], but brokers 2 and 3 never had a chance to fetch "B", so
their logs look like ["A", "C"]. Will Kafka try to fill the gap in the
background once broker 1 comes back up?
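
For context, here is a minimal sketch of the producer side of this
scenario, using the 0.8 Java producer API (the broker list and topic name
are placeholders I made up). With request.required.acks=1 the leader
acknowledges a message as soon as it has written it locally, before the
followers fetch it, which is exactly how B can end up only on broker 1 in
step 3:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AcksExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list",
                  "broker1:9092,broker2:9092,broker3:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // acks=1: the leader acknowledges after its local write, without
        // waiting for the followers, so an acked message may exist only
        // on the leader (the situation with B in step 3).
        props.put("request.required.acks", "1");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        producer.send(
            new KeyedMessage<String, String>("my-topic", "some-key", "B"));
        // request.required.acks=-1 would make the leader wait for the
        // in-sync replicas before acknowledging, so an acknowledged
        // message survives a leader failure.
        producer.close();
    }
}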


2014-06-18 19:59 GMT+04:00 Neha Narkhede <neha.narkh...@gmail.com>:

> You don't gain much by running #4 between broker bounces. Running it after
> the cluster is upgraded will be sufficient.
>
> Thanks,
> Neha
>
>
> On Wed, Jun 18, 2014 at 8:33 AM, Yury Ruchin <yuri.ruc...@gmail.com>
> wrote:
>
> > Hi folks,
> >
> > In my project, we want to update our active Kafka 0.8 cluster to Kafka
> > 0.8.1.1 without downtime or losing any data. The process (after
> > reading http://kafka.apache.org/documentation.html#upgrade) looks to me
> > like this. For each broker in turn:
> >
> > 1. Bring the broker down.
> > 2. Update Kafka to 0.8.1.1 on the broker node.
> > 3. Start the broker.
> > 4. Run the preferred-replica-election script to restore the broker's
> > leadership for the respective partitions.
> > 5. Wait for the preferred replica election to complete.
> >
> > I deem step#5 necessary since preferred replica election is an
> > asynchronous process. There is a slim chance that bringing other brokers
> > down before the election is complete would result in all replicas down
> > for some partitions, so a portion of the incoming data stream would be
> > lost. Is my understanding of the process correct?
> >
>
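
As an aside on step 5 of the procedure quoted above: here is a rough
sketch of how one could poll topic metadata to check that every partition
is led again by its preferred (first-assigned) replica before bouncing the
next broker. It uses the 0.8 Java SimpleConsumer metadata API and assumes
the metadata response lists replicas in assignment order; the host, port
and topic name are placeholders.

import java.util.Collections;
import java.util.List;

import kafka.cluster.Broker;
import kafka.javaapi.PartitionMetadata;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class PreferredLeaderCheck {
    // Returns true when every partition of the topic is led by the first
    // replica in its assignment list, i.e. the preferred replica election
    // has taken effect for that topic.
    public static boolean preferredLeadersRestored(String host, int port,
                                                   String topic) {
        SimpleConsumer consumer = new SimpleConsumer(host, port, 100000,
                                                     64 * 1024,
                                                     "preferred-leader-check");
        try {
            TopicMetadataResponse response = consumer.send(
                new TopicMetadataRequest(Collections.singletonList(topic)));
            for (TopicMetadata tm : response.topicsMetadata()) {
                for (PartitionMetadata pm : tm.partitionsMetadata()) {
                    List<Broker> replicas = pm.replicas();
                    Broker leader = pm.leader();
                    // A partition with no leader, or whose leader is not
                    // the first assigned replica, is not yet restored.
                    if (leader == null || replicas.isEmpty()
                            || leader.id() != replicas.get(0).id()) {
                        return false;
                    }
                }
            }
            return true;
        } finally {
            consumer.close();
        }
    }
}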
