KAFKA-1028 addressed a different unclean leader election problem: it prevents 
a broker that is not in ISR from becoming leader. The problem we are facing is 
that a broker that is in ISR but does not have all the messages may become 
leader. That is also a kind of unclean leader election, but not the one that 
KAFKA-1028 addressed.

Here I'm trying to show that current Kafka doesn't meet the requirement (no 
message loss, no blocking when 1 broker is down) because of two behaviors:
1. when choosing a new leader from 2 followers in ISR, the one with fewer 
messages may be chosen as the leader
2. even when replica.lag.max.messages=0, a follower can stay in ISR while it 
has fewer messages than the leader.

We consider a cluster with 3 brokers and a topic with 3 replicas. We analyze 
different cases according to the value of request.required.acks (acks for 
short). For each case and its sub-cases, we find a situation in which either 
message loss or service blocking happens. We assume that at the beginning all 
3 replicas (leader A, followers B and C) are in sync, i.e., they have the same 
messages and are all in ISR.

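1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement: with 
acks=0 or 1 a message can be acknowledged before any follower has it, and with 
acks=3 the producer blocks as soon as one broker is down.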
2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
time, although C hasn't received m, it's still in ISR. If A is killed, C can be 
elected as the new leader, and consumers will miss m.
3. acks=-1. Suppose replica.lag.max.messages=M. There are two sub-cases:
3.1 M>0. Suppose C is killed. C will be dropped from ISR after 
replica.lag.time.max.ms. Then the producer publishes M messages to A and B. C 
restarts. C will rejoin ISR since it is only M messages behind A and B. If A 
is killed before C has replicated all messages and C becomes leader, message 
loss happens.
3.2 M=0. In this case, when the producer publishes at a high speed, B and C 
will fall out of ISR and only A keeps receiving messages. Then A is killed. 
Either message loss or service blocking will happen, depending on whether 
unclean leader election is enabled.
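
Cases 2 and 3.1 can be sketched with a small toy model. This is not Kafka 
code: the Replica class and the helper functions are hypothetical, and the 
rule "a restarted follower rejoins ISR when it is at most M messages behind" 
is simply the behavior described above, taken as an assumption. The sketch 
only shows which ISR members would lose messages if elected after A is killed.

```python
# Toy model of cases 2 and 3.1. All names here are hypothetical; this does not
# use or reproduce any Kafka API.

class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []


def publish(msg, replicas):
    """Append msg to every replica in the given list."""
    for r in replicas:
        r.log.append(msg)


def missing_if_elected(leader_log, candidate):
    """Messages the old leader had that the candidate does not have."""
    return [m for m in leader_log if m not in candidate.log]


def case_acks_2():
    a, b, c = Replica("A"), Replica("B"), Replica("C")
    isr = [a, b, c]
    # m is acknowledged by A and B only; C has not fetched it but is still in ISR.
    publish("m", [a, b])
    # A is killed: every remaining ISR member is an eligible leader.
    return {r.name: missing_if_elected(a.log, r) for r in isr if r is not a}


def case_acks_all(M):
    a, b, c = Replica("A"), Replica("B"), Replica("C")
    isr = [a, b]  # C was killed and dropped from ISR
    for i in range(M):
        publish(f"m{i}", isr)  # acks=-1: all current ISR members get the message
    # C restarts. Assumed rejoin rule: lag <= M puts C back in ISR.
    lag = len(a.log) - len(c.log)
    if lag <= M:
        isr.append(c)
    # A is killed before C catches up.
    return {r.name: missing_if_elected(a.log, r) for r in isr if r is not a}


print(case_acks_2())
print(case_acks_all(3))
```

In both runs B is missing nothing while C is missing messages, so whether data 
is lost depends only on which of the two ISR members is picked as leader.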


From: users@kafka.apache.org At: Jul 21 2014 22:28:18
To: JIANG WU (PRICEHISTORY) (BLOOMBERG/ 731 LEX -), users@kafka.apache.org
Subject: Re: how to ensure strong consistency with reasonable availability

You will probably need 0.8.2, which includes
https://issues.apache.org/jira/browse/KAFKA-1028


On Mon, Jul 21, 2014 at 6:37 PM, Jiang Wu (Pricehistory) (BLOOMBERG/ 731
LEX -) <jwu...@bloomberg.net> wrote:

> Hi everyone,
>
> With a cluster of 3 brokers and a topic of 3 replicas, we want to achieve
> the following two properties:
> 1. when only one broker is down, there's no message loss, and
> producers/consumers are not blocked.
> 2. in other more serious problems, for example, one broker is restarted
> twice in a short period or two brokers are down at the same time,
> producers/consumers can be blocked, but no message loss is allowed.
>
> We haven't found any producer/broker parameter combinations that achieve
> this. If you know or think some configurations will work, please post
> details. We have a test bed to verify any given configurations.
>
> In addition, I'm wondering if it's necessary to open a jira to require the
> above feature?
>
> Thanks,
> Jiang

