Re: Kafka availability guarantee

todd Sun, 11 Oct 2015 14:10:53 -0700

You can enable unclean leader election, which would allow the lagging partition 
to still become leader. There would be some data loss (offsets between the 
leggy partition and the old leader) but the partition would stay online and 
available.

Sent from my BlackBerry 10 smartphone on the TELUS network.
  Original Message  
From: Elias Levy
Sent: Sunday, October 11, 2015 5:00 PM
To: users@kafka.apache.org
Reply To: users@kafka.apache.org
Subject: Kafka availability guarantee

Reading through the Kafka documentation for statements regarding Kafka's
availability guarantees one comes across this statement:

*With this ISR model and f+1 replicas, a Kafka topic can tolerate f
failures without losing committed messages.*

In my opinion, this appears incorrect or at best misleading. Consider a
partition with a replication factor of 3. If one of the replicas lags, but
does not fail, the ISR will be shrank to a set of 2 replicas, the leader
and and one follower. The leader will consider the message committed when
itself and the in sync follower write the message to their respective
logs. Where a concurrent failure of 2 nodes occur, specifically the
failure of the leader and the in sync follower, there won't be any
remaining in sync replicas to take over as leader without potential message
loss. Therefore Kafka cannot tolerate any failure of *f* nodes, where *f*
is N - 1 and N is the replication factor. Kafka can only tolerate a failure
of *f* if we take N to be the ISR set size, which is a dynamic value and
not a topic configuration parameter that can me set a priori. Kafka can
tolerate some failures of *f* replicas when N is the replication factor, so
long as at least one in sync replica survives, but it can't tolerate all
such failures.

Am I wrong?

Re: Kafka availability guarantee

Reply via email to