Reading through the Kafka documentation for statements regarding Kafka's availability guarantees one comes across this statement:
*With this ISR model and f+1 replicas, a Kafka topic can tolerate f failures without losing committed messages.* In my opinion, this appears incorrect or at best misleading. Consider a partition with a replication factor of 3. If one of the replicas lags, but does not fail, the ISR will be shrank to a set of 2 replicas, the leader and and one follower. The leader will consider the message committed when itself and the in sync follower write the message to their respective logs. Where a concurrent failure of 2 nodes occur, specifically the failure of the leader and the in sync follower, there won't be any remaining in sync replicas to take over as leader without potential message loss. Therefore Kafka cannot tolerate any failure of *f* nodes, where *f* is N - 1 and N is the replication factor. Kafka can only tolerate a failure of *f* if we take N to be the ISR set size, which is a dynamic value and not a topic configuration parameter that can me set a priori. Kafka can tolerate some failures of *f* replicas when N is the replication factor, so long as at least one in sync replica survives, but it can't tolerate all such failures. Am I wrong?