Hi, 
I'm doing stress and failover tests on a 3 node 0.8.1.1 kafka cluster and have 
the following observations.

A topic is created with 1 partition and 3 replications. request.required.acks 
is set to -1 for a sync producer. When the publishing speed is high (3M 
messages, each 2000 bytes, published in lists of size 2000), the two followers 
will fail out of sync. Only the leader remains in ISR. But the producer can 
keep sending. If the leader is killed with CTR_C, one follower will become 
leader, but message loss will happen because of the unclean leader election.

In the same test, request.required.acks=3 gives the desired result. Followers 
will fail out of sync, but the producer will be blocked untill all followers 
back to ISR. No data loss is observed in this case.

From the code, this turns out to be how it's designed:
if ((requiredAcks < 0 && numAcks >= inSyncReplicas.size) ||
(requiredAcks > 0 && numAcks >= requiredAcks)) {
/*
* requiredAcks < 0 means acknowledge after all replicas in ISR
* are fully caught up to the (local) leader's offset
* corresponding to this produce request.
*/
(true, ErrorMapping.NoError)
}

I'm wondering if it's more reasonable to let request.required.acks=-1 mean 
"receive acks from all replicas" instead of "receive acks from replicas in 
ISR"? As in the above test, follower will fail out sync under high publishing 
volume; that makes request.required.acks=-1 equivalent to 
request.required.acks=1. Since the kafka document states 
request.required.acks=-1 provides the best durability, one would expect it is 
equivalent to request.required.acks=number_of_replications.

Regards,
Jiang

Reply via email to