[jira] [Created] (KAFKA-13574) NotLeaderOrFollowerException thrown for a successful send

Kyle Kingsbury (Jira) Fri, 31 Dec 2021 06:58:06 -0800

Kyle Kingsbury created KAFKA-13574:
--------------------------------------

             Summary: NotLeaderOrFollowerException thrown for a successful send
                 Key: KAFKA-13574
                 URL: https://issues.apache.org/jira/browse/KAFKA-13574
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 3.0.0
         Environment: openjdk version "11.0.13" 2021-10-19
            Reporter: Kyle Kingsbury



With org.apache.kafka/kafka-clients 3.0.0, under rare circumstances involving 
multiple node and network failures, I've observed a call to `producer.send()` 
throw `NotLeaderOrFollowerException` for a message which later appears in 
`consumer.poll()` return values.

I don't have a reliable repro case for this yet, but the case I hit involved 
retries=1000, acks=all, and idempotence enabled. I suspect what might be 
happening here is that an initial attempt to send the message makes it to the 
server and is committed, but the acknowledgement is lost (e.g. due to a timeout); 
the Kafka producer then automatically retries the send attempt, and on that 
retry hits a NotLeaderOrFollowerException, which is thrown back to the caller. 
If we interpret NotLeaderOrFollowerException as a definite failure, then this 
would constitute an aborted read.
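
To make the suspected sequence concrete, here is a toy, in-memory model of it. It is a sketch under stated assumptions, not Kafka code. `LostAckScenario`, `AttemptResult`, `attempt`, and `sendReportingLastError` are hypothetical names. The outcome of each attempt is hard-coded to match the guess above, and broker-side idempotent deduplication is ignored. The retry loop reports only the last error, so the caller sees `NOT_LEADER` even though the message is already in the log.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the suspected lost-ack sequence. Hypothetical names; not Kafka code.
public class LostAckScenario {

    // What a single send attempt observes from the client's point of view.
    enum AttemptResult { ACKED, TIMED_OUT, NOT_LEADER }

    // Hard-coded outcomes (assumption): attempt 1 is committed but its ack is
    // lost; every later attempt hits a leader change and is rejected.
    static AttemptResult attempt(List<String> log, int n, String msg) {
        if (n == 1) {
            log.add(msg);                    // the write reaches the log...
            return AttemptResult.TIMED_OUT;  // ...but the client never sees the ack
        }
        return AttemptResult.NOT_LEADER;
    }

    // Retry loop that surfaces only the most recent result to the caller.
    static AttemptResult sendReportingLastError(List<String> log, String msg, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        AttemptResult last = null;
        for (int n = 1; n <= maxAttempts; n++) {
            last = attempt(log, n, msg);
            if (last == AttemptResult.ACKED) {
                break;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        AttemptResult reported = sendReportingLastError(log, "m1", 2);
        System.out.println("send() reported: " + reported);
        System.out.println("log contains m1: " + log.contains("m1"));
    }
}
```

Running `main` prints that `send()` reported `NOT_LEADER` while the log contains `m1`, which is the aborted read described above.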

I've seen similar issues around client-side and server-internal retry 
mechanisms in a number of databases, and I think the right fix is: rather than 
throwing the most *recent* error, throw the *most indefinite* one. That way 
clients know that their request may have actually succeeded, and they won't 
(e.g.) re-submit a non-idempotent request.
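
A minimal sketch of that policy, assuming each attempt can be classified as succeeded, definitely failed, or indefinite (e.g. a timeout where the write may have landed). `MostIndefiniteError`, `Outcome`, and `combine` are hypothetical names used for illustration, not an existing Kafka API, and the three-level classification is itself an assumption. The wrapper reports the most indefinite outcome across all attempts instead of the last one.

```java
import java.util.List;

// Sketch of "report the most indefinite error". Hypothetical names; not Kafka code.
public class MostIndefiniteError {

    // Declared from least to most indefinite, so enum order is the ranking.
    enum Outcome { OK, DEFINITE_FAILURE, INDEFINITE_FAILURE }

    // Any success wins. Otherwise report the most indefinite failure seen;
    // with no attempts at all, nothing can have been written, so it is definite.
    static Outcome combine(List<Outcome> attempts) {
        if (attempts.contains(Outcome.OK)) {
            return Outcome.OK;
        }
        Outcome result = Outcome.DEFINITE_FAILURE;
        for (Outcome o : attempts) {
            if (o.compareTo(result) > 0) {
                result = o;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Attempt 1 timed out (indefinite); the retry was rejected outright,
        // e.g. if NotLeaderOrFollowerException is read as a definite failure.
        List<Outcome> attempts = List.of(Outcome.INDEFINITE_FAILURE, Outcome.DEFINITE_FAILURE);
        System.out.println("most recent:     " + attempts.get(attempts.size() - 1));
        System.out.println("most indefinite: " + combine(attempts));
    }
}
```

With a timeout followed by a rejection, the most recent outcome is `DEFINITE_FAILURE`, but `combine` returns `INDEFINITE_FAILURE`, which tells the caller the write may have succeeded. Keeping the classification in an ordered enum reduces the merge rule to a single comparison.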

As a side note: is there... perhaps documentation on which errors in Kafka are 
supposed to be definite vs indefinite? NotLeaderOrFollowerException is a 
subclass of RetriableException, but it looks like RetriableException is more 
about transient vs permanent errors than whether it's safe to retry.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
