[ https://issues.apache.org/jira/browse/KAFKA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson resolved KAFKA-14417. ------------------------------------- Fix Version/s: 4.0.0 3.3.2 Resolution: Fixed > Producer doesn't handle REQUEST_TIMED_OUT for InitProducerIdRequest, treats > as fatal error > ------------------------------------------------------------------------------------------ > > Key: KAFKA-14417 > URL: https://issues.apache.org/jira/browse/KAFKA-14417 > Project: Kafka > Issue Type: Task > Affects Versions: 3.1.0, 3.0.0, 3.2.0, 3.3.0, 3.4.0 > Reporter: Justine Olshan > Assignee: Justine Olshan > Priority: Blocker > Fix For: 4.0.0, 3.3.2 > > > In TransactionManager we have a handler for InitProducerIdRequests > [https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#LL1276C14-L1276C14] > However, we have the potential to return a REQUEST_TIMED_OUT error in > RPCProducerIdManager when the BrokerToControllerChannel manager times out: > [https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L236] > > or when the poll returns null: > [https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L170] > Since REQUEST_TIMED_OUT is not handled by the producer, we treat it as a > fatal error and the producer fails. With the default of idempotent producers, > this can cause more issues. > See this stack trace from 3.0: > {code:java} > ERROR [Producer clientId=console-producer] Aborting producer batches due to > fatal error (org.apache.kafka.clients.producer.internals.Sender) > org.apache.kafka.common.KafkaException: Unexpected error in > InitProducerIdResponse; The request timed out. > at > org.apache.kafka.clients.producer.internals.TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager.java:1390) > at > org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1294) > at > org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) > at > org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:658) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:650) > at > org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:418) > at > org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:316) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:256) > at java.lang.Thread.run(Thread.java:748) > {code} > Seems like the commit that introduced the changes was this one: > [https://github.com/apache/kafka/commit/72d108274c98dca44514007254552481c731c958] > so we are vulnerable when the server code is ibp 3.0 and beyond. > -- This message was sent by Atlassian Jira (v8.20.10#820010)