[ https://issues.apache.org/jira/browse/KAFKA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justine Olshan updated KAFKA-14417:
-----------------------------------
Description:
In TransactionManager we have a handler for InitProducerIdRequest responses:
[https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#LL1276C14-L1276C14]
However, RPCProducerIdManager can return a REQUEST_TIMED_OUT error when the
BrokerToControllerChannelManager times out:
[https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L236]
or when polling for the next producer ID block returns null:
[https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L170]
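A rough sketch of the second case, in plain Java rather than the broker's Scala. The class, enum, and method names here are hypothetical stand-ins and not the actual ProducerIdManager code; the only point is that a timed poll on an empty queue returns null, and that null is what gets mapped to REQUEST_TIMED_OUT:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class ProducerIdBlockWait {
    /** Hypothetical stand-in for the broker's error codes. */
    enum Error { NONE, REQUEST_TIMED_OUT }

    /**
     * Waits up to maxWaitMs for the next producer ID block.
     * BlockingQueue.poll returns null when the wait elapses, and that
     * null is reported as REQUEST_TIMED_OUT.
     */
    static Error awaitNextBlock(BlockingQueue<Long> blocks, long maxWaitMs) {
        try {
            Long block = blocks.poll(maxWaitMs, TimeUnit.MILLISECONDS);
            return block == null ? Error.REQUEST_TIMED_OUT : Error.NONE;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and surface the failure unchecked.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for block", e);
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Long> blocks = new ArrayBlockingQueue<>(1);
        // No block has arrived from the controller yet: the poll times out.
        System.out.println(awaitNextBlock(blocks, 10));
        // Once a block is queued, the same call succeeds.
        blocks.add(0L);
        System.out.println(awaitNextBlock(blocks, 10));
    }
}
```

Running main prints REQUEST_TIMED_OUT followed by NONE.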
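Since REQUEST_TIMED_OUT is not handled by the producer, we treat it as a fatal
error and the producer fails. Because idempotence is now enabled by default,
every default-configured producer sends an InitProducerIdRequest, so this is
no longer limited to transactional producers.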
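To illustrate the client side, here is a simplified sketch of how a response handler might classify errors. It does not use the real Kafka classes: the enums, the classify method, and the handleTimeout flag are all hypothetical, and treating REQUEST_TIMED_OUT as retriable is only one possible direction, not necessarily the fix that will be chosen:

```java
public class InitProducerIdErrorSketch {
    /** Hypothetical subset of error codes an InitProducerIdResponse might carry. */
    enum Error { NONE, COORDINATOR_LOAD_IN_PROGRESS, REQUEST_TIMED_OUT, INVALID_PRODUCER_EPOCH }

    /** What the handler does with the response. */
    enum Action { COMPLETE, RETRY, FATAL }

    /**
     * Classifies a response error. With handleTimeout=false this mirrors the
     * behavior described above: REQUEST_TIMED_OUT falls through to the
     * catch-all branch and is fatal. With handleTimeout=true it is retried.
     */
    static Action classify(Error error, boolean handleTimeout) {
        switch (error) {
            case NONE:
                return Action.COMPLETE;
            case COORDINATOR_LOAD_IN_PROGRESS:
                return Action.RETRY;
            case REQUEST_TIMED_OUT:
                return handleTimeout ? Action.RETRY : Action.FATAL;
            default:
                // Anything unrecognized is treated as an unexpected, fatal error.
                return Action.FATAL;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(Error.REQUEST_TIMED_OUT, false));
        System.out.println(classify(Error.REQUEST_TIMED_OUT, true));
    }
}
```

Running main prints FATAL followed by RETRY.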
See this stack trace from 3.0:
ERROR [Producer clientId=console-producer] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
org.apache.kafka.common.KafkaException: Unexpected error in InitProducerIdResponse; The request timed out.
    at org.apache.kafka.clients.producer.internals.TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager.java:1390)
    at org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1294)
    at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
    at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:658)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:650)
    at org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:418)
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:316)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:256)
    at java.lang.Thread.run(Thread.java:748)
The behavior appears to have been introduced by this commit:
[https://github.com/apache/kafka/commit/72d108274c98dca44514007254552481c731c958]
so we are vulnerable whenever the server is running with IBP 3.0 or later.
> Producer doesn't handle REQUEST_TIMED_OUT for InitProducerIdRequest, treats
> as fatal error
> ------------------------------------------------------------------------------------------
>
> Key: KAFKA-14417
> URL: https://issues.apache.org/jira/browse/KAFKA-14417
> Project: Kafka
> Issue Type: Task
> Affects Versions: 3.1.0, 3.0.0, 3.2.0, 3.3.0
> Reporter: Justine Olshan
> Assignee: Justine Olshan
> Priority: Blocker
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)