[ https://issues.apache.org/jira/browse/KAFKA-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Jacot resolved KAFKA-17877.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

> IllegalStateException: missing producer id from the WriteTxnMarkersResponse
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-17877
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17877
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Calvin Liu
>            Assignee: Calvin Liu
>            Priority: Major
>             Fix For: 4.0.0
>
>
> {code:java}
> java.lang.IllegalStateException: WriteTxnMarkerResponse for
> lkc-devcv9jg9n_transaction-bench-transaction-id-72UwIuNVQkOxl4y_OEBAlA does
> not contain expected error map for producer id 8308
> {code}
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerRequestCompletionHandler.scala#L100]
> ------
> This is a bug on the data partition side: the leader may return the response
> early, without all of the producer IDs included in the response.
> Consider the following two cases.
>
> Case 1:
> # We have 2 markers to append, one for producer-0 and one for producer-1.
> # When we first process producer-0, it appends a marker to __consumer_offsets.
> # The __consumer_offsets append finishes very quickly because the group
> coordinator is no longer the leader, so the coordinator directly returns
> NOT_LEADER_OR_FOLLOWER. In its callback, it calls {{maybeComplete()}} for the
> first time, and because there is only one partition to append to, it goes on
> to call {{maybeSendResponseCallback()}} and decrement {{numAppends}}.
> # Then it calls the replica manager append with nothing left to append; in
> that callback it calls {{maybeComplete()}} a second time, which decrements
> {{numAppends}} again.
>
> Case 2:
> # We have 2 markers to append, one for producer-0 and one for producer-1.
> # When we first process producer-0, it appends a marker to __consumer_offsets
> and to a data topic foo.
> # The two appends are handled by the group coordinator and the replica
> manager asynchronously.
> # There is a race: if both appends finish together, they can fill
> {{markerResults}} at the same time and then each call {{maybeComplete()}}.
> Because the {{partitionsWithCompatibleMessageFormat.size ==
> markerResults.size}} condition is satisfied, both {{maybeComplete}} calls can
> go through, decrement {{numAppends}}, and cause a premature response.
>
> Remember, because we only have 2 markers, the initial value of {{numAppends}}
> is also 2. So in step 4 the request can finish without producer-1 ever being
> processed, which leaves producer-1 missing from the WriteTxnMarkers response.
> ----
> As a result, the transaction coordinator does not update the transaction
> state correctly, even though the markers may have been written to the data
> partitions. There is also an impact on clients: the client believes the
> transaction is completed, but when it sends any request for a new transaction
> with the same transactional ID, the request fails with
> CONCURRENT_TRANSACTIONS.
> Note: this can only happen with the KIP-848 coordinator enabled.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
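Below is a minimal, self-contained Java sketch of the double-completion pattern in the second case above. It is illustrative only: the class, partition names, and result values are hypothetical, and only the identifiers {{numAppends}}, {{markerResults}}, {{partitionsWithCompatibleMessageFormat}}, and {{maybeComplete()}} are borrowed from the description, not from the actual broker code. It shows how a completion check that only compares result-map sizes lets two callbacks for the same producer each decrement the shared pending-append counter, so the response is considered complete before producer-1 is processed.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical model of the completion path described in the issue.
// The names echo the issue text; the flow is illustrative, not the real broker code.
public class WriteTxnMarkersDoubleComplete {

    // Two producer markers in the request, so two pending appends overall.
    static final AtomicInteger numAppends = new AtomicInteger(2);

    // Per-producer bookkeeping for producer-0: two partitions to append to.
    static final int partitionsWithCompatibleMessageFormat = 2;
    static final Map<String, String> markerResults = new ConcurrentHashMap<>();

    // Completion check as described in the report: it only compares sizes, so
    // once markerResults is full, every callback that runs afterwards passes
    // the check and decrements numAppends again.
    static void maybeComplete(String caller) {
        if (markerResults.size() == partitionsWithCompatibleMessageFormat) {
            int remaining = numAppends.decrementAndGet();
            System.out.println(caller + ": numAppends -> " + remaining);
            if (remaining == 0) {
                // Premature: producer-1 has not been processed at all.
                System.out.println("response sent, producer-1 results missing");
            }
        }
    }

    public static void main(String[] args) {
        // Interleaving from the second case: both appends for producer-0 finish
        // and record their results before either completion callback runs.
        markerResults.put("__consumer_offsets-0", "NONE"); // group coordinator append
        markerResults.put("foo-0", "NONE");                // replica manager append

        maybeComplete("group coordinator callback"); // numAppends 2 -> 1
        maybeComplete("replica manager callback");   // numAppends 1 -> 0, premature response
    }
}
{code}

In a model like this, the natural guard is to ensure {{numAppends}} is decremented at most once per producer marker, for example with a per-producer completion flag; the actual fix in Kafka may differ.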