Calvin Liu created KAFKA-17877:
----------------------------------

             Summary: IllegalStateException: missing producer id from the WriteTxnMarkersResponse
                 Key: KAFKA-17877
                 URL: https://issues.apache.org/jira/browse/KAFKA-17877
             Project: Kafka
          Issue Type: Bug
            Reporter: Calvin Liu
            Assignee: Calvin Liu

{code:java}
java.lang.IllegalStateException: WriteTxnMarkerResponse for lkc-devcv9jg9n_transaction-bench-transaction-id-72UwIuNVQkOxl4y_OEBAlA does not contain expected error map for producer id 8308
{code}
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerRequestCompletionHandler.scala#L100]

------

This is a bug on the data partition side. The leader may return the response early, before the results for all producer IDs have been included in it.

Consider the following case:
# We have 2 markers to append, one for producer-0 and one for producer-1.
# When we first process producer-0, it appends a marker to {{__consumer_offsets}}.
# The {{__consumer_offsets}} append finishes very quickly because the group coordinator is no longer the leader, so the coordinator directly returns NOT_LEADER_OR_FOLLOWER. In its callback, it calls {{maybeComplete()}} for the first time, and because there is only one partition to append, it is able to go on to call {{maybeSendResponseCallback()}} and decrement {{numAppends}}.
# Then it calls the replica manager append with nothing left to append. In that callback, it calls {{maybeComplete()}} for the second time, and this call also decrements {{numAppends}}.

Remember that, because we only have 2 markers, the initial value of {{numAppends}} is also 2. So in step 4, {{numAppends}} reaches 0 and the request finishes without producer-1 ever being processed. As a result, producer-1 is missing from the WriteTxnMarkers response.
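
The double decrement described above can be replayed with a small toy model. This is only a sketch under stated assumptions, not Kafka's actual handler code: the class {{NumAppendsSketch}}, the {{simulate}} helper, and the list-based "response" are hypothetical; only the names {{numAppends}} and {{maybeComplete()}} come from the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model of the counter logic described in the report (not Kafka code).
class NumAppendsSketch {

    /**
     * Replays steps 2-4 for producer-0 only and returns the producer IDs that are
     * in the response when the counter reaches zero. An empty list means the
     * response was never sent in this model.
     */
    static List<Long> simulate(int markers) {
        // Assumption from the report: the counter starts at the number of markers.
        AtomicInteger numAppends = new AtomicInteger(markers);
        List<Long> errorsByProducer = new ArrayList<>(); // producers recorded so far
        List<Long> response = new ArrayList<>();         // snapshot taken when "sent"

        // Only producer-0 is processed; producer-1 has not been touched yet.
        errorsByProducer.add(0L);

        // Callback 1: the offsets append fails fast with NOT_LEADER_OR_FOLLOWER.
        maybeComplete(numAppends, errorsByProducer, response);
        // Callback 2: the replica manager append (nothing to append) also completes.
        maybeComplete(numAppends, errorsByProducer, response);

        return response;
    }

    // Decrements the shared counter and "sends" the response once it hits zero.
    static void maybeComplete(AtomicInteger numAppends, List<Long> errors, List<Long> response) {
        if (numAppends.decrementAndGet() == 0) {
            response.addAll(errors);
        }
    }

    public static void main(String[] args) {
        // With 2 markers, one marker's two callbacks exhaust the counter.
        System.out.println(simulate(2)); // prints [0]
    }
}
```

In this model the response is produced with an entry for producer-0 only, which matches the symptom in the exception: the transaction coordinator later looks up an error map for a producer ID that the response never contained.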