Artem Livshits created KAFKA-19367:
--------------------------------------

             Summary: InitProducerId with TV2 double-increments epoch if ongoing transaction is aborted
                 Key: KAFKA-19367
                 URL: https://issues.apache.org/jira/browse/KAFKA-19367
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 4.0.0
            Reporter: Artem Livshits


When InitProducerId is handled on the transaction coordinator, the producer 
epoch is incremented (so that we fence stale requests), and then, if a 
transaction was ongoing at that time, it's aborted. With transaction version 2 
(a.k.a. KIP-890 part 2), the abort increments the producer epoch again (this is 
part of the new abort / commit protocol), so the epoch ends up incremented twice.
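The sequence above can be modeled in a few lines of Java. This is a simplified sketch that tracks only the producer epoch. The class and method names (`EpochFlowSketch`, `fenceBump`, `abortBumpTv2`) are hypothetical and are not taken from Kafka's coordinator.

```java
// Hypothetical model of the epoch changes while InitProducerId handles an
// ongoing transaction under TV2. Not Kafka's code; names are illustrative.
final class EpochFlowSketch {

    // Step 1: InitProducerId bumps the epoch to fence stale requests.
    static short fenceBump(short epoch) {
        return (short) (epoch + 1);
    }

    // Step 2: with transaction version 2, aborting the ongoing transaction
    // bumps the epoch again as part of the abort / commit protocol.
    static short abortBumpTv2(short epoch) {
        return (short) (epoch + 1);
    }

    // Epoch after InitProducerId fences the producer and aborts its transaction.
    static short initProducerIdWithOngoingTxn(short epoch) {
        short fenced = fenceBump(epoch);
        return abortBumpTv2(fenced);
    }

    public static void main(String[] args) {
        short start = 10;
        // Prints 12: the epoch moved by two, not by one.
        System.out.println(initProducerIdWithOngoingTxn(start));
    }
}
```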

In most cases this is benign, but if the epoch of the ongoing transaction is 
32766, the first increment takes it to 32767 (the maximum value of a short), 
and the second increment overflows it to a negative value, which causes an 
IllegalArgumentException.
 # The first increment happens [here|https://github.com/apache/kafka/blob/b1ea280ab1012e857d4c8354fc57951f9c88f667/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L100]
 # The second increment happens [here|https://github.com/apache/kafka/blob/b1ea280ab1012e857d4c8354fc57951f9c88f667/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L195]
 # The IllegalArgumentException is thrown [here|https://github.com/apache/kafka/blob/b1ea280ab1012e857d4c8354fc57951f9c88f667/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L289]
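The overflow itself is plain `short` arithmetic. The sketch below assumes the epoch is bumped with a narrowing cast back to `short`, and uses a hypothetical `validate` method as a stand-in for the check that raises the exception. Neither method is Kafka's actual code, and the exception message is made up.

```java
// Illustrates the short overflow on the second bump. `bump` and `validate`
// are hypothetical stand-ins, not Kafka's TransactionMetadata code.
final class EpochOverflowSketch {

    // Naive increment: the cast back to short wraps silently past Short.MAX_VALUE.
    static short bump(short epoch) {
        return (short) (epoch + 1);
    }

    // Hypothetical guard that rejects negative epochs, mirroring the
    // IllegalArgumentException reported in this issue.
    static short validate(short epoch) {
        if (epoch < 0) {
            throw new IllegalArgumentException("Illegal new producer epoch " + epoch);
        }
        return epoch;
    }

    public static void main(String[] args) {
        short epoch = 32766;
        short first = bump(epoch);   // 32767, i.e. Short.MAX_VALUE
        short second = bump(first);  // wraps around to Short.MIN_VALUE (-32768)
        System.out.println(first + " -> " + second);
        try {
            validate(second);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```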
 

Most likely, the fix is simply not to bump the epoch when we reset the fenced 
state 
[here|https://github.com/apache/kafka/blob/b1ea280ab1012e857d4c8354fc57951f9c88f667/core/src/main/scala/kafka/coordinator/transaction/TransactionCoordinator.scala#L572].
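Assuming the fix removes one of the two bumps on this path, its net effect can be sketched as follows. `SingleBumpSketch` and its methods are hypothetical. The guard in `bump` that refuses to go past `Short.MAX_VALUE` is an assumption added to make the failure visible, not a description of Kafka's code.

```java
// Hypothetical before/after model of the proposed fix. Not Kafka's API.
final class SingleBumpSketch {

    // Assumption: refuse to increment past Short.MAX_VALUE instead of wrapping.
    static short bump(short epoch) {
        if (epoch == Short.MAX_VALUE) {
            throw new IllegalStateException("epoch exhausted at " + epoch);
        }
        return (short) (epoch + 1);
    }

    // Behavior described in the issue: fencing bump, then TV2 abort bump.
    static short currentBehavior(short epoch) {
        return bump(bump(epoch));
    }

    // Proposed: the fenced state is reset without bumping the epoch,
    // so the TV2 abort performs the only increment.
    static short proposedBehavior(short epoch) {
        return bump(epoch);
    }

    public static void main(String[] args) {
        short epoch = 32766;
        System.out.println("proposed: " + proposedBehavior(epoch));
        try {
            currentBehavior(epoch);
        } catch (IllegalStateException e) {
            System.out.println("current: " + e.getMessage());
        }
    }
}
```

In this model, an ongoing transaction at epoch 32766 is aborted at epoch 32767 instead of failing.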
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
