[ https://issues.apache.org/jira/browse/KAFKA-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justine Olshan resolved KAFKA-18575.
------------------------------------
    Resolution: Fixed

> Transaction Version 2 doesn't correctly handle race condition with completing and new transaction
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-18575
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18575
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Justine Olshan
>            Assignee: Justine Olshan
>            Priority: Blocker
>
> Right now we have a check to figure out if we need to verify/add a partition
> and it involves checking if there is an ongoing transaction.
> In the case where the previous transaction is in the process of
> committing/aborting, we could run into a scenario where we say a transaction
> is not ongoing for a given epoch so we need to add it to the coordinator. We
> add it to the queue to add to the transaction with the verification guard.
> When we get to the coordinator, the previous transaction has completed and we
> can add the partition. However, we still have the verification guard check at
> the log level right before the write, and that fails because completing the
> transaction clobbers the verification guard. I think what we need to do is
> just not have this second check at the log layer for TV2 and instead check
> the epoch is correct.
> (This was in the KIP but we didn’t quite implement it that way). The result
> is we self-fence in a scenario where we shouldn’t.
> (This doesn’t happen with TV0 because we have to add the partition client
> side first and we hit all the concurrent transactions errors there first. We
> can only write and proceed to the produce message when the previous
> transaction is complete.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
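
The race in the description can be illustrated with a minimal Java sketch. This is a toy model, not Kafka's actual code: the class `VerificationRaceSketch`, the `ProducerEntry` type, and the methods `startVerification`, `completeTransaction`, `guardCheck`, and `epochCheck` are all hypothetical names invented for illustration. It also assumes that completing a transaction under TV2 bumps the producer epoch, so that a write for the new transaction carries the epoch that follows the completed one.

```java
/** Toy model of the race; every name here is hypothetical, not Kafka's real API. */
final class VerificationRaceSketch {

    /** Stand-in for the per-producer state a partition keeps at the log layer. */
    static final class ProducerEntry {
        int epoch;                 // epoch of the latest transaction seen on this partition
        boolean txnOngoing = true; // the previous transaction is still completing
        Object verificationGuard;  // sentinel issued when verification starts, null if none

        ProducerEntry(int epoch) {
            this.epoch = epoch;
        }

        /** Issue (or reuse) a guard for a request that must be verified with the coordinator. */
        Object startVerification() {
            if (verificationGuard == null) {
                verificationGuard = new Object();
            }
            return verificationGuard;
        }

        /** Completing the transaction clears the guard, as the description says. */
        void completeTransaction() {
            txnOngoing = false;
            verificationGuard = null;
            epoch++; // assumption of this sketch: completion bumps the epoch under TV2
        }
    }

    /** Guard-based check before the write: the request's guard must still be current. */
    static boolean guardCheck(ProducerEntry entry, Object requestGuard) {
        return entry.verificationGuard != null && entry.verificationGuard == requestGuard;
    }

    /** Epoch-based check before the write: the request must carry the current epoch. */
    static boolean epochCheck(ProducerEntry entry, int requestEpoch) {
        return requestEpoch == entry.epoch;
    }

    public static void main(String[] args) {
        ProducerEntry entry = new ProducerEntry(5);

        // A write for the next transaction (epoch 6) arrives while epoch 5 is completing.
        Object guard = entry.startVerification();

        // The previous transaction completes before the write reaches the log.
        entry.completeTransaction();

        System.out.println("guard check passes: " + guardCheck(entry, guard));
        System.out.println("epoch check passes: " + epochCheck(entry, 6));
        System.out.println("stale epoch passes: " + epochCheck(entry, 5));
    }
}
```

In this model the guard check rejects a legitimate write because the guard was cleared when the earlier transaction completed, while the epoch check accepts it and still rejects a write carrying the old epoch. How the real fix compares epochs at the log layer is not shown in this message.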