Justine Olshan created KAFKA-18575:
--------------------------------------

             Summary: Transaction Version 2 doesn't correctly handle race condition with completing and new transaction
                 Key: KAFKA-18575
                 URL: https://issues.apache.org/jira/browse/KAFKA-18575
             Project: Kafka
          Issue Type: Bug
            Reporter: Justine Olshan

Right now, the check that decides whether we need to verify/add a partition involves checking whether there is an ongoing transaction. If the previous transaction is still in the process of committing/aborting, we can conclude that no transaction is ongoing for the given epoch, so the partition needs to be added at the coordinator. We enqueue the request to add the partition to the transaction, along with the verification guard. By the time the request reaches the coordinator, the previous transaction has completed and the partition can be added.

However, we still have the verification guard check at the log level right before the write, and that check fails because completing the transaction clobbers the verification guard. The result is that we self-fence in a scenario where we shouldn't.

I think what we need to do is drop this second check at the log layer for TV2 and instead check that the epoch is correct. (This was in the KIP, but we didn't quite implement it that way.)

(This doesn't happen with TV0 because we have to add the partition client side first, and we hit all the concurrent transactions errors there first. We can only write and proceed to the produce message when the previous transaction is complete.)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
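
The race described in this issue can be illustrated with a toy model. This is only a sketch written in Python for brevity; it is not Kafka's code, and every name in it (`ToyLog`, `ProducerEntry`, `VerificationGuard`, `maybe_start_verification`, `complete_transaction`, `append_with_guard_check`, `append_with_epoch_check`, `run_race`) is hypothetical. It assumes the new transaction arrives with a higher epoch than the one still completing, and it reduces the proposed epoch check to "reject epochs older than the one the log has seen".

```python
from dataclasses import dataclass, field
from typing import Optional


class VerificationGuard:
    """Opaque token (hypothetical); compared by identity only."""


@dataclass
class ProducerEntry:
    epoch: int
    ongoing: bool = False
    guard: Optional[VerificationGuard] = None


@dataclass
class ToyLog:
    # producer id -> per-producer state kept by this toy partition log
    entries: dict = field(default_factory=dict)

    def maybe_start_verification(self, pid, epoch):
        """Return a guard if the partition still has to be added for this epoch."""
        entry = self.entries.setdefault(pid, ProducerEntry(epoch))
        if entry.ongoing and entry.epoch == epoch:
            return None  # already part of the ongoing transaction
        if entry.guard is None:
            entry.guard = VerificationGuard()
        return entry.guard

    def complete_transaction(self, pid):
        """Model of the commit/abort marker being written to the log."""
        entry = self.entries[pid]
        entry.ongoing = False
        entry.guard = None  # completing the transaction clobbers the guard

    def append_with_guard_check(self, pid, epoch, guard):
        """Second check at the log layer: the guard must still match."""
        entry = self.entries[pid]
        if not entry.ongoing and entry.guard is not guard:
            return False  # rejected, although the coordinator accepted it
        entry.ongoing, entry.epoch = True, epoch
        return True

    def append_with_epoch_check(self, pid, epoch):
        """Sketch of the proposed alternative: only validate the epoch."""
        entry = self.entries[pid]
        if epoch < entry.epoch:
            return False  # stale producer epoch
        entry.ongoing, entry.epoch, entry.guard = True, epoch, None
        return True


def run_race():
    # The previous transaction (epoch 5) is still completing on this partition.
    log = ToyLog({1: ProducerEntry(epoch=5, ongoing=True)})
    # A produce request for the new transaction (epoch 6) arrives: nothing is
    # ongoing for epoch 6, so a guard is created and the add is enqueued.
    guard = log.maybe_start_verification(1, 6)
    # Before the write happens, the previous transaction completes.
    log.complete_transaction(1)
    # The coordinator has accepted the partition; now the write is attempted.
    guard_result = log.append_with_guard_check(1, 6, guard)
    epoch_result = log.append_with_epoch_check(1, 6)
    return guard_result, epoch_result
```

In this model, `run_race()` returns `(False, True)`: the guard check rejects a write that the coordinator already approved, while the epoch check lets it through and still rejects a producer with an older epoch.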