Justine Olshan created KAFKA-18575:
--------------------------------------

             Summary: Transaction Version 2 doesn't correctly handle race condition with completing and new transaction
                 Key: KAFKA-18575
                 URL: https://issues.apache.org/jira/browse/KAFKA-18575
             Project: Kafka
          Issue Type: Bug
            Reporter: Justine Olshan


Right now, to decide whether we need to verify/add a partition, we check 
whether there is an ongoing transaction.

If the previous transaction is still in the process of committing/aborting, we 
can run into a scenario where we decide that no transaction is ongoing for the 
given epoch, so the partition needs to be added to the transaction through the 
coordinator. We queue the partition to be added to the transaction along with 
the verification guard. By the time the request reaches the coordinator, the 
previous transaction has completed, so the coordinator can add the partition. 
However, we still have the verification guard check at the log level right 
before the write, and that check fails because completing the previous 
transaction clobbers the verification guard. The result is that we self-fence 
in a scenario where we shouldn't.

I think what we need to do is drop this second check at the log layer for TV2 
and instead check that the epoch is correct. (This was in the KIP, but we 
didn't quite implement it that way.)
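The race can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions: `ToyPartitionLog`, `start_verification`, `complete_transaction`, `append_with_guard_check`, `append_with_epoch_check`, and the epoch values 5 and 6 are all hypothetical and are not Kafka's actual classes or methods; `VerificationGuard` here is only a stand-in named after the concept. It shows the guard check failing once the previous transaction's completion clears the guard, while an epoch check (modeled, as an assumption, as "reject epochs lower than the last one seen on the partition") lets the write through.

```python
from dataclasses import dataclass
from typing import Optional


class VerificationGuard:
    """Opaque token: only object identity matters (hypothetical stand-in)."""


@dataclass
class ToyPartitionLog:
    """Toy model of one partition's transactional state; not Kafka's real classes."""
    ongoing_epoch: Optional[int] = None        # epoch of the open transaction, if any
    last_epoch: int = 0                        # highest producer epoch seen on this partition
    guard: Optional[VerificationGuard] = None  # guard stored when verification starts

    def start_verification(self) -> VerificationGuard:
        # No transaction is ongoing for the new epoch, so the partition must be added
        # through the coordinator; remember a fresh guard for the pending write.
        self.guard = VerificationGuard()
        return self.guard

    def complete_transaction(self, marker_epoch: int) -> None:
        # Writing the commit/abort marker ends the previous transaction and clears the
        # verification state, clobbering any guard created for the next transaction.
        self.ongoing_epoch = None
        self.guard = None
        self.last_epoch = max(self.last_epoch, marker_epoch)

    def append_with_guard_check(self, guard: Optional[VerificationGuard], epoch: int) -> bool:
        # Current behavior: the log-layer write requires the stored guard to match.
        if self.guard is not guard:
            return False  # the producer self-fences
        return self._begin(epoch)

    def append_with_epoch_check(self, epoch: int) -> bool:
        # Proposed TV2 behavior (sketch): reject only stale epochs.
        if epoch < self.last_epoch:
            return False
        return self._begin(epoch)

    def _begin(self, epoch: int) -> bool:
        # Record that a transaction with this epoch is now open on the partition.
        self.ongoing_epoch = epoch
        self.last_epoch = epoch
        return True


def simulate_race(use_epoch_check: bool) -> bool:
    # The previous transaction (epoch 5) is still committing/aborting on this partition.
    log = ToyPartitionLog(ongoing_epoch=5, last_epoch=5)
    new_epoch = 6  # hypothetical epoch used by the next transaction
    # 1. The broker sees no transaction ongoing for new_epoch and starts verification.
    guard = log.start_verification() if log.ongoing_epoch != new_epoch else None
    # 2. Before the add-partition request is handled, the previous transaction completes.
    log.complete_transaction(marker_epoch=new_epoch)
    # 3. The coordinator adds the partition and the produce request reaches the log.
    if use_epoch_check:
        return log.append_with_epoch_check(new_epoch)
    return log.append_with_guard_check(guard, new_epoch)


print(simulate_race(use_epoch_check=False))  # False: the guard was clobbered
print(simulate_race(use_epoch_check=True))   # True: the epoch check passes
```

In this sketch the epoch comparison stands in for the guard so that a write carrying an older epoch is still rejected; the exact rule the broker should apply is not specified in the issue.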

(This doesn't happen with TV0 because we have to add the partition client-side 
first, and we hit all the concurrent transactions errors there first. We can 
only proceed to write and produce the message once the previous transaction is 
complete.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)