[ 
https://issues.apache.org/jira/browse/KAFKA-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-18575.
------------------------------------
    Resolution: Fixed

> Transaction Version 2 doesn't correctly handle race condition with completing 
> and new transaction
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-18575
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18575
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Justine Olshan
>            Assignee: Justine Olshan
>            Priority: Blocker
>
> Right now, the check that decides whether we need to verify/add a partition
> involves checking whether there is an ongoing transaction.
> When the previous transaction is still in the process of
> committing/aborting, we can conclude that no transaction is ongoing for the
> given epoch, so the partition needs to be added at the coordinator. We
> queue the request to add it to the transaction, along with the verification
> guard. By the time the request reaches the coordinator, the previous
> transaction has completed and we can add the partition. However, we still
> run the verification guard check at the log level right before the write,
> and that check fails because completing the transaction clobbers the
> verification guard. I think what we need to do is drop this second check at
> the log layer for TV2 and instead check that the epoch is correct.
> (This was in the KIP but we didn’t quite implement it that way.) The result
> is that we self-fence in a scenario where we shouldn’t.
> (This doesn’t happen with TV0 because we have to add the partition client
> side first and we hit all the concurrent transactions errors there first. We
> can only write and proceed to the produce message when the previous
> transaction is complete.)
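
The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kafka's actual code: `ProducerEntry`, `startVerification`, `completeTransaction`, `guardCheck`, and `epochCheck` are hypothetical names, the epochs 5 and 6 are arbitrary, and the model assumes that completing the previous transaction leaves the log-layer entry at the bumped epoch carried by the new request.

```java
/** Toy model of the race; every name here is hypothetical, not a Kafka class. */
public class VerificationRaceSketch {

    /** Per-producer state kept at the log layer in this toy model. */
    static final class ProducerEntry {
        short epoch;               // producer epoch currently known to the log layer
        boolean txnOngoing;        // whether a transaction is open on this partition
        Object verificationGuard;  // sentinel issued when verification starts; null if none

        ProducerEntry(short epoch, boolean txnOngoing) {
            this.epoch = epoch;
            this.txnOngoing = txnOngoing;
        }

        /** Issues a guard (or returns the existing one) when verification starts. */
        Object startVerification() {
            if (verificationGuard == null) {
                verificationGuard = new Object();
            }
            return verificationGuard;
        }

        /** Completing the transaction clears the guard (assumption: epoch is also updated). */
        void completeTransaction(short bumpedEpoch) {
            txnOngoing = false;
            verificationGuard = null; // this is the "clobbering" from the description
            epoch = bumpedEpoch;
        }

        /** The second check at the log layer: the request's guard must still match. */
        boolean guardCheck(Object requestGuard) {
            return requestGuard != null && requestGuard == verificationGuard;
        }

        /** The proposed alternative: only reject requests from an older epoch. */
        boolean epochCheck(short requestEpoch) {
            return requestEpoch >= epoch;
        }
    }

    /** Replays the interleaving; returns {guardCheckPassed, epochCheckPassed}. */
    public static boolean[] simulateRace() {
        // The previous transaction (epoch 5) is still committing/aborting.
        ProducerEntry entry = new ProducerEntry((short) 5, true);
        short requestEpoch = 6; // the new transaction's produce request

        // 1. No transaction is ongoing for epoch 6, so verification starts.
        Object guard = entry.startVerification();

        // 2. The previous transaction completes while the request is queued.
        entry.completeTransaction(requestEpoch);

        // 3. Right before the write, compare the two possible checks.
        return new boolean[] { entry.guardCheck(guard), entry.epochCheck(requestEpoch) };
    }

    public static void main(String[] args) {
        boolean[] result = simulateRace();
        System.out.println("guard check passes: " + result[0]); // false
        System.out.println("epoch check passes: " + result[1]); // true
    }
}
```

In this model the guard-based check rejects a request that should have been accepted, which corresponds to the unwanted self-fencing, while the epoch comparison accepts it.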



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
