jolshan opened a new pull request, #18604:
URL: https://github.com/apache/kafka/pull/18604

   There is a subtle race condition with transactions V2: a transaction can 
still be completing when the partition leader checks whether it needs to add a 
partition, yet finish completing by the time the request reaches the 
coordinator. 
   
   One approach was to remove the verification for TV2, but a simpler one is to 
return a CONCURRENT_TRANSACTIONS error from the partition leader (before 
attempting to add the partition). I've done this and added a test for this 
behavior. 
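   A simplified model of the leader-side check may help illustrate the fix. It 
is a sketch, not Kafka's implementation: the class TxnVerificationSketch, its 
methods, the Outcome values other than CONCURRENT_TRANSACTIONS, and the 
per-partition map of open transactions are all hypothetical. It assumes the 
producer epoch is bumped between transactions and that the leader can tell when 
an earlier transaction from the same producer has not yet had its marker 
written on the partition.

```java
import java.util.HashMap;
import java.util.Map;

public class TxnVerificationSketch {

    // Hypothetical outcomes of the leader-side check. Only CONCURRENT_TRANSACTIONS
    // mirrors a real Kafka error (Errors.CONCURRENT_TRANSACTIONS).
    enum Outcome { NONE, ADD_PARTITION, CONCURRENT_TRANSACTIONS }

    // Hypothetical per-partition state: producerId -> epoch of a transaction that
    // has written data to this partition but whose marker has not been written yet.
    private final Map<Long, Short> openTxnEpochs = new HashMap<>();

    // Transactional data from (producerId, epoch) was appended to this partition.
    void appendTransactionalData(long producerId, short epoch) {
        openTxnEpochs.put(producerId, epoch);
    }

    // The commit/abort marker for the producer's open transaction was written.
    void writeMarker(long producerId) {
        openTxnEpochs.remove(producerId);
    }

    // Check performed by the leader before asking the coordinator to add the partition.
    Outcome check(long producerId, short epoch) {
        Short openEpoch = openTxnEpochs.get(producerId);
        if (openEpoch == null) {
            // Nothing open here: the partition must be added to the new transaction.
            return Outcome.ADD_PARTITION;
        }
        if (openEpoch.shortValue() == epoch) {
            // The partition is already part of the current transaction.
            return Outcome.NONE;
        }
        // An earlier transaction is still completing on this partition. Rejecting
        // here lets the producer try again once the marker has been written,
        // instead of racing with it. (Stale-epoch fencing is ignored in this sketch.)
        return Outcome.CONCURRENT_TRANSACTIONS;
    }

    public static void main(String[] args) {
        TxnVerificationSketch leader = new TxnVerificationSketch();
        long producerId = 42L;
        leader.appendTransactionalData(producerId, (short) 5);
        // The producer moved on to epoch 6, but epoch 5's marker has not landed yet.
        System.out.println(leader.check(producerId, (short) 6)); // CONCURRENT_TRANSACTIONS
        leader.writeMarker(producerId);
        System.out.println(leader.check(producerId, (short) 6)); // ADD_PARTITION
        leader.appendTransactionalData(producerId, (short) 6);
        System.out.println(leader.check(producerId, (short) 6)); // NONE
    }
}
```

   Returning the error at the leader keeps the decision local: the leader never 
forwards an add-partition request while it can still see the previous 
transaction as open, so the timing of the coordinator's reply no longer matters.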
   
   Locally, I reproduced the race by adding a 1-second sleep when handling the 
WriteTxnMarkersRequest and a 2-second delay before adding the partition to the 
AddPartitionsToTxnManager. Without this change, the race happened on every 
second transaction, as the first one completed. With this change, the error 
went away.
   
   As a follow-up, we may want to clean up some of the code and comments related 
to verification, since the code is used by both TV0 + verification and TV2. 
But that doesn't need to be completed for 4.0. This does :)


-- 
This is an automated message from the Apache Git Service.