Justine Olshan created KAFKA-18654:
--------------------------------------

             Summary: Transaction Version 2 performance regression due to early return
                 Key: KAFKA-18654
                 URL: https://issues.apache.org/jira/browse/KAFKA-18654
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: Justine Olshan
            Assignee: Justine Olshan


https://issues.apache.org/jira/browse/KAFKA-18575 solved a critical race 
condition by returning CONCURRENT_TRANSACTIONS early while the transaction 
was still completing. 
Testing showed that this early return can cause performance regressions.

Prior to KIP-890, the AddPartitionsToTxn call was a separate request from the 
producer. An earlier change, https://issues.apache.org/jira/browse/KAFKA-5477, 
decreased the retry backoff for that call. With KIP-890 the call is made 
through the produce path, so we go back to the default retry backoff, which is 
longer. Prior to 18575, sending to the coordinator introduced a slight delay, 
so we were less likely to return quickly and get stuck in this backoff. 



There are two ways to address this regression:
1. Solve 18575 via the other solution proposed for that ticket: don't return 
early, and check the epoch to avoid the verification guard race.
2. With the bumped produce version, return CONCURRENT_TRANSACTIONS and change 
produce handling to use a shorter backoff for this error. 
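
A minimal sketch of what option 2 could look like, assuming the producer picks 
its retry delay per error code. This is not the actual Kafka client code: the 
class, enum, and method names are hypothetical, and the 100 ms and 20 ms values 
are illustrative assumptions rather than values taken from this ticket.

```java
// Hypothetical illustration of option 2; not the real Kafka producer code.
final class ProduceRetryBackoffSketch {

    // Stand-in for the error codes a produce response might carry.
    enum ProduceError { NONE, CONCURRENT_TRANSACTIONS, NOT_LEADER_OR_FOLLOWER }

    // Assumed default retry backoff (illustrative value).
    static final long DEFAULT_RETRY_BACKOFF_MS = 100L;

    // Assumed shorter backoff for CONCURRENT_TRANSACTIONS (illustrative value).
    static final long SHORT_RETRY_BACKOFF_MS = 20L;

    // Returns how long to wait before retrying a produce request that failed
    // with the given error. Only CONCURRENT_TRANSACTIONS gets the short delay;
    // the short delay is capped so it never exceeds the configured default.
    static long retryBackoffMs(ProduceError error, long defaultBackoffMs) {
        if (error == ProduceError.CONCURRENT_TRANSACTIONS) {
            return Math.min(SHORT_RETRY_BACKOFF_MS, defaultBackoffMs);
        }
        return defaultBackoffMs;
    }

    public static void main(String[] args) {
        for (ProduceError error : ProduceError.values()) {
            System.out.println(error + " -> "
                + retryBackoffMs(error, DEFAULT_RETRY_BACKOFF_MS) + " ms");
        }
    }
}
```

The point of the sketch is that the previous transaction usually finishes 
completing quickly, so a retry after a short delay is likely to succeed, while 
other retriable errors keep the default backoff.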



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
