Justine Olshan created KAFKA-18654:
--------------------------------------

             Summary: Transaction Version 2 performance regression due to early return
                 Key: KAFKA-18654
                 URL: https://issues.apache.org/jira/browse/KAFKA-18654
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: Justine Olshan
            Assignee: Justine Olshan

https://issues.apache.org/jira/browse/KAFKA-18575 solved a critical race condition by returning early with CONCURRENT_TRANSACTIONS when the transaction was still completing. Testing showed that this early return can cause performance regressions.

Prior to KIP-890, the AddPartitions call was a separate call from the producer. A previous change, https://issues.apache.org/jira/browse/KAFKA-5477, decreased the retry backoff for that call. With KIP-890 the call is made through the produce path, so we go back to the default retry backoff, which takes longer.

Before KAFKA-18575, there was a slight delay when sending to the coordinator, so the request was less likely to return quickly and get stuck in this backoff.

There are two ways to address this regression:

1. Solve KAFKA-18575 via the other solution proposed on that ticket: don't return early, and check the epoch to avoid the verification guard race.
2. With the bumped produce version, return CONCURRENT_TRANSACTIONS and change produce handling to use a shorter backoff for this error.
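
To make option 2 concrete, here is a minimal sketch of choosing a shorter retry backoff for one specific error. It is not Kafka's producer code: the class `TxnBackoffSketch`, the `ProduceError` enum, the method `retryBackoffMs`, and both millisecond values are hypothetical and chosen only for illustration.

```java
// Illustrative sketch only; none of these names are Kafka's actual classes.
final class TxnBackoffSketch {

    // Hypothetical subset of errors a produce response could carry.
    enum ProduceError { NONE, CONCURRENT_TRANSACTIONS, NOT_LEADER_OR_FOLLOWER }

    // Assumed values for illustration, not taken from the ticket.
    static final long DEFAULT_RETRY_BACKOFF_MS = 100L;
    static final long SHORT_TXN_RETRY_BACKOFF_MS = 20L;

    // Pick the backoff to wait before retrying a produce request.
    // CONCURRENT_TRANSACTIONS means the previous transaction is still
    // completing, which is expected to resolve quickly, so retry sooner.
    static long retryBackoffMs(ProduceError error, long defaultBackoffMs) {
        if (error == ProduceError.CONCURRENT_TRANSACTIONS) {
            // Never wait longer than the configured default.
            return Math.min(SHORT_TXN_RETRY_BACKOFF_MS, defaultBackoffMs);
        }
        return defaultBackoffMs;
    }

    public static void main(String[] args) {
        System.out.println(retryBackoffMs(
            ProduceError.CONCURRENT_TRANSACTIONS, DEFAULT_RETRY_BACKOFF_MS));
        System.out.println(retryBackoffMs(
            ProduceError.NOT_LEADER_OR_FOLLOWER, DEFAULT_RETRY_BACKOFF_MS));
    }
}
```

Taking the minimum of the two values keeps a user-configured default that is already shorter than the special-case backoff from being lengthened.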