[ https://issues.apache.org/jira/browse/KAFKA-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045635#comment-18045635 ]
Justine Olshan commented on KAFKA-20000:
----------------------------------------
Hey, I think this backoff is somewhat dependent on the system, right? Depending
on how quickly inter-broker requests occur, 20ms could be too frequent as well,
right? For AddPartitionsToTxnHandler we had a config. Is this not working
correctly for offset commits?
From KIP-890:
> Feb 2025. Adding additional configs to address a performance issue
> KAFKA-18654. During the transaction commit phase, it is normal to find
> CONCURRENT_TRANSACTION error when adding partitions to the transaction,
> because it takes some time for the markers to be fully propagated to all the
> data partitions. On the other hand, the client no longer sends the
> AddPartitionToTxn directly to the transaction coordinator, instead, the
> server sends the request as a part of the Produce/TxnOffsetCommit request
> handling. This new behavior causes the client to retry expensive produce
> requests during the transaction commit phase. So, we decided to let the
> server retry the AddPartitionToTxn when hitting the CONCURRENT_TRANSACTION.
> Then, the following 2 configs are used to control the retry.
* _add.partitions.to.txn.retry.backoff.max.ms_ defines the maximum retry
timeout when the server attempts to add the partition to the transaction.
* _add.partitions.to.txn.retry.backoff.ms_ defines how frequently the server
will retry the AddPartitionToTxn.
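
To make the interaction of the two configs concrete, here is a rough sketch of
how I read them: retry at a fixed interval, and give up once the total wait
would exceed the maximum. This is not the broker's actual code. The function
names, the simulated clock, and the 20/100 values in the usage note are
assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable

CONCURRENT_TRANSACTIONS = "CONCURRENT_TRANSACTIONS"
NONE = "NONE"


@dataclass
class RetryResult:
    error: str      # final error code handed back to the caller
    attempts: int   # number of add-partition attempts made
    waited_ms: int  # total simulated backoff time


def add_partitions_with_retry(
    try_add: Callable[[int], str],
    backoff_ms: int,
    max_backoff_ms: int,
) -> RetryResult:
    """Retry a (hypothetical) add-partition call while it is rejected.

    try_add(now_ms) returns an error code. Time is simulated, so nothing
    actually sleeps: waited_ms doubles as the clock.
    """
    waited_ms = 0
    attempts = 0
    while True:
        attempts += 1
        error = try_add(waited_ms)
        if error != CONCURRENT_TRANSACTIONS:
            # Success or a different error: stop retrying.
            return RetryResult(error, attempts, waited_ms)
        if waited_ms + backoff_ms > max_backoff_ms:
            # Another wait would exceed the budget: surface the error
            # so the client falls back to its own retry backoff.
            return RetryResult(error, attempts, waited_ms)
        waited_ms += backoff_ms  # simulated wait before the next attempt


def markers_done_after(ready_at_ms: int) -> Callable[[int], str]:
    """Hypothetical stand-in for marker propagation finishing at ready_at_ms."""
    def try_add(now_ms: int) -> str:
        return NONE if now_ms >= ready_at_ms else CONCURRENT_TRANSACTIONS
    return try_add
```

With backoff_ms=20 and max_backoff_ms=100, a call such as
add_partitions_with_retry(markers_done_after(50), 20, 100) succeeds on the
server side without the client ever seeing the error. If the markers take
longer than the budget, the error still goes back to the client, which is
where the client-side backoff discussed in this ticket comes in.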
> Optimize retry backoff for CONCURRENT_TRANSACTIONS to improve TV2 throughput
> ----------------------------------------------------------------------------
>
> Key: KAFKA-20000
> URL: https://issues.apache.org/jira/browse/KAFKA-20000
> Project: Kafka
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Priority: Major
> Fix For: 4.3.0
>
>
> Transaction V2 introduces frequent state transitions (epoch bumps) that
> briefly reject concurrent requests with CONCURRENT_TRANSACTIONS. The default
> client retry backoff (100ms) is excessive for these transient locks, leading
> to unnecessary latency and degraded throughput. Reducing the backoff allows
> faster retries and smoother performance during state transitions.