[jira] [Commented] (KAFKA-20000) Optimize retry backoff for CONCURRENT_TRANSACTIONS to improve TV2 throughput

Justine Olshan (Jira) Tue, 16 Dec 2025 14:03:10 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045635#comment-18045635
 ]


Justine Olshan commented on KAFKA-20000:
----------------------------------------

Hey, I think this backoff is somewhat dependent on the system, right? Depending 
on how quickly inter-broker requests complete, 20ms could be too frequent as 
well. For AddPartitionsToTxnHandler we had a config. Is this not working 
correctly for offset commits?

From KIP-890:

> Feb 2025. Adding additional configs to address a performance issue 
> KAFKA-18654. During the transaction commit phase, it is normal to find 
> CONCURRENT_TRANSACTION error when adding partitions to the transaction, 
> because it takes some time for the markers to be fully propagated to all the 
> data partitions. On the other hand, the client no longer sends the 
> AddPartitionToTxn directly to the transaction coordinator, instead, the 
> server sends the request as a part of the Produce/TxnOffsetCommit request 
> handling. This new behavior causes the client to retry expensive produce 
> requests during the transaction commit phase. So, we decided to let the 
> server retry the AddPartitionToTxn when hitting the CONCURRENT_TRANSACTION. 
> Then, the following 2 configs are used to control the retry.
 * _add.partitions.to.txn.retry.backoff.max.ms_ defines the maximum retry 
timeout when the server attempts to add the partition to the transaction.
 * _add.partitions.to.txn.retry.backoff.ms_ defines how frequently the server 
will retry the AddPartitionToTxn.
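
To make the interplay of the two configs concrete, here is a rough sketch of 
how I read them: retry every backoff interval until the max budget is used up, 
then return the error to the client. This is only an illustration, not the 
actual broker code. The class, the method names, and the 20 ms / 100 ms values 
are assumptions made for the example, as is the choice to still allow an 
attempt exactly at the max. Time is simulated, so nothing sleeps.

```java
import java.util.ArrayList;
import java.util.List;

public class AddPartitionsRetrySketch {
    // Hypothetical stand-ins for add.partitions.to.txn.retry.backoff.ms and
    // add.partitions.to.txn.retry.backoff.max.ms; values are illustrative only.
    static final long RETRY_BACKOFF_MS = 20;
    static final long RETRY_BACKOFF_MAX_MS = 100;

    /**
     * Elapsed-time offsets (ms) at which attempts are made: the first at 0,
     * then one every backoffMs while the elapsed time stays within maxMs.
     */
    static List<Long> attemptTimes(long backoffMs, long maxMs) {
        if (backoffMs <= 0) {
            throw new IllegalArgumentException("backoffMs must be positive");
        }
        List<Long> times = new ArrayList<>();
        for (long t = 0; t <= maxMs; t += backoffMs) {
            times.add(t);
        }
        return times;
    }

    /**
     * Simulates retrying until the markers have been written. markersDoneAtMs
     * is the (assumed) time after which the attempt no longer fails with
     * CONCURRENT_TRANSACTIONS. Returns the time of the successful attempt,
     * or -1 if the retry budget ran out first.
     */
    static long retryUntil(long backoffMs, long maxMs, long markersDoneAtMs) {
        for (long t : attemptTimes(backoffMs, maxMs)) {
            if (t >= markersDoneAtMs) {
                return t; // attempt succeeds
            }
        }
        return -1; // give up; the error goes back to the client
    }

    public static void main(String[] args) {
        System.out.println(attemptTimes(RETRY_BACKOFF_MS, RETRY_BACKOFF_MAX_MS));
        System.out.println(retryUntil(RETRY_BACKOFF_MS, RETRY_BACKOFF_MAX_MS, 35));
        System.out.println(retryUntil(100, 100, 35));
        System.out.println(retryUntil(RETRY_BACKOFF_MS, RETRY_BACKOFF_MAX_MS, 150));
    }
}
```

In this toy model, markers that land at 35 ms are picked up at 40 ms with a 
20 ms backoff but only at 100 ms with a 100 ms backoff. The right interval 
still depends on how fast inter-broker requests complete on a given system.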

> Optimize retry backoff for CONCURRENT_TRANSACTIONS to improve TV2 throughput
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-20000
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20000
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>            Priority: Major
>             Fix For: 4.3.0
>
>
> Transaction V2 introduces frequent state transitions (epoch bumps) that 
> briefly reject concurrent requests with CONCURRENT_TRANSACTIONS. The default 
> client retry backoff (100ms) is excessive for these transient locks, leading 
> to unnecessary latency and degraded throughput. Reducing the backoff allows 
> faster retries and smoother performance during state transitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
