[jira] [Commented] (KAFKA-20000) Optimize retry backoff for CONCURRENT_TRANSACTIONS to improve TV2 throughput

Chia-Ping Tsai (Jira) Tue, 16 Dec 2025 19:12:07 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045713#comment-18045713
 ]


Chia-Ping Tsai commented on KAFKA-20000:
----------------------------------------

{quote}
AddPartitionsToTxnHandler we had a config. Is this not working correctly for 
offset commits?
{quote}

Configs like `add.partitions.to.txn.retry.backoff.max.ms` and 
`add.partitions.to.txn.retry.backoff.ms` apply specifically to the produce path 
(KafkaApis#handleProduceRequest -> ReplicaManager#handleProduceAppend).

Unfortunately, the path that handles `TxnOffsetCommitRequest` 
(KafkaApis#handleTxnOffsetCommitRequest -> 
ReplicaManager#maybeSendPartitionToTransactionCoordinator), which returns the 
CONCURRENT_TRANSACTIONS error to the client, lacks similar configurability.

{quote}
Depending on how quickly inter-broker requests occur 20ms could be too frequent 
as well right?
{quote}

How about introducing new configurations for `TxnOffsetCommitHandler`, such as 
`add.offsets.to.txn.retry.backoff.ms` and 
`add.offsets.to.txn.retry.backoff.max.ms`? The generic `retry.backoff.ms` 
is too coarse-grained to control this behaviour individually.
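
To make the proposal concrete, here is a rough sketch of how such a pair of 
settings could bound retries. It assumes the new configs would behave the way 
I understand the existing add-partitions pair to behave: a fixed delay between 
retries (`backoffMs`) and a cap on the total time spent waiting 
(`maxBackoffMs`). The class `TxnOffsetCommitRetrySketch`, its `retry` method 
and the `Result` type are hypothetical names for illustration only, not Kafka 
code, and the loop tracks waiting time virtually instead of sleeping.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; none of these names exist in Kafka.
final class TxnOffsetCommitRetrySketch {

    /** Outcome of the simulated retry loop. */
    static final class Result {
        final boolean succeeded;
        final int attempts;
        final long waitedMs;

        Result(boolean succeeded, int attempts, long waitedMs) {
            this.succeeded = succeeded;
            this.attempts = attempts;
            this.waitedMs = waitedMs;
        }
    }

    /**
     * Retries `attempt` with a fixed delay until it succeeds or the next wait
     * would push the total waiting time past `maxBackoffMs` (assumed semantics).
     */
    static Result retry(BooleanSupplier attempt, long backoffMs, long maxBackoffMs) {
        if (backoffMs <= 0) {
            throw new IllegalArgumentException("backoffMs must be positive");
        }
        int attempts = 0;
        long waitedMs = 0;
        while (true) {
            attempts++;
            if (attempt.getAsBoolean()) {
                return new Result(true, attempts, waitedMs);
            }
            // Give up once another wait would exceed the total budget; at that
            // point the error would be returned to the client.
            if (waitedMs + backoffMs > maxBackoffMs) {
                return new Result(false, attempts, waitedMs);
            }
            // A real broker would schedule a delayed retry here; the sketch
            // only accumulates the virtual waiting time.
            waitedMs += backoffMs;
        }
    }

    public static void main(String[] args) {
        // Simulate a transient CONCURRENT_TRANSACTIONS that clears on the 3rd try.
        int[] calls = {0};
        Result r = retry(() -> ++calls[0] >= 3, 20, 100);
        System.out.println(r.succeeded + " after " + r.attempts
                + " attempts, waited " + r.waitedMs + " ms");
    }
}
```

With a 20 ms delay and a 100 ms budget, a transient error that clears on the 
third attempt is absorbed after 40 ms of waiting and never reaches the client. 
The values are illustrative, not suggested defaults.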

> Optimize retry backoff for CONCURRENT_TRANSACTIONS to improve TV2 throughput
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-20000
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20000
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>            Priority: Major
>             Fix For: 4.3.0
>
>
> Transaction V2 introduces frequent state transitions (epoch bumps) that 
> briefly reject concurrent requests with CONCURRENT_TRANSACTIONS. The default 
> client retry backoff (100ms) is excessive for these transient locks, leading 
> to unnecessary latency and degraded throughput. Reducing the backoff allows 
> faster retries and smoother performance during state transitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
