Travis Bischel created KAFKA-14315:
--------------------------------------

             Summary: Kraft: 1 broker setup, broker took 34 seconds to 
transition from PrepareCommit to CompleteCommit
                 Key: KAFKA-14315
                 URL: https://issues.apache.org/jira/browse/KAFKA-14315
             Project: Kafka
          Issue Type: Bug
          Components: kraft
            Reporter: Travis Bischel


I'm still looking into a PR failure in [my 
client|https://github.com/twmb/franz-go/pull/223] and noticed something a bit 
strange. I know that _technically_ I should be using RequireStableFetchOffsets 
in my transaction tests to prevent rebalances while a transaction is not 
finalized. I'll be adding that.

However, these tests have never failed against ZooKeeper mode. The client goes 
to great lengths to avoid needing KIP-447 behavior, and the assumption with 
localhost testing is that things run fast enough (and that there are enough 
guards) that problems would not be encountered.

That does not appear to hold with a KRaft broker. Looking at 
__transaction_state, the following looks especially problematic:

 

{{__transaction_state partition 33 offset 7 at [2022-10-18 11:15:37.821]}}
{{TxnMetadataKey(0) 9f87dc04dc3f4d5b15ef3072c531cf46327278307df8e149fa966462cd40c10b}}
{{TxnMetadataValue(0)}}
{{      ProducerID           41}}
{{      ProducerEpoch        0}}
{{      TimeoutMillis        120000}}
{{      State                PrepareCommit}}
{{      Topics               __consumer_offsets=>[13] e7c7d971626fbaf4bfb33975e57089167939e6acabb4c4fc534eb148462e45cc=>[4 5 12 16]}}
{{      LastUpdateTimestamp  1666113337821}}
{{      StartTimestamp       1666113335311}}
{{__transaction_state partition 33 offset 8 at [2022-10-18 11:16:11.419]}}
{{TxnMetadataKey(0) 9f87dc04dc3f4d5b15ef3072c531cf46327278307df8e149fa966462cd40c10b}}
{{TxnMetadataValue(0)}}
{{      ProducerID           41}}
{{      ProducerEpoch        0}}
{{      TimeoutMillis        120000}}
{{      State                CompleteCommit}}
{{      Topics     }}
{{      LastUpdateTimestamp  1666113337821}}
{{      StartTimestamp       1666113335311}}

 

I've captured that using my kcl tool.

Note that the transaction enters PrepareCommit at 11:15:37.821, and then enters 
CompleteCommit at 11:16:11.419. AFAICT, this means that in my single-node KRaft 
setup, the broker took roughly 34 seconds (33.598 s) to transition commit 
states internally.

I noticed this in tests because a rebalance happened during those 34 seconds, 
which caused duplicate consumption because transactional offset commits were 
not finalized and the old commits were picked up.

This ticket is related to KAFKA-14312, in that this failure is cropping up now 
that I've worked around KAFKA-14312 within the client itself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
