Sanskar Jhajharia created KAFKA-19840:
-----------------------------------------

             Summary: Flaky test ShareConsumerTest.testShareGroupMaxSizeConfigExceeded
                 Key: KAFKA-19840
                 URL: https://issues.apache.org/jira/browse/KAFKA-19840
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Sanskar Jhajharia


The test has been observed to be flaky in recent CI runs: 
[https://develocity.apache.org/scans/tests?search.names=CI%20workflow%2CGit%20repository&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=github%2Ctrunk&search.tasks=test&search.timeZoneId=Asia%2FCalcutta&search.values=CI%2Chttps:%2F%2Fgithub.com%2Fapache%2Fkafka&tests.container=org.apache.kafka.clients.consumer.ShareConsumerTest&tests.sortField=FLAKY&tests.test=testShareGroupMaxSizeConfigExceeded()%5B1%5D]
 

 
After tracing the failures, the root cause was narrowed down to commit 
[https://github.com/apache/kafka/commit/87657fdfc721055835f5b1f22151c461e85eab4a].
This change introduced a new try/catch around 
{{handleCompletedAcknowledgements()}} in {{ShareConsumerImpl.java}}. As a side 
effect, every exception thrown from the commit callback, including 
{{GroupMaxSizeReachedException}}, is caught and only logged whenever it passes 
through that code path, so in that case the test never receives the exception.

I synced with Shiv, and we think the resulting flakiness is timing dependent:
 * If the acknowledgement callback fires while {{poll()}} is executing, the exception is caught and swallowed, and the test times out.
 * If the callback fires outside that path, the exception escapes and the test passes.

As a result, the same test passes or fails nondeterministically, depending on 
when the acknowledgement callback is scheduled.
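To make the timing dependence concrete, here is a minimal, single-threaded Java sketch of the two paths described above. It is not the actual {{ShareConsumerImpl}} code: the class AckCallbackRaceSketch, the AckCallback interface, the handleInsidePoll/handleOutsidePoll methods, and the local GroupMaxSizeReachedException stand-in are all hypothetical, and a boolean flag replaces the real scheduling that decides which path the callback runs on.

```java
import java.util.ArrayList;
import java.util.List;

class AckCallbackRaceSketch {

    // Hypothetical local stand-in for Kafka's GroupMaxSizeReachedException,
    // defined here so the sketch compiles without Kafka on the classpath.
    static class GroupMaxSizeReachedException extends RuntimeException {
        GroupMaxSizeReachedException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for the acknowledgement commit callback.
    interface AckCallback {
        void onComplete();
    }

    // Collects messages that the real code would send to the logger.
    static final List<String> LOG = new ArrayList<>();

    // Models the poll() path after the change: the callback runs inside a
    // try/catch that only logs, so the exception never leaves this method.
    static void handleInsidePoll(AckCallback callback) {
        try {
            callback.onComplete();
        } catch (RuntimeException e) {
            LOG.add("swallowed: " + e.getMessage());
        }
    }

    // Models any other path, where nothing intercepts the callback's exception.
    static void handleOutsidePoll(AckCallback callback) {
        callback.onComplete();
    }

    // Returns true if the exception reaches the caller (the test would pass),
    // false if it was swallowed (the test would keep waiting and time out).
    static boolean exceptionReachesCaller(boolean firesDuringPoll) {
        AckCallback failing = () -> {
            throw new GroupMaxSizeReachedException("share group is full");
        };
        try {
            if (firesDuringPoll) {
                handleInsidePoll(failing);
            } else {
                handleOutsidePoll(failing);
            }
            return false;
        } catch (GroupMaxSizeReachedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("callback during poll():  " + exceptionReachesCaller(true));
        System.out.println("callback outside poll(): " + exceptionReachesCaller(false));
        System.out.println("log: " + LOG);
    }
}
```

Running main prints false for the poll() path, where the exception is only logged, and true for the other path, mirroring the timeout-versus-pass split seen in the test.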



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
