Sanskar Jhajharia created KAFKA-19840:
-----------------------------------------
Summary: Flaky test ShareConsumerTest.testShareGroupMaxSizeConfigExceeded
Key: KAFKA-19840
URL: https://issues.apache.org/jira/browse/KAFKA-19840
Project: Kafka
Issue Type: Sub-task
Reporter: Sanskar Jhajharia
The test has been flaky in recent runs:
[https://develocity.apache.org/scans/tests?search.names=CI%20workflow%2CGit%20repository&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=github%2Ctrunk&search.tasks=test&search.timeZoneId=Asia%2FCalcutta&search.values=CI%2Chttps:%2F%2Fgithub.com%2Fapache%2Fkafka&tests.container=org.apache.kafka.clients.consumer.ShareConsumerTest&tests.sortField=FLAKY&tests.test=testShareGroupMaxSizeConfigExceeded()%5B1%5D]
After tracing the failures, the root cause was narrowed down to commit
[https://github.com/apache/kafka/commit/87657fdfc721055835f5b1f22151c461e85eab4a].
This change introduced a new try/catch around
{{handleCompletedAcknowledgements()}} in {{ShareConsumerImpl.java}}. The
side effect is that all exceptions coming from the commit callback, including
{{GroupMaxSizeReachedException}}, are now swallowed and only logged, so the
test never receives the exception.

I synced with Shiv and we think that the resulting flakiness is timing dependent:
* If the ack callback fires while {{poll()}} is executing → exception is caught and swallowed → test times out
* If the callback fires outside that path → exception escapes → test passes

So the same test randomly passes or fails depending on when the ack callback is
scheduled.
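
To make the two paths concrete, here is a minimal, self-contained sketch of the suspected behaviour. It is not the actual {{ShareConsumerImpl}} code: the class {{SwallowedCallbackSketch}}, the exception {{GroupFullException}} (a stand-in for {{GroupMaxSizeReachedException}}), and the methods {{pollWithGuard()}} and {{unguardedPath()}} are all hypothetical names chosen for illustration. Only {{handleCompletedAcknowledgements()}} borrows its name from the description above, and its body here is invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two code paths; NOT the real Kafka implementation.
class SwallowedCallbackSketch {

    // Stand-in for GroupMaxSizeReachedException (assumption for this sketch).
    static class GroupFullException extends RuntimeException {
        GroupFullException(String message) {
            super(message);
        }
    }

    // Collects what a real implementation would write to its logger.
    final List<String> log = new ArrayList<>();

    // Models an acknowledgement callback that fails because the group is full.
    void handleCompletedAcknowledgements() {
        throw new GroupFullException("group is full");
    }

    // Path 1: the callback runs inside the guarded section, so the
    // exception is caught and only logged. The caller sees nothing.
    void pollWithGuard() {
        try {
            handleCompletedAcknowledgements();
        } catch (RuntimeException e) {
            log.add("swallowed: " + e.getMessage());
        }
    }

    // Path 2: the callback runs outside the guarded section, so the
    // exception propagates to the caller.
    void unguardedPath() {
        handleCompletedAcknowledgements();
    }

    public static void main(String[] args) {
        SwallowedCallbackSketch sketch = new SwallowedCallbackSketch();

        sketch.pollWithGuard();
        System.out.println("guarded path log: " + sketch.log);

        try {
            sketch.unguardedPath();
        } catch (GroupFullException e) {
            System.out.println("unguarded path threw: " + e.getMessage());
        }
    }
}
```

A test that waits for the exception would only succeed on the second path. If the first path is taken, nothing is ever thrown to the caller, which matches the observed timeout.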
--
This message was sent by Atlassian Jira
(v8.20.10#820010)