[ https://issues.apache.org/jira/browse/KAFKA-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881173#comment-17881173 ]
Sagar Rao commented on KAFKA-17493:
-----------------------------------

[~ChrisEgerton], sorry, my bad. Yes, I do see that the ListOffsets call keeps 
returning empty offsets until the timeout happens. I grepped the Group 
Coordinator logs for the flaky and non-flaky cases, and what I notice is that in 
the flaky case, the sink task's consumer group never reached 2 members. These 
are the lines from the flaky test:
{code:java}
[2024-09-06 21:59:57,843] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Dynamic member with unknown member id joins group 
connect-testGetSinkConnectorOffsets in Empty state. Created a new member id 
connector-consumer-testGetSinkConnectorOffsets-0-a47aa5b3-d9d8-4aa6-ab30-7f79c971b6ee
 and requesting the member to rejoin with this id. 
(org.apache.kafka.coordinator.group.GroupMetadataManager:4111)
[2024-09-06 21:59:57,844] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Dynamic member with unknown member id joins group 
connect-testGetSinkConnectorOffsets in Empty state. Created a new member id 
connector-consumer-testGetSinkConnectorOffsets-1-4c065deb-6771-427d-902b-be788543b7bd
 and requesting the member to rejoin with this id. 
(org.apache.kafka.coordinator.group.GroupMetadataManager:4111)
[2024-09-06 21:59:57,844] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Preparing to rebalance group connect-testGetSinkConnectorOffsets 
in state PreparingRebalance with old generation 0 (reason: Adding new member 
connector-consumer-testGetSinkConnectorOffsets-0-a47aa5b3-d9d8-4aa6-ab30-7f79c971b6ee
 with group instance id null; client reason: need to re-join with the given 
member-id: 
connector-consumer-testGetSinkConnectorOffsets-0-a47aa5b3-d9d8-4aa6-ab30-7f79c971b6ee).
 (org.apache.kafka.coordinator.group.GroupMetadataManager:4673)
[2024-09-06 21:59:57,844] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Stabilized group connect-testGetSinkConnectorOffsets generation 1 
with 1 members. 
(org.apache.kafka.coordinator.group.GroupMetadataManager:4383){code}
Even though 2 members tried to join, the group never reached a stable generation 
with 2 members. Contrast this with the passing case: 
{code:java}
[2024-09-06 22:22:47,577] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Dynamic member with unknown member id joins group 
connect-testGetSinkConnectorOffsets in Empty state. Created a new member id 
connector-consumer-testGetSinkConnectorOffsets-1-a6cc10ec-9258-4293-8b9d-d240fe89e4fd
 and requesting the member to rejoin with this id. 
(org.apache.kafka.coordinator.group.GroupMetadataManager:4111)
[2024-09-06 22:22:47,579] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Preparing to rebalance group connect-testGetSinkConnectorOffsets 
in state PreparingRebalance with old generation 0 (reason: Adding new member 
connector-consumer-testGetSinkConnectorOffsets-1-a6cc10ec-9258-4293-8b9d-d240fe89e4fd
 with group instance id null; client reason: need to re-join with the given 
member-id: 
connector-consumer-testGetSinkConnectorOffsets-1-a6cc10ec-9258-4293-8b9d-d240fe89e4fd).
 (org.apache.kafka.coordinator.group.GroupMetadataManager:4673)
[2024-09-06 22:22:47,580] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Stabilized group connect-testGetSinkConnectorOffsets generation 1 
with 1 members. (org.apache.kafka.coordinator.group.GroupMetadataManager:4383)
[2024-09-06 22:22:47,580] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Dynamic member with unknown member id joins group 
connect-testGetSinkConnectorOffsets in CompletingRebalance state. Created a new 
member id 
connector-consumer-testGetSinkConnectorOffsets-0-b5112649-3008-432e-a1eb-a10593d049b3
 and requesting the member to rejoin with this id. 
(org.apache.kafka.coordinator.group.GroupMetadataManager:4111)
[2024-09-06 22:22:47,581] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Assignment received from leader 
connector-consumer-testGetSinkConnectorOffsets-1-a6cc10ec-9258-4293-8b9d-d240fe89e4fd
 for group connect-testGetSinkConnectorOffsets for generation 1. The group has 
1 members, 0 of which are static. 
(org.apache.kafka.coordinator.group.GroupMetadataManager:5142)
[2024-09-06 22:22:47,582] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Preparing to rebalance group connect-testGetSinkConnectorOffsets 
in state PreparingRebalance with old generation 1 (reason: Adding new member 
connector-consumer-testGetSinkConnectorOffsets-0-b5112649-3008-432e-a1eb-a10593d049b3
 with group instance id null; client reason: need to re-join with the given 
member-id: 
connector-consumer-testGetSinkConnectorOffsets-0-b5112649-3008-432e-a1eb-a10593d049b3).
 (org.apache.kafka.coordinator.group.GroupMetadataManager:4673)
[2024-09-06 22:22:47,583] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Stabilized group connect-testGetSinkConnectorOffsets generation 2 
with 2 members. (org.apache.kafka.coordinator.group.GroupMetadataManager:4383)
[2024-09-06 22:22:47,584] INFO [GroupCoordinator id=0 topic=__consumer_offsets 
partition=45] Assignment received from leader 
connector-consumer-testGetSinkConnectorOffsets-1-a6cc10ec-9258-4293-8b9d-d240fe89e4fd
 for group connect-testGetSinkConnectorOffsets for generation 2. The group has 
2 members, 0 of which are static. 
(org.apache.kafka.coordinator.group.GroupMetadataManager:5142) {code}
So this seems in line with what Chris mentioned above. I am attaching the 
grepped Group Coordinator logs for both cases for reference 
([^flaky-tests-gc.txt], [^passing-tests-gc.txt]). One difference between the 2 
cases, as I mentioned in the note above, is that the flaky test reuses an 
existing Connect/Kafka cluster where we need to delete the existing topic etc., 
while in the passing test everything is created afresh. 

 

 

> Sink connector-related OffsetsApiIntegrationTest suite test cases failing 
> more frequently with new consumer/group coordinator
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17493
>             Project: Kafka
>          Issue Type: Test
>          Components: connect, consumer, group-coordinator
>            Reporter: Chris Egerton
>            Priority: Major
>         Attachments: flaky-tests-gc.txt, passing-tests-gc.txt
>
>
> We recently updated trunk to use the new KIP-848 consumer/group coordinator 
> by default, which appears to have led to an uptick in flakiness for the 
> OffsetsApiIntegrationTest suite for Connect (specifically, the test cases 
> that use sink connectors, which makes sense since they're the type of 
> connector that uses a consumer group under the hood).
> Gradle Enterprise shows that in the week before that update was made, the 
> test suite had a flakiness rate of about 4% 
> (https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1724558400000&search.startTimeMin=1723953600000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.*&tests.sortField=FLAKY),
>  and in the week and a half since, the flakiness rate has jumped to 17% 
> (https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1725681599999&search.startTimeMin=1724731200000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.*&tests.sortField=FLAKY).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
