[jira] [Comment Edited] (KAFKA-17493) Sink connector-related OffsetsApiIntegrationTest suite test cases failing more frequently with new consumer/group coordinator

Sagar Rao (Jira) Wed, 11 Sep 2024 08:48:06 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881001#comment-17881001
 ]


Sagar Rao edited comment on KAFKA-17493 at 9/11/24 3:47 PM:
------------------------------------------------------------

[~dajac] , [~ChrisEgerton] I took a look at the logs for 
[testGetSinkConnectorOffsets|https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1725681599999&search.startTimeMin=1724731200000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.OffsetsApiIntegrationTest&tests.sortField=FLAKY&tests.test=testGetSinkConnectorOffsets()].
 I noticed a couple of differences which may contribute to the flakiness 
(not totally sure at this point):

1) In the passing run, the test spins up a new Connect cluster. When that 
happens, I see 
[verifyClusterReadiness|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/test/java/org/apache/kafka/connect/util/clusters/EmbeddedKafkaCluster.java#L181]
 getting triggered, which checks whether the Kafka cluster is ready and whether 
an Admin client is able to perform admin operations. In the failing run, I 
don't see that check; instead we reuse an existing Connect cluster as per 
[this|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/test/java/org/apache/kafka/connect/integration/OffsetsApiIntegrationTest.java#L129].

2) In the failing run, the connector comes up properly till this point, but it 
appears to get stuck when trying to read the offsets using the Admin client 
[here|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1234-L1252].
 I see the same line in the stack trace as well:

```

at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
at org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:397)
at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:445)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:394)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:378)
at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:999)
at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.getAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:226)
at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testGetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:173)
at java.lang.reflect.Method.invoke(Method.java:569)
at java.util.ArrayList.forEach(ArrayList.java:1511)
at java.util.ArrayList.forEach(ArrayList.java:1511)

```

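We are trying to use the AdminClient to read the sink connector offsets 
[here|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1234-L1252].
 There's not much indication in the logs as to why this is happening. 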
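
For reference, judging by the frame names, the trace fails inside a 
poll-until-true test helper, so the assertion only says the condition was 
still false when the timeout expired, not which call was slow. Below is a 
minimal sketch of that pattern. It is a simplified stand-in and not the actual 
TestUtils code: the class name WaitForConditionSketch, the parameters, and the 
message text are all hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical, simplified stand-in for a poll-until-true test helper.
// This is NOT the Kafka TestUtils implementation.
public class WaitForConditionSketch {

    // Polls the condition until it returns true, or throws AssertionError
    // once maxWaitMs has elapsed without the condition becoming true.
    public static void waitForCondition(BooleanSupplier condition, long maxWaitMs,
                                        long pollIntervalMs, String details) {
        long deadlineNanos = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return;
            }
            if (System.nanoTime() - deadlineNanos >= 0) {
                throw new AssertionError(
                    "Condition not met within " + maxWaitMs + " ms: " + details);
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and fail the wait.
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting: " + details, e);
            }
        }
    }

    public static void main(String[] args) {
        // A condition that becomes true on the third poll.
        int[] polls = {0};
        waitForCondition(() -> ++polls[0] >= 3, 1_000, 10, "offsets visible");
        System.out.println("condition met after " + polls[0] + " polls");

        // A condition that never becomes true surfaces as an AssertionError.
        try {
            waitForCondition(() -> false, 50, 10, "offsets visible");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a helper shaped like this, a condition that is slow and a condition that 
is never satisfied produce the same failure, which would be consistent with 
the logs not showing why the offsets read does not complete.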


was (Author: sagarrao):
[~dajac] , [~ChrisEgerton] I took a look at the logs for 
[testGetSinkConnectorOffsets|https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1725681599999&search.startTimeMin=1724731200000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.OffsetsApiIntegrationTest&tests.sortField=FLAKY&tests.test=testGetSinkConnectorOffsets()].
 I noticed a couple of differences which may contribute to the flakiness 
(not totally sure at this point):

1) In the passing run, the test spins up a new Connect cluster. When that 
happens, I see 
[verifyClusterReadiness|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/test/java/org/apache/kafka/connect/util/clusters/EmbeddedKafkaCluster.java#L181]
 getting triggered, which checks whether the Kafka cluster is ready and whether 
an Admin client is able to perform admin operations. In the failing run, I 
don't see that check; instead we reuse an existing Connect cluster as per 
[this|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/test/java/org/apache/kafka/connect/integration/OffsetsApiIntegrationTest.java#L129].

2) In the failing run, the connector comes up properly till this point, but it 
appears to get stuck when trying to read the offsets using the Admin client 
[here|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/test/java/org/apache/kafka/connect/integration/OffsetsApiIntegrationTest.java#L226].
 I see the same line in the stack trace as well:

```

at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
at org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:397)
at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:445)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:394)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:378)
at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:999)
at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.getAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:226)
at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testGetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:173)
at java.lang.reflect.Method.invoke(Method.java:569)
at java.util.ArrayList.forEach(ArrayList.java:1511)
at java.util.ArrayList.forEach(ArrayList.java:1511)

```

We are trying to use the AdminClient to read the sink connector offsets 
[here|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1234-L1252].
 There's not much indication in the logs as to why this is happening. 

> Sink connector-related OffsetsApiIntegrationTest suite test cases failing 
> more frequently with new consumer/group coordinator
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17493
>             Project: Kafka
>          Issue Type: Test
>          Components: connect, consumer, group-coordinator
>            Reporter: Chris Egerton
>            Priority: Major
>
> We recently updated trunk to use the new KIP-848 consumer/group coordinator 
> by default, which appears to have led to an uptick in flakiness for the 
> OffsetsApiIntegrationTest suite for Connect (specifically, the test cases 
> that use sink connectors, which makes sense since they're the type of 
> connector that uses a consumer group under the hood).
> Gradle Enterprise shows that in the week before that update was made, the 
> test suite had a flakiness rate of about 4% 
> (https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1724558400000&search.startTimeMin=1723953600000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.*&tests.sortField=FLAKY),
>  and in the week and a half since, the flakiness rate has jumped to 17% 
> (https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1725681599999&search.startTimeMin=1724731200000&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.*&tests.sortField=FLAKY).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
