[ 
https://issues.apache.org/jira/browse/KAFKA-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385841#comment-17385841
 ] 

A. Sophie Blee-Goldman commented on KAFKA-13128:
------------------------------------------------

The failure comes from the second line in this series of assertions:
{code:java}
assertThat(store1.get(key), is(notNullValue()));
assertThat(store1.get(key2), is(notNullValue()));
assertThat(store1.get(key3), is(notNullValue()));
{code}
This is basically the first time we attempt an interactive query (IQ) after 
starting up. The test setup includes starting Streams and waiting for it to 
reach RUNNING, then adding a new thread, and finally producing a set of 100 
records for each of the three keys. After that, it waits for all records to be 
processed *for _key3_* and then proceeds to the above assertions.

I suspect the problem is that we only wait for all data to be processed for 
_key3_, but not for the other two keys. In theory this should work, since the 
data for _key3_ is produced last and would have the largest timestamps, meaning 
the keys should be processed more or less in order. However, the input topic 
actually has two partitions, so it could be that _key1_ and _key3_ correspond 
to task 1 while _key2_ corresponds to task 2. Again, that shouldn't affect the 
order in which records are processed – as long as the tasks are on the same 
thread.

But we started up a new thread in between waiting for Streams to reach RUNNING 
and producing data to the input topics. This new thread has to be assigned one 
of the tasks, but due to cooperative rebalancing it will take two full (though 
short) rebalances before the new thread can actually start processing any 
tasks. Therefore, as long as the original thread continues to own the task 
corresponding to _key3_ after the new thread is added, it can easily get 
through all records for _key3_. That would mean the test can proceed to the 
above assertions while the new thread is still waiting to start processing any 
data for _key2_ at all.

There are a few ways we can address this, given how many things had to happen 
exactly right in order to see this failure, but the simplest fix is to just 
wait on all three keys to be fully processed rather than just the one. This 
also seems to align best with the original intention of the test.
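
As a rough illustration of that fix, here is a self-contained sketch that does 
not use the real test harness. The class WaitForAllKeysSketch, the processed 
map, and the helpers allProcessed, waitUntil and worker are all hypothetical 
stand-ins, and the 300 ms start delay is only an assumption that models the new 
thread waiting out the rebalances.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

public class WaitForAllKeysSketch {

    // Hypothetical stand-in for the queried store: key -> records processed so far.
    static final Map<String, Integer> processed = new ConcurrentHashMap<>();

    // True only once every listed key has at least `expected` processed records.
    static boolean allProcessed(List<String> keys, int expected) {
        for (String key : keys) {
            if (processed.getOrDefault(key, 0) < expected) {
                return false;
            }
        }
        return true;
    }

    // Polls the condition until it holds or the timeout elapses.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    // Simulates a stream thread that processes `count` records for each of its
    // keys after an initial delay (the delay models waiting out the rebalances).
    static Thread worker(List<String> keys, int count, long startDelayMs) {
        return new Thread(() -> {
            try {
                Thread.sleep(startDelayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            for (int i = 0; i < count; i++) {
                for (String key : keys) {
                    processed.merge(key, 1, Integer::sum);
                }
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> allKeys = List.of("key1", "key2", "key3");
        Thread original = worker(List.of("key1", "key3"), 100, 0);
        Thread added = worker(List.of("key2"), 100, 300);
        original.start();
        added.start();

        // Waiting on key3 alone can return while key2 is still untouched.
        waitUntil(() -> allProcessed(List.of("key3"), 100), 5_000);
        System.out.println("key2 present after key3-only wait: "
                + processed.containsKey("key2"));

        // Waiting on all three keys closes the gap.
        boolean ready = waitUntil(() -> allProcessed(allKeys, 100), 5_000);
        System.out.println("all keys ready: " + ready);

        original.join();
        added.join();
    }
}
```

The point of the sketch is only that the readiness condition covers every key 
we later assert on, so the check no longer depends on which thread happens to 
own which task.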

> Flaky Test 
> StoreQueryIntegrationTest.shouldQueryStoresAfterAddingAndRemovingStreamThread
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13128
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13128
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.1.0
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>              Labels: flaky-test
>
> h3. Stacktrace
> java.lang.AssertionError: Expected: is not null but: was null 
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) 
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>   at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.lambda$shouldQueryStoresAfterAddingAndRemovingStreamThread$19(StoreQueryIntegrationTest.java:461)
>   at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.until(StoreQueryIntegrationTest.java:506)
>   at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQueryStoresAfterAddingAndRemovingStreamThread(StoreQueryIntegrationTest.java:455)
>  
> https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-11085/5/testReport/org.apache.kafka.streams.integration/StoreQueryIntegrationTest/Build___JDK_16_and_Scala_2_13___shouldQueryStoresAfterAddingAndRemovingStreamThread_2/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
