mdedetrich opened a new pull request, #12289: URL: https://github.com/apache/kafka/pull/12289
Previously the tests would fail even on retryable errors. This change adds a retryUntil similar to the current until method however it will retry the test if it throws an exception that passes a predicate. The exception predicate has also been updated to handle another case that can occur due to a stream not being initialized. In summary there are 2 reasons why this test is flaky. The first reason is that the test uses the `until` method which just fails whenever a retryable error occurs. Instead we should ideally retry the test only if the exception that is thrown matches a predicate defining a retryable error with this predicate being defined in the current `retrievableException`. The new `retryUntil` function respects the same deadlines/timeouts as the current `until` method. The second issue is that the list of current predicates, i.e. ```java containsString("Cannot get state store source-table because the stream thread is PARTITIONS_ASSIGNED, not RUNNING"), containsString("The state store, source-table, may have migrated to another instance"), containsString("Cannot get state store source-table because the stream thread is STARTING, not RUNNING") ``` wasn't complete. Another one needed to be added, namely "The specified partition 1 for store source-table does not exist.". This is because its possible for ```java assertThat(getStore(kafkaStreams2, storeQueryParam2).get(key), is(nullValue())); ``` or ```java assertThat(getStore(kafkaStreams1, storeQueryParam2).get(key), is(nullValue())); ``` (depending on which branch) to be thrown, i.e. see ``` org.apache.kafka.streams.errors.InvalidStateStorePartitionException: The specified partition 1 for store source-table does not exist. at org.apache.kafka.streams.state.internals.WrappingStoreProvider.stores(WrappingStoreProvider.java:63) at org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get(CompositeReadOnlyKeyValueStore.java:53) at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.lambda$shouldQuerySpecificActivePartitionStores$5(StoreQueryIntegrationTest.java:223) at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.retryUntil(StoreQueryIntegrationTest.java:579) at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificActivePartitionStores(StoreQueryIntegrationTest.java:186) ``` This happens when the stream hasn't been initialized yet. I have run the test around 12k times using Intellij's JUnit testing framework without any exceptions It seems reasonable that the other tests in `StoreQueryIntegrationTest.java` should also be using the new `retryUntil` rather than `until` if they also experience the same flaky tests however if this is the case then it should be done in future individual PR's. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org