mdedetrich opened a new pull request, #12289:
URL: https://github.com/apache/kafka/pull/12289

   Previously the tests would fail even on retryable errors. This change adds a 
`retryUntil` method, similar to the existing `until` method, that retries the 
test if it throws an exception matching a predicate. The exception predicate 
has also been extended to handle another case that can occur when a stream is 
not yet initialized.
   
   In summary, there are two reasons why this test is flaky. The first is that 
the test uses the `until` method, which simply fails whenever a retryable 
error occurs. Instead, the test should be retried only if the thrown exception 
matches a predicate that defines a retryable error; this predicate is the 
existing `retrievableException`. The new `retryUntil` function respects the 
same deadlines/timeouts as the existing `until` method.
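
   The retry behaviour described above could be sketched roughly as follows. 
This is a minimal, self-contained illustration and not the code in this PR: 
the class name `RetryUntilSketch`, the parameter list, and the use of 
`BooleanSupplier` and `Predicate<Throwable>` are assumptions made for the 
example. It retries only while the thrown exception matches the predicate, and 
gives up once the deadline has passed.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.Predicate;

// Hypothetical sketch; names and signature are illustrative assumptions.
final class RetryUntilSketch {

    static void retryUntil(final BooleanSupplier condition,
                           final Predicate<Throwable> retryable,
                           final Duration timeout,
                           final Duration pause) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        Throwable lastRetryable = null;
        while (true) {
            try {
                if (condition.getAsBoolean()) {
                    return; // condition met, stop retrying
                }
            } catch (final RuntimeException | AssertionError e) {
                if (!retryable.test(e)) {
                    throw e; // not a retryable error: fail immediately
                }
                lastRetryable = e; // remember it for the timeout report
            }
            if (System.nanoTime() - deadline >= 0) {
                // Deadline reached without the condition being met.
                throw new AssertionError("Condition not met within " + timeout, lastRetryable);
            }
            try {
                Thread.sleep(pause.toMillis());
            } catch (final InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting to retry", e);
            }
        }
    }
}
```

   Catching `AssertionError` alongside `RuntimeException` is a design choice 
of this sketch, so that a predicate could also opt in to retrying failed 
assertions; whether the real helper does so is not stated here.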
   
   The second issue is that the list of current predicates, i.e.
   
   ```java
   containsString("Cannot get state store source-table because the stream thread is PARTITIONS_ASSIGNED, not RUNNING"),
   containsString("The state store, source-table, may have migrated to another instance"),
   containsString("Cannot get state store source-table because the stream thread is STARTING, not RUNNING")
   ```
   was incomplete. One more had to be added, namely "The specified 
partition 1 for store source-table does not exist.". This is because it's 
possible for 
   
   ```java
   assertThat(getStore(kafkaStreams2, storeQueryParam2).get(key), is(nullValue()));
   ```         
   or
   
   ```java
   assertThat(getStore(kafkaStreams1, storeQueryParam2).get(key), is(nullValue()));
   ```
   
   (depending on which branch is taken) to throw an exception, e.g.
   ```
   org.apache.kafka.streams.errors.InvalidStateStorePartitionException: The specified partition 1 for store source-table does not exist.
        at org.apache.kafka.streams.state.internals.WrappingStoreProvider.stores(WrappingStoreProvider.java:63)
        at org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get(CompositeReadOnlyKeyValueStore.java:53)
        at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.lambda$shouldQuerySpecificActivePartitionStores$5(StoreQueryIntegrationTest.java:223)
        at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.retryUntil(StoreQueryIntegrationTest.java:579)
        at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificActivePartitionStores(StoreQueryIntegrationTest.java:186)
   ```
   
   This happens when the stream hasn't been initialized yet. I have run the 
test around 12k times using IntelliJ's JUnit runner without any 
exceptions.
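
   The extended list of retryable messages could be expressed as a single 
predicate over the exception message, roughly as below. The four message 
strings are the ones quoted in this description; the class name 
`RetryableMessages`, the `IS_RETRYABLE` field, and the use of plain 
`String.contains` instead of Hamcrest's `containsString` are assumptions made 
to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; class and field names are illustrative assumptions.
final class RetryableMessages {

    // The three existing message fragments plus the newly added one.
    static final List<String> RETRYABLE = Arrays.asList(
        "Cannot get state store source-table because the stream thread is PARTITIONS_ASSIGNED, not RUNNING",
        "The state store, source-table, may have migrated to another instance",
        "Cannot get state store source-table because the stream thread is STARTING, not RUNNING",
        "The specified partition 1 for store source-table does not exist.");

    // True when the exception message contains any of the retryable fragments.
    static final Predicate<Throwable> IS_RETRYABLE = t -> {
        final String message = t.getMessage();
        return message != null && RETRYABLE.stream().anyMatch(message::contains);
    };
}
```

   Matching on message substrings is brittle if the wording ever changes, so 
a predicate on the exception type might be preferable where the type alone is 
specific enough.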
   
   It seems reasonable that the other tests in `StoreQueryIntegrationTest.java` 
should also use the new `retryUntil` rather than `until` if they suffer from 
the same flakiness. If so, that should be done in separate follow-up PRs.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

