[ 
https://issues.apache.org/jira/browse/KAFKA-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881177#comment-17881177
 ] 

Yu-Lin Chen commented on KAFKA-17515:
-------------------------------------

Another minor finding regarding the flaky tests: both runs timed out, but 
the CI shows different failure reasons. This is because:
 * If the local state dir was purged before RocksDBStore.flush(), the test 
fails immediately.  ([The first flaky 
link|https://ge.apache.org/s/havqcr7zu2tbk/tests/task/:streams:test/details/org.apache.kafka.streams.integration.RestoreIntegrationTest/shouldInvokeUserDefinedGlobalStateRestoreListener()?expanded-stacktrace=WyIwIl0&focused-execution=1&page=eyJvdXRwdXQiOnsiMCI6MSwiMSI6Mn19&top-execution=2#L177])
 * If the local state dir was purged before the OffsetCheckpoint was written, the test only 
logs a warning and the CI keeps running. ([The second flaky 
link|https://ge.apache.org/s/hdpapdbvngcts/tests/task/:streams:test/details/org.apache.kafka.streams.integration.RestoreIntegrationTest/shouldInvokeUserDefinedGlobalStateRestoreListener()?focused-execution=1&top-execution=2#L196])
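
The race behind the first case can be reproduced outside Kafka with a minimal sketch. It assumes the purge is a recursive delete built on Files.walkFileTree and Files.deleteIfExists, as the stack trace below suggests. The class name PurgeRaceSketch, the lateWriter hook, and the file names store.sst and .checkpoint are hypothetical stand-ins, not Kafka code; the hook imitates a stream thread that writes into the task directory after the purge has already started.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class PurgeRaceSketch {

    // Recursively deletes root. lateWriter is a hypothetical hook that runs just
    // before each directory is removed; it stands in for a thread that is still
    // flushing a store or writing a checkpoint file.
    static void purge(Path root, Runnable lateWriter) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.deleteIfExists(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                lateWriter.run();
                // Throws DirectoryNotEmptyException if a file appeared after the visit.
                Files.deleteIfExists(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // Simulates a purge that overlaps with a late write into the task directory.
    static boolean purgeFailsWithLateWrite() throws IOException {
        Path root = Files.createTempDirectory("purge-race");
        Path taskDir = Files.createDirectories(root.resolve("0_0"));
        Files.createFile(taskDir.resolve("store.sst"));
        boolean[] written = {false};
        try {
            purge(root, () -> {
                if (!written[0]) {
                    written[0] = true;
                    try {
                        Files.createFile(taskDir.resolve(".checkpoint"));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            return false;
        } catch (DirectoryNotEmptyException e) {
            return true;
        } finally {
            purge(root, () -> { }); // clean up with no concurrent writer
        }
    }

    // Simulates a purge that runs only after all writers have stopped.
    static boolean purgeSucceedsWithoutWriter() throws IOException {
        Path root = Files.createTempDirectory("purge-safe");
        Files.createFile(Files.createDirectories(root.resolve("0_0")).resolve("store.sst"));
        purge(root, () -> { });
        return !Files.exists(root);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("late write breaks purge: " + purgeFailsWithLateWrite());
        System.out.println("purge after writers stop: " + purgeSucceedsWithoutWriter());
    }
}
```

If this sketch matches what happens in the test, the fix would be to purge the state dir only after the owning KafkaStreams instance has fully closed.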

I'm not sure whether the timeout indirectly caused the slow start of the ks-1 
tasks in flaky test #2.  But we can fix the known issue first.

> Fix flaky 
> RestoreIntegrationTest.shouldInvokeUserDefinedGlobalStateRestoreListener
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-17515
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17515
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams, unit tests
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>            Priority: Major
>
> {code:java}
> Stacktrace
> java.nio.file.DirectoryNotEmptyException: 
> /tmp/shouldInvokeUserDefinedGlobalStateRestoreListenerH0u0n9foRY_peZu4FqeGHQ10111145955704739924-ks1/shouldInvokeUserDefinedGlobalStateRestoreListenerH0u0n9foRY_peZu4FqeGHQ/0_0
>       at 
> java.base/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:289)
>       at 
> java.base/sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:109)
>       at java.base/java.nio.file.Files.deleteIfExists(Files.java:1191)
>       at 
> org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:898)
>       at 
> org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:870)
>       at java.base/java.nio.file.Files.walkFileTree(Files.java:2803)
>       at java.base/java.nio.file.Files.walkFileTree(Files.java:2857)
>       at org.apache.kafka.common.utils.Utils.delete(Utils.java:870)
>       at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.purgeLocalStreamsState(IntegrationTestUtils.java:266)
>       at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.purgeLocalStreamsState(IntegrationTestUtils.java:278)
>       at 
> org.apache.kafka.streams.integration.RestoreIntegrationTest.shouldInvokeUserDefinedGlobalStateRestoreListener(RestoreIntegrationTest.java:583)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
