[ 
https://issues.apache.org/jira/browse/FLINK-22568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402295#comment-17402295
 ] 

Till Rohrmann commented on FLINK-22568:
---------------------------------------

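I think the problem is our CI infrastructure: there is a gap of roughly 13s 
between receiving the savepoint command (12:59:25,726) and the checkpoint 
coordinator actually triggering it (12:59:39,015), which is longer than the 
10s ask timeout reported at the end of the trace: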

{code}
12:59:25,726 [flink-akka.actor.default-dispatcher-5] INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Triggering savepoint for job 2e88260d7fd42515ac6b8181788b2583.
12:59:27,579 [jobmanager-future-thread-6] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 1 for job 2e88260d7fd42515ac6b8181788b2583 (1362646 bytes, checkpointDuration=1623 ms, finalizationTime=291 ms).
12:59:39,015 [    Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 2 (type=SAVEPOINT) @ 1628773179014 for job 2e88260d7fd42515ac6b8181788b2583.
12:59:39,019 [                main] ERROR org.apache.flink.test.checkpointing.RescalingITCase          [] - 
--------------------------------------------------------------------------------
Test testSavepointRescalingWithKeyedAndNonPartitionedState[backend = filesystem, buffersPerChannel = 0](org.apache.flink.test.checkpointing.RescalingITCase) failed with:
java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Invocation of [LocalRpcInvocation(RestfulGateway.triggerSavepoint(JobID, String, boolean, Time))] at recipient [akka://flink/user/rpc/dispatcher_200] timed out.
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
        at org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingWithKeyedAndNonPartitionedState(RescalingITCase.java:425)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:27)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
        at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
        at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
        at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
        at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:220)
        at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:188)
        at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:202)
        at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:181)
        at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
        at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:150)
        at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:120)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
        at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Caused by: java.util.concurrent.TimeoutException: Invocation of [LocalRpcInvocation(RestfulGateway.triggerSavepoint(JobID, String, boolean, Time))] at recipient [akka://flink/user/rpc/dispatcher_200] timed out.
        at com.sun.proxy.$Proxy369.triggerSavepoint(Unknown Source)
        at org.apache.flink.runtime.minicluster.MiniCluster.lambda$triggerSavepoint$9(MiniCluster.java:741)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
        at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
        at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
        at org.apache.flink.runtime.minicluster.MiniCluster.runDispatcherCommand(MiniCluster.java:776)
        at org.apache.flink.runtime.minicluster.MiniCluster.triggerSavepoint(MiniCluster.java:739)
        at org.apache.flink.client.program.MiniClusterClient.triggerSavepoint(MiniClusterClient.java:101)
        at org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingWithKeyedAndNonPartitionedState(RescalingITCase.java:422)
        ... 60 more
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/rpc/dispatcher_200#1435682858]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
{code}

I think this problem should be resolved, or at least mitigated, by FLINK-22932.
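
As a rough check of the gap, the two timestamps from the log excerpt can be 
subtracted directly. The sketch below is only illustrative: the class and 
method names are made up for this example and are not part of Flink, and the 
10000 ms value is simply the timeout printed in the AskTimeoutException above.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not Flink code: measures the time between two log lines.
public class SavepointTriggerGap {

    // Matches the time-of-day prefix used in the log excerpt, e.g. "12:59:25,726".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Returns the elapsed milliseconds between two timestamps on the same day.
    public static long gapMillis(String from, String to) {
        LocalTime start = LocalTime.parse(from, LOG_TIME);
        LocalTime end = LocalTime.parse(to, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // "Triggering savepoint for job ..." logged by the JobMaster.
        String commandReceived = "12:59:25,726";
        // "Triggering checkpoint 2 (type=SAVEPOINT) ..." logged by the CheckpointCoordinator.
        String actuallyTriggered = "12:59:39,015";
        // Timeout taken from the "after [10000 ms]" part of the exception message.
        long askTimeoutMillis = 10_000L;

        long gap = gapMillis(commandReceived, actuallyTriggered);
        System.out.println("gap = " + gap + " ms, exceeds ask timeout: "
                + (gap > askTimeoutMillis));
    }
}
```

With the timestamps above this reports a gap of 13289 ms, so the savepoint was 
only triggered after the 10000 ms ask had already expired.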

> RescalingITCase.testSavepointRescalingInPartitionedOperatorStateList fails with Timeout
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-22568
>                 URL: https://issues.apache.org/jira/browse/FLINK-22568
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Matthias
>            Priority: Major
>              Labels: test-stability
>             Fix For: 1.14.0
>
>
> [This build|https://dev.azure.com/mapohl/flink/_build/results?buildId=409&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=634cd701-c189-5dff-24cb-606ed884db87] failed (not exclusively) due to:
> * [testSavepointRescalingInPartitionedOperatorStateList[backend = filesystem](org.apache.flink.test.checkpointing.RescalingITCase)|https://dev.azure.com/mapohl/flink/_build/results?buildId=409&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=634cd701-c189-5dff-24cb-606ed884db87&l=4193]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
