[ https://issues.apache.org/jira/browse/FLINK-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076395#comment-17076395 ]
Jark Wu commented on FLINK-17006:
---------------------------------

Hi [~yunta], can you find any clues on your side?

This IT case uses the RocksDB state backend, and the source fails once after processing half of the data so that the job fails over and restores its state. It seems that the restore failed because of a missing file.

> AggregateITCase.testDistinctGroupBy fails with FileNotFoundException (in Rocksdb)
> ---------------------------------------------------------------------------------
>
> Key: FLINK-17006
> URL: https://issues.apache.org/jira/browse/FLINK-17006
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends, Table SQL / Runtime, Tests
> Affects Versions: 1.11.0
> Reporter: Robert Metzger
> Priority: Major
> Labels: test-stability
>
> CI run: https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7045&view=logs&j=e25d5e7e-2a9c-5589-4940-0b638d75a414&t=294c2388-20e6-57a2-5721-91db544b1e69
> Log output:
> {code}
> 2020-04-03T17:17:44.4036304Z [ERROR] Tests run: 234, Failures: 0, Errors: 1, Skipped: 6, Time elapsed: 155.577 s <<< FAILURE! - in org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase
> 2020-04-03T17:17:44.4038781Z [ERROR] testDistinctGroupBy[LocalGlobal=OFF, MiniBatch=ON, StateBackend=ROCKSDB](org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase) Time elapsed: 0.456 s <<< ERROR!
> 2020-04-03T17:17:44.4040384Z org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2020-04-03T17:17:44.4041520Z at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
> 2020-04-03T17:17:44.4042712Z at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:659)
> 2020-04-03T17:17:44.4043972Z at org.apache.flink.streaming.util.TestStreamEnvironment.execute(TestStreamEnvironment.java:77)
> 2020-04-03T17:17:44.4045540Z at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1644)
> 2020-04-03T17:17:44.4047015Z at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1626)
> 2020-04-03T17:17:44.4048576Z at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:673)
> 2020-04-03T17:17:44.4050073Z at org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase.testDistinctGroupBy(AggregateITCase.scala:172)
> 2020-04-03T17:17:44.4051200Z at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-04-03T17:17:44.4052171Z at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-04-03T17:17:44.4053308Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-04-03T17:17:44.4054322Z at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-04-03T17:17:44.4055410Z at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-04-03T17:17:44.4056570Z at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-04-03T17:17:44.4057800Z at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-04-03T17:17:44.4059019Z at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-04-03T17:17:44.4060178Z at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-04-03T17:17:44.4061261Z at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2020-04-03T17:17:44.4062617Z at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
> 2020-04-03T17:17:44.4063782Z at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-04-03T17:17:44.4064838Z at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 2020-04-03T17:17:44.4065742Z at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-04-03T17:17:44.4066636Z at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-04-03T17:17:44.4067762Z at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-04-03T17:17:44.4068895Z at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-04-03T17:17:44.4069978Z at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-04-03T17:17:44.4070920Z at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-04-03T17:17:44.4071901Z at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-04-03T17:17:44.4072875Z at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-04-03T17:17:44.4073850Z at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-04-03T17:17:44.4074854Z at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-04-03T17:17:44.4075729Z at org.junit.runners.Suite.runChild(Suite.java:128)
> 2020-04-03T17:17:44.4076541Z at org.junit.runners.Suite.runChild(Suite.java:27)
> 2020-04-03T17:17:44.4077479Z at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-04-03T17:17:44.4078422Z at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-04-03T17:17:44.4079501Z at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-04-03T17:17:44.4080503Z at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-04-03T17:17:44.4081483Z at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-04-03T17:17:44.4082477Z at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-04-03T17:17:44.4083522Z at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-04-03T17:17:44.4084529Z at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-04-03T17:17:44.4085420Z at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-04-03T17:17:44.4086433Z at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-04-03T17:17:44.4087696Z at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-04-03T17:17:44.4088900Z at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-04-03T17:17:44.4090109Z at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-04-03T17:17:44.4091331Z at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-04-03T17:17:44.4092600Z at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2020-04-03T17:17:44.4093737Z at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 2020-04-03T17:17:44.4094894Z at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> 2020-04-03T17:17:44.4096257Z Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1, backoffTimeMS=0)
> 2020-04-03T17:17:44.4097915Z at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110)
> 2020-04-03T17:17:44.4099539Z at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76)
> 2020-04-03T17:17:44.4101039Z at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:190)
> 2020-04-03T17:17:44.4102353Z at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:184)
> 2020-04-03T17:17:44.4103808Z at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:178)
> 2020-04-03T17:17:44.4105242Z at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:505)
> 2020-04-03T17:17:44.4106478Z at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:384)
> 2020-04-03T17:17:44.4107561Z at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
> 2020-04-03T17:17:44.4108552Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-04-03T17:17:44.4109560Z at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-04-03T17:17:44.4110604Z at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
> 2020-04-03T17:17:44.4111812Z at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
> 2020-04-03T17:17:44.4113054Z at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> 2020-04-03T17:17:44.4114282Z at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
> 2020-04-03T17:17:44.4115417Z at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> 2020-04-03T17:17:44.4116357Z at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> 2020-04-03T17:17:44.4117338Z at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> 2020-04-03T17:17:44.4118401Z at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> 2020-04-03T17:17:44.4119414Z at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> 2020-04-03T17:17:44.4120443Z at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-04-03T17:17:44.4121538Z at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-04-03T17:17:44.4122461Z at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> 2020-04-03T17:17:44.4123383Z at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> 2020-04-03T17:17:44.4124315Z at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> 2020-04-03T17:17:44.4125257Z at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> 2020-04-03T17:17:44.4126107Z at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> 2020-04-03T17:17:44.4126953Z at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> 2020-04-03T17:17:44.4127798Z at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> 2020-04-03T17:17:44.4128829Z at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2020-04-03T17:17:44.4129875Z at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2020-04-03T17:17:44.4130957Z at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2020-04-03T17:17:44.4132016Z at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2020-04-03T17:17:44.4133098Z Caused by: java.lang.Exception: Exception while creating StreamOperatorStateContext.
> 2020-04-03T17:17:44.4134528Z at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:191)
> 2020-04-03T17:17:44.4136100Z at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:246)
> 2020-04-03T17:17:44.4137580Z at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:293)
> 2020-04-03T17:17:44.4138925Z at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:436)
> 2020-04-03T17:17:44.4140304Z at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> 2020-04-03T17:17:44.4141631Z at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:432)
> 2020-04-03T17:17:44.4142783Z at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:445)
> 2020-04-03T17:17:44.4143814Z at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:718)
> 2020-04-03T17:17:44.4144935Z at org.apache.flink.runtime.taskmanager.Task.run(Task.java:542)
> 2020-04-03T17:17:44.4145756Z at java.lang.Thread.run(Thread.java:748)
> 2020-04-03T17:17:44.4147095Z Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for KeyedMapBundleOperator_f6dc7f4d2283f4605b127b9364e21148_(2/4) from any of the 1 provided restore options.
> 2020-04-03T17:17:44.4148863Z at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> 2020-04-03T17:17:44.4150449Z at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:304)
> 2020-04-03T17:17:44.4152096Z at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:131)
> 2020-04-03T17:17:44.4153151Z ... 9 more
> 2020-04-03T17:17:44.4153946Z Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
> 2020-04-03T17:17:44.4155379Z at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:336)
> 2020-04-03T17:17:44.4156850Z at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:548)
> 2020-04-03T17:17:44.4158503Z at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:288)
> 2020-04-03T17:17:44.4160158Z at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
> 2020-04-03T17:17:44.4161662Z at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
> 2020-04-03T17:17:44.4162706Z ... 11 more
> 2020-04-03T17:17:44.4164646Z Caused by: java.io.FileNotFoundException: /tmp/junit1553841028375950249/junit3479071836389613442/babdd750dc1c5b3874a0dd55d14a84f6/shared/2aa67d1b-8841-4755-84c4-b891fc8c3352 (No such file or directory)
> 2020-04-03T17:17:44.4166003Z at java.io.FileInputStream.open0(Native Method)
> 2020-04-03T17:17:44.4166782Z at java.io.FileInputStream.open(FileInputStream.java:195)
> 2020-04-03T17:17:44.4167752Z at java.io.FileInputStream.<init>(FileInputStream.java:138)
> 2020-04-03T17:17:44.4168802Z at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
> 2020-04-03T17:17:44.4170003Z at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:142)
> 2020-04-03T17:17:44.4171168Z at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:85)
> 2020-04-03T17:17:44.4172447Z at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68)
> 2020-04-03T17:17:44.4173862Z at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForStateHandle(RocksDBStateDownloader.java:126)
> 2020-04-03T17:17:44.4175519Z at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.lambda$createDownloadRunnables$0(RocksDBStateDownloader.java:109)
> 2020-04-03T17:17:44.4176945Z at org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:50)
> 2020-04-03T17:17:44.4178185Z at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
> 2020-04-03T17:17:44.4179404Z at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:211)
> 2020-04-03T17:17:44.4180647Z at java.util.concurrent.CompletableFuture.asyncRunStage(CompletableFuture.java:1654)
> 2020-04-03T17:17:44.4181769Z at java.util.concurrent.CompletableFuture.runAsync(CompletableFuture.java:1871)
> 2020-04-03T17:17:44.4183098Z at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForAllStateHandles(RocksDBStateDownloader.java:83)
> 2020-04-03T17:17:44.4184752Z at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.transferAllStateDataToDirectory(RocksDBStateDownloader.java:67)
> 2020-04-03T17:17:44.4186598Z at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.transferRemoteStateToLocalDirectory(RocksDBIncrementalRestoreOperation.java:229)
> 2020-04-03T17:17:44.4188548Z at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:194)
> 2020-04-03T17:17:44.4190380Z at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:168)
> 2020-04-03T17:17:44.4192129Z at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:154)
> 2020-04-03T17:17:44.4193725Z at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:279)
> 2020-04-03T17:17:44.4194758Z ... 15 more
> 2020-04-03T17:17:44.4195053Z
> {code}
> I'm uncertain about the component assignment of this ticket. This error can probably have many causes?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)