[ 
https://issues.apache.org/jira/browse/FLINK-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076395#comment-17076395
 ] 

Jark Wu commented on FLINK-17006:
---------------------------------

Hi [~yunta], can you find any clues from your side? This IT case uses the 
RocksDB state backend, and the source fails once after processing half of the 
data, so that the job fails over and restores its state. It seems that the 
restore fails because of the missing file.
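
For readers unfamiliar with this test pattern, here is a minimal, 
Flink-free sketch of the "fail once at the halfway point, then restore" 
behaviour described above. It is not the actual test source: the class 
FailOnceSourceSketch, its methods, and the in-memory "checkpoint" are all 
hypothetical names introduced only for illustration, and the real test keeps 
its state in RocksDB checkpoint files rather than in an int array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FailOnceSourceSketch {

    /** Hypothetical stand-in for a test source that fails exactly once. */
    static final class FailOnceSource {
        private final List<Integer> data;
        // Survives the restart, so the artificial failure is thrown only once.
        private boolean alreadyFailed = false;

        FailOnceSource(List<Integer> data) {
            this.data = data;
        }

        /** Emits records from the given offset; throws once at the halfway point. */
        void runFrom(int offset, List<Integer> out, int[] checkpointedOffset) {
            for (int i = offset; i < data.size(); i++) {
                if (!alreadyFailed && i >= data.size() / 2) {
                    alreadyFailed = true;
                    throw new IllegalStateException("artificial failure");
                }
                out.add(data.get(i));
                // Assumption: a checkpoint completes after every record.
                checkpointedOffset[0] = i + 1;
            }
        }
    }

    /** Runs the source, tolerating a single failure (one restart attempt). */
    static List<Integer> runWithOneRestart(List<Integer> data) {
        FailOnceSource source = new FailOnceSource(data);
        List<Integer> out = new ArrayList<>();
        int[] checkpoint = {0};
        try {
            source.runFrom(0, out, checkpoint);
        } catch (IllegalStateException e) {
            // "Restore": resume from the last checkpointed offset.
            source.runFrom(checkpoint[0], out, checkpoint);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [1, 2, 3, 4, 5, 6]
        System.out.println(runWithOneRestart(Arrays.asList(1, 2, 3, 4, 5, 6)));
    }
}
```

In this sketch the restore always succeeds because the checkpoint lives in 
memory. In the failing run, the restore step itself threw, and since only one 
restart attempt is allowed (maxNumberRestartAttempts=1 in the log below), the 
whole job failed.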

> AggregateITCase.testDistinctGroupBy fails with FileNotFoundException (in 
> Rocksdb)
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-17006
>                 URL: https://issues.apache.org/jira/browse/FLINK-17006
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends, Table SQL / Runtime, Tests
>    Affects Versions: 1.11.0
>            Reporter: Robert Metzger
>            Priority: Major
>              Labels: test-stability
>
> CI run: 
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7045&view=logs&j=e25d5e7e-2a9c-5589-4940-0b638d75a414&t=294c2388-20e6-57a2-5721-91db544b1e69
> Log output:
> {code}
> 2020-04-03T17:17:44.4036304Z [ERROR] Tests run: 234, Failures: 0, Errors: 1, 
> Skipped: 6, Time elapsed: 155.577 s <<< FAILURE! - in 
> org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase
> 2020-04-03T17:17:44.4038781Z [ERROR] testDistinctGroupBy[LocalGlobal=OFF, 
> MiniBatch=ON, 
> StateBackend=ROCKSDB](org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase)
>   Time elapsed: 0.456 s  <<< ERROR!
> 2020-04-03T17:17:44.4040384Z 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2020-04-03T17:17:44.4041520Z  at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
> 2020-04-03T17:17:44.4042712Z  at 
> org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:659)
> 2020-04-03T17:17:44.4043972Z  at 
> org.apache.flink.streaming.util.TestStreamEnvironment.execute(TestStreamEnvironment.java:77)
> 2020-04-03T17:17:44.4045540Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1644)
> 2020-04-03T17:17:44.4047015Z  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1626)
> 2020-04-03T17:17:44.4048576Z  at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:673)
> 2020-04-03T17:17:44.4050073Z  at 
> org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase.testDistinctGroupBy(AggregateITCase.scala:172)
> 2020-04-03T17:17:44.4051200Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-04-03T17:17:44.4052171Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-04-03T17:17:44.4053308Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-04-03T17:17:44.4054322Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-04-03T17:17:44.4055410Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-04-03T17:17:44.4056570Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-04-03T17:17:44.4057800Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-04-03T17:17:44.4059019Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-04-03T17:17:44.4060178Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-04-03T17:17:44.4061261Z  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2020-04-03T17:17:44.4062617Z  at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
> 2020-04-03T17:17:44.4063782Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-04-03T17:17:44.4064838Z  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 2020-04-03T17:17:44.4065742Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-04-03T17:17:44.4066636Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-04-03T17:17:44.4067762Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-04-03T17:17:44.4068895Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-04-03T17:17:44.4069978Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-04-03T17:17:44.4070920Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-04-03T17:17:44.4071901Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-04-03T17:17:44.4072875Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-04-03T17:17:44.4073850Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-04-03T17:17:44.4074854Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-04-03T17:17:44.4075729Z  at 
> org.junit.runners.Suite.runChild(Suite.java:128)
> 2020-04-03T17:17:44.4076541Z  at 
> org.junit.runners.Suite.runChild(Suite.java:27)
> 2020-04-03T17:17:44.4077479Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-04-03T17:17:44.4078422Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-04-03T17:17:44.4079501Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-04-03T17:17:44.4080503Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-04-03T17:17:44.4081483Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-04-03T17:17:44.4082477Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-04-03T17:17:44.4083522Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-04-03T17:17:44.4084529Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-04-03T17:17:44.4085420Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-04-03T17:17:44.4086433Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-04-03T17:17:44.4087696Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-04-03T17:17:44.4088900Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-04-03T17:17:44.4090109Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-04-03T17:17:44.4091331Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-04-03T17:17:44.4092600Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2020-04-03T17:17:44.4093737Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 2020-04-03T17:17:44.4094894Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> 2020-04-03T17:17:44.4096257Z Caused by: 
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1, 
> backoffTimeMS=0)
> 2020-04-03T17:17:44.4097915Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110)
> 2020-04-03T17:17:44.4099539Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76)
> 2020-04-03T17:17:44.4101039Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:190)
> 2020-04-03T17:17:44.4102353Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:184)
> 2020-04-03T17:17:44.4103808Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:178)
> 2020-04-03T17:17:44.4105242Z  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:505)
> 2020-04-03T17:17:44.4106478Z  at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:384)
> 2020-04-03T17:17:44.4107561Z  at 
> sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
> 2020-04-03T17:17:44.4108552Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-04-03T17:17:44.4109560Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-04-03T17:17:44.4110604Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
> 2020-04-03T17:17:44.4111812Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
> 2020-04-03T17:17:44.4113054Z  at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> 2020-04-03T17:17:44.4114282Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
> 2020-04-03T17:17:44.4115417Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> 2020-04-03T17:17:44.4116357Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> 2020-04-03T17:17:44.4117338Z  at 
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> 2020-04-03T17:17:44.4118401Z  at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> 2020-04-03T17:17:44.4119414Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> 2020-04-03T17:17:44.4120443Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-04-03T17:17:44.4121538Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-04-03T17:17:44.4122461Z  at 
> akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> 2020-04-03T17:17:44.4123383Z  at 
> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> 2020-04-03T17:17:44.4124315Z  at 
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> 2020-04-03T17:17:44.4125257Z  at 
> akka.actor.ActorCell.invoke(ActorCell.scala:561)
> 2020-04-03T17:17:44.4126107Z  at 
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> 2020-04-03T17:17:44.4126953Z  at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> 2020-04-03T17:17:44.4127798Z  at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> 2020-04-03T17:17:44.4128829Z  at 
> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2020-04-03T17:17:44.4129875Z  at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2020-04-03T17:17:44.4130957Z  at 
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2020-04-03T17:17:44.4132016Z  at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2020-04-03T17:17:44.4133098Z Caused by: java.lang.Exception: Exception while 
> creating StreamOperatorStateContext.
> 2020-04-03T17:17:44.4134528Z  at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:191)
> 2020-04-03T17:17:44.4136100Z  at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:246)
> 2020-04-03T17:17:44.4137580Z  at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:293)
> 2020-04-03T17:17:44.4138925Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:436)
> 2020-04-03T17:17:44.4140304Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> 2020-04-03T17:17:44.4141631Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:432)
> 2020-04-03T17:17:44.4142783Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:445)
> 2020-04-03T17:17:44.4143814Z  at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:718)
> 2020-04-03T17:17:44.4144935Z  at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:542)
> 2020-04-03T17:17:44.4145756Z  at java.lang.Thread.run(Thread.java:748)
> 2020-04-03T17:17:44.4147095Z Caused by: org.apache.flink.util.FlinkException: 
> Could not restore keyed state backend for 
> KeyedMapBundleOperator_f6dc7f4d2283f4605b127b9364e21148_(2/4) from any of the 
> 1 provided restore options.
> 2020-04-03T17:17:44.4148863Z  at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> 2020-04-03T17:17:44.4150449Z  at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:304)
> 2020-04-03T17:17:44.4152096Z  at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:131)
> 2020-04-03T17:17:44.4153151Z  ... 9 more
> 2020-04-03T17:17:44.4153946Z Caused by: 
> org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected 
> exception.
> 2020-04-03T17:17:44.4155379Z  at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:336)
> 2020-04-03T17:17:44.4156850Z  at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:548)
> 2020-04-03T17:17:44.4158503Z  at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:288)
> 2020-04-03T17:17:44.4160158Z  at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
> 2020-04-03T17:17:44.4161662Z  at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
> 2020-04-03T17:17:44.4162706Z  ... 11 more
> 2020-04-03T17:17:44.4164646Z Caused by: java.io.FileNotFoundException: 
> /tmp/junit1553841028375950249/junit3479071836389613442/babdd750dc1c5b3874a0dd55d14a84f6/shared/2aa67d1b-8841-4755-84c4-b891fc8c3352
>  (No such file or directory)
> 2020-04-03T17:17:44.4166003Z  at java.io.FileInputStream.open0(Native Method)
> 2020-04-03T17:17:44.4166782Z  at 
> java.io.FileInputStream.open(FileInputStream.java:195)
> 2020-04-03T17:17:44.4167752Z  at 
> java.io.FileInputStream.<init>(FileInputStream.java:138)
> 2020-04-03T17:17:44.4168802Z  at 
> org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
> 2020-04-03T17:17:44.4170003Z  at 
> org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:142)
> 2020-04-03T17:17:44.4171168Z  at 
> org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:85)
> 2020-04-03T17:17:44.4172447Z  at 
> org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68)
> 2020-04-03T17:17:44.4173862Z  at 
> org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForStateHandle(RocksDBStateDownloader.java:126)
> 2020-04-03T17:17:44.4175519Z  at 
> org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.lambda$createDownloadRunnables$0(RocksDBStateDownloader.java:109)
> 2020-04-03T17:17:44.4176945Z  at 
> org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:50)
> 2020-04-03T17:17:44.4178185Z  at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
> 2020-04-03T17:17:44.4179404Z  at 
> org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:211)
> 2020-04-03T17:17:44.4180647Z  at 
> java.util.concurrent.CompletableFuture.asyncRunStage(CompletableFuture.java:1654)
> 2020-04-03T17:17:44.4181769Z  at 
> java.util.concurrent.CompletableFuture.runAsync(CompletableFuture.java:1871)
> 2020-04-03T17:17:44.4183098Z  at 
> org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForAllStateHandles(RocksDBStateDownloader.java:83)
> 2020-04-03T17:17:44.4184752Z  at 
> org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.transferAllStateDataToDirectory(RocksDBStateDownloader.java:67)
> 2020-04-03T17:17:44.4186598Z  at 
> org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.transferRemoteStateToLocalDirectory(RocksDBIncrementalRestoreOperation.java:229)
> 2020-04-03T17:17:44.4188548Z  at 
> org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:194)
> 2020-04-03T17:17:44.4190380Z  at 
> org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:168)
> 2020-04-03T17:17:44.4192129Z  at 
> org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:154)
> 2020-04-03T17:17:44.4193725Z  at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:279)
> 2020-04-03T17:17:44.4194758Z  ... 15 more
> 2020-04-03T17:17:44.4195053Z 
> {code}
> I'm uncertain about the component assignment of this ticket. This error can 
> probably have many causes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)