I noticed a test instability that sounds quite similar to what you're experiencing. I created FLINK-31168 [1] to follow up on it.

[1] https://issues.apache.org/jira/browse/FLINK-31168

On Mon, Feb 20, 2023 at 4:50 PM Matthias Pohl <matthias.p...@aiven.io> wrote:

> What do you mean by "earlier it used to fail due to ExecutionGraphStore
> not existing in /tmp" folder? Did you get the error message "Could not
> create executionGraphStorage directory in /tmp." and creating this folder
> fixed the issue?
>
> It also looks like the stacktrace doesn't match any of the 1.15 versions
> in terms of line numbers. Or I might be missing something here. Could you
> provide the exact Flink version you're using?
>
> It might also help to share the JobManager logs to understand the context
> in which the cancel operation was triggered.
>
> Matthias
>
> On Mon, Feb 20, 2023 at 1:53 AM Puneet Duggal <puneetduggal1...@gmail.com>
> wrote:
>
>> Flink Cluster Context:
>>
>> - Flink Version - 1.15
>> - Deployment Mode - Session
>> - Number of Job Managers - 3 (HA)
>> - Number of Task Managers - 1
>>
>> Cancellation of the job fails with the following exception:
>>
>> org.apache.flink.runtime.rest.NotFoundException: Job 1cb2185d4d72c8c6f0a3a549d7de4ef0 not found
>>     at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99)
>>     at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
>>     at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
>>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>>     at org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109)
>>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>>     at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:252)
>>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>>     at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1387)
>>     at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
>>     at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>>     at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
>>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>>     at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:45)
>>     at akka.dispatch.OnComplete.internal(Future.scala:299)
>>     at akka.dispatch.OnComplete.internal(Future.scala:297)
>>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
>>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
>>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>>     at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
>>     at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>>     at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>>     at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>>     at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>>     at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:621)
>>     at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:118)
>>     at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:1144)
>>     at akka.actor.Actor.aroundReceive(Actor.scala:537)
>>     at akka.actor.Actor.aroundReceive$(Actor.scala:535)
>>     at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:540)
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:548)
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:231)
>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
>>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>>     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>>     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
>> Caused by: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (1cb2185d4d72c8c6f0a3a549d7de4ef0)
>>     at org.apache.flink.runtime.dispatcher.Dispatcher.requestExecutionGraphInfo(Dispatcher.java:812)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:304)
>>     at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:302)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
>>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>>     at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>>     at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>>     at akka.actor.Actor.aroundReceive(Actor.scala:537)
>>     at akka.actor.Actor.aroundReceive$(Actor.scala:535)
>>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
>>     ... 9 more
>>
>> Earlier it used to fail due to the Execution Graph Store not existing in
>> the /tmp folder, but this is no longer the issue. Now it fails with the
>> exception shown above.
>>
>> Thanks,
>> Puneet