Hi Harshith,

Could you share your full log files from the job master? As I understand it, this stack trace already belongs to a failover attempt. What was the original cause of the failover?

Do you still have any other job state in S3 for this cluster id `flink-2`?

Have you tried the latest version of Flink 1.9?
Best,
Andrey

On Mon, Dec 9, 2019 at 12:37 PM Kumar Bolar, Harshith <hk...@arity.com> wrote:

> Hi all,
>
> I'm running a standalone Flink cluster with Zookeeper and S3 for high availability storage. All of a sudden, the job managers started failing with an S3 `UnrecoverableS3OperationException` error. Here is the full error trace -
>
> ```
> java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
>     at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
>     at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
>     at org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:1058)
>     at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:308)
>     at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
>     ... 7 more
> Caused by: org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem$UnrecoverableS3OperationException: org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 1769066EBD605AB5; S3 Extended Request ID: K8jjbsE4DPAsZJDVJKBq3Nh0E0o+feafefavbvbaae+nbUTphHHw73/eafafefa+dsVMR0=), S3 Extended Request ID: lklalkioe+eae2234+nbUTphHHw73/gVSclc1o1YH7M0MeNjmXl+dsVMR0= (Path: s3://abc-staging/flink/jobmanagerha/flink-2/blob/job_3e16166a1122885eb6e9b2437929b266/blob_p-3b687174148e9e1dd951f2a9fbec83f4fcd5281e-b85417f69b354c83b270bf01dcf389e0)
>     at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$1(PrestoS3FileSystem.java:908)
>     at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138)
>     at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:893)
>     at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:878)
>     at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream.seekStream(PrestoS3FileSystem.java:871)
>     at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream.lambda$read$0(PrestoS3FileSystem.java:810)
>     at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138)
>     at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream.read(PrestoS3FileSystem.java:809)
>     ... 10 more
> Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 1769066EBaD6aefB5; S3 Extended Request ID: fealloga+4rVwsF+nbUTphHHw73/gVSclc1o1YH7M0MeNjmXl+dsVMR0=), S3 Extended Request ID: K8jjbsE4DPAsZJDVJKBq3Nh0E0o+4rVwsF+nbUTphHHweafga/lc1o1YH7M0MeNjmXl+dsVMR0=
>     at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
>     at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
>     at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
>     ... 30 more
> ```
>
> I could fix this by changing the `high-availability.cluster-id` property (which is currently set to `flink-2`) but with that I would lose all the existing jobs and state. Is there any way I can tell Flink to ignore this particular key in S3 and start the job managers?
>
> Thanks,
> Harshith