Hi,
I've tried upgrading a Beam job to 2.3.0 and deploying it on Dataflow, and I'm
getting the following error:
2018-03-01 10:52:35 INFO PackageUtil:316 - Uploading 169 files from PipelineOptions.filesToStage to staging location to prepare for execution.
Exception in thread "main" java.lang.RuntimeException: Error while staging packages
    at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:396)
    at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:272)
    at org.apache.beam.runners.dataflow.util.GcsStager.stageFiles(GcsStager.java:76)
    at org.apache.beam.runners.dataflow.util.GcsStager.stageDefaultFiles(GcsStager.java:64)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:661)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:174)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
    at com.gocardless.data.beam.GCSToBigQuery.main(GCSToBigQuery.java:47)
Caused by: java.io.IOException: Error executing batch GCS request
    at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:610)
    at org.apache.beam.sdk.util.GcsUtil.getObjects(GcsUtil.java:341)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.matchNonGlobs(GcsFileSystem.java:216)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.match(GcsFileSystem.java:85)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:123)
    at org.apache.beam.sdk.io.FileSystems.matchSingleFileSpec(FileSystems.java:188)
    at org.apache.beam.runners.dataflow.util.PackageUtil.alreadyStaged(PackageUtil.java:159)
    at org.apache.beam.runners.dataflow.util.PackageUtil.stagePackageSynchronously(PackageUtil.java:183)
    at org.apache.beam.runners.dataflow.util.PackageUtil.lambda$stagePackage$1(PackageUtil.java:173)
    at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: com.google.api.client.http.HttpResponseException: 404 Not Found
Not Found
    at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
    at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:479)
    at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
    at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:602)
    ... 14 more
It looks like the failure happens while staging files, but I haven't changed the
staging location (or anything else) - just the Beam version.
I've tried a couple of things, such as adding a trailing slash to the staging
path and deleting the staging directory to see whether it would be recreated (it
wasn't), but no luck.
The error occurs both when running a job directly and when uploading a template.
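For context, the job is launched in the usual way with the staging location set
via pipeline options - roughly like the following (project, bucket, and paths
are placeholders, not our real values):

```shell
# Hypothetical invocation sketch; --stagingLocation is the GCS path that
# PackageUtil uploads the classpath jars to before the job starts.
mvn compile exec:java \
  -Dexec.mainClass=com.gocardless.data.beam.GCSToBigQuery \
  -Dexec.args="--runner=DataflowRunner \
    --project=my-project \
    --stagingLocation=gs://my-bucket/staging/ \
    --tempLocation=gs://my-bucket/tmp/"
```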
Thanks,
Andrew