Dataflow runner v1 vs. v2 for Java pipelines

Marcin Kuthan Wed, 04 May 2022 23:40:10 -0700

Hi

I experimented a bit with Dataflow Runner v2 and do not see a strong
advantage for pure Java pipelines. I would like to hear from more
experienced Dataflow Runner v2 users:


* Do you run your production Java pipelines using Dataflow Runner v2?
* Streaming or batch?
* Custom containers or default ones?
* Have you observed any regressions after migrating from Dataflow Runner v1?

Finally, the WordCount example (Beam 2.38) fails when the only option set
is "--dataflowServiceOptions=use_runner_v2":

java.lang.RuntimeException: Failed to create a workflow job: Dataflow
Runner v2 requires a valid FnApi job, Please resubmit your job with a valid
configuration. Note that if using Templates, you may need to regenerate
your template with the '--use_runner_v2'.
    at org.apache.beam.runners.dataflow.DataflowRunner.run (DataflowRunner.java:1330)
    at org.apache.beam.runners.dataflow.DataflowRunner.run (DataflowRunner.java:196)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:323)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:309)
    at org.apache.beam.examples.WordCount.runWordCount (WordCount.java:196)
    at org.apache.beam.examples.WordCount.main (WordCount.java:203)

I'm able to run the job with the "--experiments=use_runner_v2" option, and
it seems that Runner v2 is enabled: the steps are different, especially for
the reader part of the pipeline.
If I specify both "--dataflowServiceOptions=use_runner_v2" and
"--experiments=use_runner_v2", the job looks identical to the previous one,
so dataflowServiceOptions seems to be redundant. But the official
documentation clearly says to use dataflowServiceOptions to enable
Dataflow Runner v2:
https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2
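
For reference, here is a minimal sketch of how the flag that worked for me could be appended to the pipeline arguments before they are handed to Beam. It uses only the Java standard library; the class name RunnerV2Args and the method withRunnerV2 are hypothetical helpers made up for illustration, and I am assuming the resulting list would be passed on to PipelineOptionsFactory.fromArgs(...) in the usual way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Beam): builds the argument list for a
// Dataflow job with the experiment flag that enabled Runner v2 in my test.
class RunnerV2Args {
    // The only flag that worked on its own with Beam 2.38 in my experiment.
    static final String EXPERIMENT_FLAG = "--experiments=use_runner_v2";

    // Returns a copy of baseArgs with the experiment flag appended,
    // unless the flag is already present.
    static List<String> withRunnerV2(List<String> baseArgs) {
        List<String> args = new ArrayList<>(baseArgs);
        if (!args.contains(EXPERIMENT_FLAG)) {
            args.add(EXPERIMENT_FLAG);
        }
        return args;
    }

    public static void main(String[] argv) {
        List<String> args = withRunnerV2(List.of("--runner=DataflowRunner"));
        // Assumption: this array would then be passed to
        // PipelineOptionsFactory.fromArgs(...) in the pipeline's main method.
        System.out.println(String.join(" ", args));
    }
}
```

Running main prints "--runner=DataflowRunner --experiments=use_runner_v2". The helper skips the flag when it is already present, so calling it twice does not duplicate the experiment.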

Thanks for sharing your thoughts
Marcin Kuthan
