Yunqing Zhou created BEAM-7983:
----------------------------------
Summary: Template parameters don't work if they are only used in
DoFns
Key: BEAM-7983
URL: https://issues.apache.org/jira/browse/BEAM-7983
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Reporter: Yunqing Zhou
Assignee: Luke Cwik
Template parameters do not work if they are read only inside DoFns and are
never accessed anywhere in main.
Sample pipeline:
{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class BugPipeline {
  public interface Options extends PipelineOptions {
    ValueProvider<String> getFoo();
    void setFoo(ValueProvider<String> foo);
  }

  public static void main(String[] args) throws Exception {
    Options options = PipelineOptionsFactory.fromArgs(args).as(Options.class);
    Pipeline p = Pipeline.create(options);
    p.apply(Create.of(1)).apply(ParDo.of(new DoFn<Integer, String>() {
      @ProcessElement
      public void processElement(ProcessContext context) {
        // getFoo() is only ever called here, at execution time on the worker.
        System.out.println(context.getPipelineOptions().as(Options.class).getFoo());
      }
    }));
    p.run();
  }
}
{code}
Option "foo" is not used anywhere other than in the DoFn. To reproduce the
problem:
{code:bash}
$ java BugPipeline --project=$PROJECT --stagingLocation=$STAGING \
    --templateLocation=$TEMPLATE --runner=DataflowRunner
$ gcloud dataflow jobs run $NAME --gcs-location=$TEMPLATE --parameters=foo=bar
{code}
The second command fails with this error:
{code}
ERROR: (gcloud.dataflow.jobs.run) INVALID_ARGUMENT: (2621bec26c2488b7): The
workflow could not be created. Causes: (2621bec26c248dba): Found unexpected
parameters: ['foo' (perhaps you meant 'zone')]
- '@type': type.googleapis.com/google.rpc.DebugInfo
detail: "(2621bec26c2488b7): The workflow could not be created. Causes:
(2621bec26c248dba):\
\ Found unexpected parameters: ['foo' (perhaps you meant 'zone')]"
{code}
The underlying problem is that ProxyInvocationHandler.java only populates the
pipeline option map in the job object with options that have been "invoked":
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L159
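To illustrate the effect, here is a minimal sketch built on a plain java.lang.reflect.Proxy. It is not Beam's actual ProxyInvocationHandler code. The names InvokedOnlySketch, RecordingHandler and the two-option Options interface are hypothetical, and the toy handler only models the "record on first getter call" behavior described above.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Set;
import java.util.TreeSet;

class InvokedOnlySketch {
  /** Hypothetical options interface; stands in for a PipelineOptions subinterface. */
  public interface Options {
    String getFoo();
    String getBar();
  }

  /** Toy handler (an assumption, not Beam code): remembers an option once its getter is called. */
  static final class RecordingHandler implements InvocationHandler {
    final Set<String> invoked = new TreeSet<>();

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
      String name = method.getName();
      if (name.startsWith("get") && name.length() > 3) {
        // "getFoo" -> "foo"
        invoked.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
        return null; // values do not matter for this sketch
      }
      throw new UnsupportedOperationException(name);
    }
  }

  /** Builds a proxy, optionally touches getFoo() "in main", and returns what was recorded. */
  static Set<String> recordedOptions(boolean touchFooInMain) {
    RecordingHandler handler = new RecordingHandler();
    Options options = (Options) Proxy.newProxyInstance(
        Options.class.getClassLoader(), new Class<?>[] {Options.class}, handler);
    if (touchFooInMain) {
      options.getFoo();
    }
    // Anything not in this set would be unknown to whatever is built from it.
    return handler.invoked;
  }

  public static void main(String[] args) {
    System.out.println(recordedOptions(false)); // prints []
    System.out.println(recordedOptions(true));  // prints [foo]
  }
}
```

In this toy model, an option whose getter is called only later (for example inside a DoFn on a worker) is missing from the recorded set at the time the job description is built, which matches the "unexpected parameters" symptom.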
One way to solve this is to save all parameters of type ValueProvider in the
pipelineoptions section, whether or not they were invoked. Alternatively, a
registration mechanism could be introduced.
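A rough sketch of the first idea, assuming the deferred options can be discovered by scanning getter return types with reflection. This is only an illustration and not a proposed patch: Deferred is a hypothetical stand-in for ValueProvider, and DeferredOptionScan, deferredOptionNames and the example Options interface are made-up names.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

class DeferredOptionScan {
  /** Hypothetical stand-in for Beam's ValueProvider<T>. */
  public interface Deferred<T> {
    T get();
  }

  /** Hypothetical options interface mixing deferred and plain options. */
  public interface Options {
    Deferred<String> getFoo();
    void setFoo(Deferred<String> foo);
    String getRegion();
    Deferred<Integer> getRetries();
  }

  /** Collects the names of all getters that return a Deferred, invoked or not. */
  static Set<String> deferredOptionNames(Class<?> optionsInterface) {
    Set<String> names = new TreeSet<>();
    for (Method m : optionsInterface.getMethods()) {
      String n = m.getName();
      boolean isGetter =
          n.startsWith("get") && n.length() > 3 && m.getParameterCount() == 0;
      if (isGetter && Deferred.class.isAssignableFrom(m.getReturnType())) {
        // "getFoo" -> "foo"
        names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
      }
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(deferredOptionNames(Options.class)); // prints [foo, retries]
  }
}
```

A scan like this would register "foo" even though nothing in main ever calls getFoo(), while plain options such as "region" are left to the existing invoked-only logic.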
A current workaround is to annotate the parameter with
{code}@Validation.Required{code}, which will call invoke() behind the scenes.