Re: RenameFields behaves differently in DirectRunner

Reuven Lax Thu, 03 Jun 2021 10:00:55 -0700

Correct.

On Thu, Jun 3, 2021 at 9:51 AM Kenneth Knowles <k...@apache.org> wrote:


> I still don't quite grok the details of how this succeeds or fails in
> different situations. The invalid row succeeds in serialization because the
> coder is not sensitive to the way in which it is invalid?
>
> Kenn
>
> On Wed, Jun 2, 2021 at 2:54 PM Brian Hulette <bhule...@google.com> wrote:
>
>> > One thing that's been on the back burner for a long time is making
>> CoderProperties into a CoderTester like Guava's EqualityTester.
>>
>> Reuven's point still applies here though. This issue is not due to a bug
>> in SchemaCoder, it's a problem with the Row we gave SchemaCoder to encode.
>> I'm assuming a CoderTester would require manually generating inputs right?
>> These input Rows represent an illegal state that we wouldn't test with.
>> (That being said I like the idea of a CoderTester in general)
>>
>> Brian
>>
>> On Wed, Jun 2, 2021 at 12:11 PM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> Mutability checking might catch that.
>>>
>>> I meant to suggest not putting the check in the pipeline, but offering a
>>> testing discipline that will catch such issues. One thing that's been on
>>> the back burner for a long time is making CoderProperties into a
>>> CoderTester like Guava's EqualityTester. Then it can run through all the
>>> properties without a user setting up test suites. Downside is that the test
>>> failure signal gets aggregated.
>>>
>>> Kenn
>>>
>>> On Wed, Jun 2, 2021 at 12:09 PM Brian Hulette <bhule...@google.com>
>>> wrote:
>>>
>>>> Could the DirectRunner just do an equality check whenever it does an
>>>> encode/decode? It sounds like it's already effectively performing
>>>> a CoderProperties.coderDecodeEncodeEqual for every element, just omitting
>>>> the equality check.
>>>>
>>>> On Wed, Jun 2, 2021 at 12:04 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> There is no bug in the Coder itself, so that wouldn't catch it. We
>>>>> could insert CoderProperties.coderDecodeEncodeEqual in a subsequent ParDo,
>>>>> but if the Direct runner already does an encode/decode before that ParDo,
>>>>> then that would have fixed the problem before we could see it.
>>>>>
>>>>> On Wed, Jun 2, 2021 at 11:53 AM Kenneth Knowles <k...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Would it be caught by CoderProperties?
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Wed, Jun 2, 2021 at 8:16 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> I don't think this bug is schema specific - we created a Java object
>>>>>>> that is inconsistent with its encoded form, which could happen to any
>>>>>>> transform.
>>>>>>>
>>>>>>> This does seem to be a gap in DirectRunner testing though. It also
>>>>>>> makes it hard to test using PAssert, as I believe that puts everything 
>>>>>>> in a
>>>>>>> side input, forcing an encoding/decoding.
>>>>>>>
>>>>>>> On Wed, Jun 2, 2021 at 8:12 AM Brian Hulette <bhule...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +dev <d...@beam.apache.org>
>>>>>>>>
>>>>>>>> > I bet the DirectRunner is encoding and decoding in between, which
>>>>>>>> fixes the object.
>>>>>>>>
>>>>>>>> Do we need better testing of schema-aware (and potentially other
>>>>>>>> built-in) transforms in the face of fusion to root out issues like 
>>>>>>>> this?
>>>>>>>>
>>>>>>>> Brian
>>>>>>>>
>>>>>>>> On Wed, Jun 2, 2021 at 5:13 AM Matthew Ouyang <
>>>>>>>> matthew.ouy...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I have some other work-related things I need to do this week, so I
>>>>>>>>> will likely report back on this over the weekend.  Thank you for the
>>>>>>>>> explanation.  It makes perfect sense now.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 1, 2021 at 11:18 PM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Some more context - the problem is that RenameFields outputs (in
>>>>>>>>>> this case) Java Row objects that are inconsistent with the actual 
>>>>>>>>>> schema.
>>>>>>>>>> For example if you have the following schema:
>>>>>>>>>>
>>>>>>>>>> Row {
>>>>>>>>>>    field1: Row {
>>>>>>>>>>       field2: string
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> And rename field1.field2 -> renamed, you'll get the following
>>>>>>>>>> schema
>>>>>>>>>>
>>>>>>>>>> Row {
>>>>>>>>>>   field1: Row {
>>>>>>>>>>      renamed: string
>>>>>>>>>>    }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> However the Java object for the _nested_ row will return the old
>>>>>>>>>> schema if getSchema() is called on it. This is because we only 
>>>>>>>>>> update the
>>>>>>>>>> schema on the top-level row.
>>>>>>>>>>
>>>>>>>>>> I think this explains why your test works in the direct runner.
>>>>>>>>>> If the row ever goes through an encode/decode path, it will come back
>>>>>>>>>> correct. The original incorrect Java objects are no longer around, 
>>>>>>>>>> and new
>>>>>>>>>> (consistent) objects are constructed from the raw data and the 
>>>>>>>>>> PCollection
>>>>>>>>>> schema. Dataflow tends to fuse ParDos together, so the following 
>>>>>>>>>> ParDo will
>>>>>>>>>> see the incorrect Row object. I bet the DirectRunner is encoding and
>>>>>>>>>> decoding in between, which fixes the object.
>>>>>>>>>>
>>>>>>>>>> You can validate this theory by forcing a shuffle after
>>>>>>>>>> RenameFields using Reshufflle. It should fix the issue If it does, 
>>>>>>>>>> let me
>>>>>>>>>> know and I'll work on a fix to RenameFields.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 1, 2021 at 7:39 PM Reuven Lax <re...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Aha, yes this indeed another bug in the transform. The schema is
>>>>>>>>>>> set on the top-level Row but not on any nested rows.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 1, 2021 at 6:37 PM Matthew Ouyang <
>>>>>>>>>>> matthew.ouy...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you everyone for your input.  I believe it will be
>>>>>>>>>>>> easiest to respond to all feedback in a single message rather than 
>>>>>>>>>>>> messages
>>>>>>>>>>>> per person.
>>>>>>>>>>>>
>>>>>>>>>>>>    - NeedsRunner - The tests are run eventually, so obviously
>>>>>>>>>>>>    all good on my end.  I was trying to run the smallest subset of 
>>>>>>>>>>>> test cases
>>>>>>>>>>>>    possible and didn't venture beyond `gradle test`.
>>>>>>>>>>>>    - Stack Trace - There wasn't any unfortunately because no
>>>>>>>>>>>>    exception thrown in the code.  The Beam Row was translated into 
>>>>>>>>>>>> a BQ
>>>>>>>>>>>>    TableRow and an insertion was attempted.  The error "message" 
>>>>>>>>>>>> was part of
>>>>>>>>>>>>    the response JSON that came back as a result of a request 
>>>>>>>>>>>> against the BQ
>>>>>>>>>>>>    API.
>>>>>>>>>>>>    - Desired Behaviour - (field0_1.field1_0,
>>>>>>>>>>>>    nestedStringField) -> field0_1.nestedStringField is what I am 
>>>>>>>>>>>> looking for.
>>>>>>>>>>>>    - Info Logging Findings (In Lieu of a Stack Trace)
>>>>>>>>>>>>       - The Beam Schema was as expected with all renames
>>>>>>>>>>>>       applied.
>>>>>>>>>>>>       - The example I provided was heavily stripped down in
>>>>>>>>>>>>       order to isolate the problem.  My work example which a bit 
>>>>>>>>>>>> impractical
>>>>>>>>>>>>       because it's part of some generic tooling has 4 levels of 
>>>>>>>>>>>> nesting and also
>>>>>>>>>>>>       produces the correct output too.
>>>>>>>>>>>>       - BigQueryUtils.toTableRow(Row) returns the expected
>>>>>>>>>>>>       TableRow in DirectRunner.  In DataflowRunner however, only 
>>>>>>>>>>>> the top-level
>>>>>>>>>>>>       renames were reflected in the TableRow and all renames in 
>>>>>>>>>>>> the nested fields
>>>>>>>>>>>>       weren't.
>>>>>>>>>>>>       - BigQueryUtils.toTableRow(Row) recurses on the Row
>>>>>>>>>>>>       values and uses the Row.schema to get the field names.  This 
>>>>>>>>>>>> makes sense to
>>>>>>>>>>>>       me, but if a value is actually a Row then its schema appears 
>>>>>>>>>>>> to be
>>>>>>>>>>>>       inconsistent with the top-level schema
>>>>>>>>>>>>    - My Current Workaround - I forked RenameFields and
>>>>>>>>>>>>    replaced the attachValues in expand method to be a "deep" 
>>>>>>>>>>>> rename.  This is
>>>>>>>>>>>>    obviously inefficient and I will not be submitting a PR for 
>>>>>>>>>>>> that.
>>>>>>>>>>>>    - JIRA ticket -
>>>>>>>>>>>>    https://issues.apache.org/jira/browse/BEAM-12442
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 1, 2021 at 5:51 PM Reuven Lax <re...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This transform is the same across all runners. A few comments
>>>>>>>>>>>>> on the test:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   - Using attachValues directly is error prone (per the
>>>>>>>>>>>>> comment on the method). I recommend using the withFieldValue
>>>>>>>>>>>>> builders instead.
>>>>>>>>>>>>>   - I recommend capturing the RenameFields PCollection into a
>>>>>>>>>>>>> local variable of type PCollection<Row> and printing out the 
>>>>>>>>>>>>> schema (which
>>>>>>>>>>>>> you can get using the PCollection.getSchema method) to ensure 
>>>>>>>>>>>>> that the
>>>>>>>>>>>>> output schema looks like you expect.
>>>>>>>>>>>>>    - RenameFields doesn't flatten. So renaming
>>>>>>>>>>>>> field0_1.field1_0 - > nestedStringField results in
>>>>>>>>>>>>> field0_1.nestedStringField; if you wanted to flatten, then the 
>>>>>>>>>>>>> better
>>>>>>>>>>>>> transform would be Select.fieldNameAs("field0_1.field1_0",
>>>>>>>>>>>>> nestedStringField).
>>>>>>>>>>>>>
>>>>>>>>>>>>> This all being said, eyeballing the implementation of
>>>>>>>>>>>>> RenameFields makes me think that it is buggy in the case where 
>>>>>>>>>>>>> you specify
>>>>>>>>>>>>> a top-level field multiple times like you do. I think it is simply
>>>>>>>>>>>>> adding the top-level field into the output schema multiple times, 
>>>>>>>>>>>>> and the
>>>>>>>>>>>>> second time is with the field0_1 base name; I have no idea why 
>>>>>>>>>>>>> your test
>>>>>>>>>>>>> doesn't catch this in the DirectRunner, as it's equally broken 
>>>>>>>>>>>>> there. Could
>>>>>>>>>>>>> you file a JIRA about this issue and assign it to me?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reuven
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 1, 2021 at 12:47 PM Kenneth Knowles <
>>>>>>>>>>>>> k...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 1, 2021 at 12:42 PM Brian Hulette <
>>>>>>>>>>>>>> bhule...@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Matthew,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > The unit tests also seem to be disabled for this as well
>>>>>>>>>>>>>>> and so I don’t know if the PTransform behaves as expected.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The exclusion for NeedsRunner tests is just a quirk in our
>>>>>>>>>>>>>>> testing framework. NeedsRunner indicates that a test suite 
>>>>>>>>>>>>>>> can't be
>>>>>>>>>>>>>>> executed with the SDK alone, it needs a runner. So that 
>>>>>>>>>>>>>>> exclusion just
>>>>>>>>>>>>>>> makes sure we don't run the test when we're verifying the SDK 
>>>>>>>>>>>>>>> by itself in
>>>>>>>>>>>>>>> the :sdks:java:core:test task. The test is still run in other 
>>>>>>>>>>>>>>> tasks where
>>>>>>>>>>>>>>> we have a runner, most notably in the Java PreCommit [1], where 
>>>>>>>>>>>>>>> we run it
>>>>>>>>>>>>>>> as part of the :runners:direct-java:test task.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That being said, we may only run these tests continuously
>>>>>>>>>>>>>>> with the DirectRunner, I'm not sure if we test them on all the 
>>>>>>>>>>>>>>> runners like
>>>>>>>>>>>>>>> we do with ValidatesRunner tests.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That is correct. The tests are tests _of the transform_ so
>>>>>>>>>>>>>> they run only on the DirectRunner. They are not tests of the 
>>>>>>>>>>>>>> runner, which
>>>>>>>>>>>>>> is only responsible for correctly implementing Beam's 
>>>>>>>>>>>>>> primitives. The
>>>>>>>>>>>>>> transform should not behave differently on different runners, 
>>>>>>>>>>>>>> except for
>>>>>>>>>>>>>> fundamental differences in how they schedule work and checkpoint.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kenn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > The error message I’m receiving, : Error while reading
>>>>>>>>>>>>>>> data, error message: JSON parsing error in row starting at 
>>>>>>>>>>>>>>> position 0: No
>>>>>>>>>>>>>>> such field: nestedField.field1_0, suggests the BigQuery is
>>>>>>>>>>>>>>> trying to use the original name for the nested field and not 
>>>>>>>>>>>>>>> the substitute
>>>>>>>>>>>>>>> name.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there a stacktrace associated with this error? It would
>>>>>>>>>>>>>>> be helpful to see where the error is coming from.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/4101/testReport/org.apache.beam.sdk.schemas.transforms/RenameFieldsTest/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, May 31, 2021 at 5:02 PM Matthew Ouyang <
>>>>>>>>>>>>>>> matthew.ouy...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I’m trying to use the RenameFields transform prior to
>>>>>>>>>>>>>>>> inserting into BigQuery on nested fields.  Insertion into 
>>>>>>>>>>>>>>>> BigQuery is
>>>>>>>>>>>>>>>> successful with DirectRunner, but DataflowRunner has an issue 
>>>>>>>>>>>>>>>> with renamed
>>>>>>>>>>>>>>>> nested fields  The error message I’m receiving, : Error
>>>>>>>>>>>>>>>> while reading data, error message: JSON parsing error in row 
>>>>>>>>>>>>>>>> starting at
>>>>>>>>>>>>>>>> position 0: No such field: nestedField.field1_0, suggests
>>>>>>>>>>>>>>>> the BigQuery is trying to use the original name for the nested 
>>>>>>>>>>>>>>>> field and
>>>>>>>>>>>>>>>> not the substitute name.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The code for RenameFields seems simple enough but does it
>>>>>>>>>>>>>>>> behave differently in different runners?  Will a deep 
>>>>>>>>>>>>>>>> attachValues be
>>>>>>>>>>>>>>>> necessary in order get the nested renames to work across all 
>>>>>>>>>>>>>>>> runners? Is
>>>>>>>>>>>>>>>> there something wrong in my code?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/RenameFields.java#L186
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The unit tests also seem to be disabled for this as well
>>>>>>>>>>>>>>>> and so I don’t know if the PTransform behaves as expected.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/build.gradle#L67
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/RenameFieldsTest.java
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> package ca.loblaw.cerebro.PipelineControl;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> import
>>>>>>>>>>>>>>>>> com.google.api.services.bigquery.model.TableReference;
>>>>>>>>>>>>>>>>> import
>>>>>>>>>>>>>>>>> org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
>>>>>>>>>>>>>>>>> ;
>>>>>>>>>>>>>>>>> import org.apache.beam.sdk.Pipeline;
>>>>>>>>>>>>>>>>> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
>>>>>>>>>>>>>>>>> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>>>>>>>>>>>>>>>>> import org.apache.beam.sdk.schemas.Schema;
>>>>>>>>>>>>>>>>> import org.apache.beam.sdk.schemas.transforms.RenameFields
>>>>>>>>>>>>>>>>> ;
>>>>>>>>>>>>>>>>> import org.apache.beam.sdk.transforms.Create;
>>>>>>>>>>>>>>>>> import org.apache.beam.sdk.values.Row;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> import java.io.File;
>>>>>>>>>>>>>>>>> import java.util.Arrays;
>>>>>>>>>>>>>>>>> import java.util.HashSet;
>>>>>>>>>>>>>>>>> import java.util.stream.Collectors;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> import static java.util.Arrays.*asList*;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> public class BQRenameFields {
>>>>>>>>>>>>>>>>>     public static void main(String[] args) {
>>>>>>>>>>>>>>>>>         PipelineOptionsFactory.*register*(
>>>>>>>>>>>>>>>>> DataflowPipelineOptions.class);
>>>>>>>>>>>>>>>>>         DataflowPipelineOptions options =
>>>>>>>>>>>>>>>>> PipelineOptionsFactory.*fromArgs*(args).as(
>>>>>>>>>>>>>>>>> DataflowPipelineOptions.class);
>>>>>>>>>>>>>>>>>         options.setFilesToStage(
>>>>>>>>>>>>>>>>>                 Arrays.*stream*(System.*getProperty*(
>>>>>>>>>>>>>>>>> "java.class.path").
>>>>>>>>>>>>>>>>>                         split(File.*pathSeparator*)).
>>>>>>>>>>>>>>>>>                         map(entry -> (new
>>>>>>>>>>>>>>>>> File(entry)).toString()).collect(Collectors.*toList*()));
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         Pipeline pipeline = Pipeline.*create*(options);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         Schema nestedSchema = Schema.*builder*().addField(
>>>>>>>>>>>>>>>>> Schema.Field.*nullable*("field1_0", Schema.FieldType.
>>>>>>>>>>>>>>>>> *STRING*)).build();
>>>>>>>>>>>>>>>>>         Schema.Field field = Schema.Field.*nullable*(
>>>>>>>>>>>>>>>>> "field0_0", Schema.FieldType.*STRING*);
>>>>>>>>>>>>>>>>>         Schema.Field nested = Schema.Field.*nullable*(
>>>>>>>>>>>>>>>>> "field0_1", Schema.FieldType.*row*(nestedSchema));
>>>>>>>>>>>>>>>>>         Schema.Field runner = Schema.Field.*nullable*(
>>>>>>>>>>>>>>>>> "field0_2", Schema.FieldType.*STRING*);
>>>>>>>>>>>>>>>>>         Schema rowSchema = Schema.*builder*()
>>>>>>>>>>>>>>>>>                 .addFields(field, nested, runner)
>>>>>>>>>>>>>>>>>                 .build();
>>>>>>>>>>>>>>>>>         Row testRow = Row.*withSchema*(rowSchema
>>>>>>>>>>>>>>>>> ).attachValues("value0_0", Row.*withSchema*(nestedSchema
>>>>>>>>>>>>>>>>> ).attachValues("value1_0"), options
>>>>>>>>>>>>>>>>> .getRunner().toString());
>>>>>>>>>>>>>>>>>         pipeline
>>>>>>>>>>>>>>>>>                 .apply(Create.*of*(testRow).withRowSchema(
>>>>>>>>>>>>>>>>> rowSchema))
>>>>>>>>>>>>>>>>>                 .apply(RenameFields.<Row>*create*()
>>>>>>>>>>>>>>>>>                         .rename("field0_0", "stringField")
>>>>>>>>>>>>>>>>>                         .rename("field0_1", "nestedField")
>>>>>>>>>>>>>>>>>                         .rename("field0_1.field1_0",
>>>>>>>>>>>>>>>>> "nestedStringField")
>>>>>>>>>>>>>>>>>                         .rename("field0_2", "runner"))
>>>>>>>>>>>>>>>>>                 .apply(BigQueryIO.<Row>*write*()
>>>>>>>>>>>>>>>>>                         .to(new
>>>>>>>>>>>>>>>>> TableReference().setProjectId("lt-dia-lake-exp-raw"
>>>>>>>>>>>>>>>>> ).setDatasetId("prototypes").setTableId(
>>>>>>>>>>>>>>>>> "matto_renameFields"))
>>>>>>>>>>>>>>>>>                         .withCreateDisposition(BigQueryIO.
>>>>>>>>>>>>>>>>> Write.CreateDisposition.*CREATE_IF_NEEDED*)
>>>>>>>>>>>>>>>>>                         .withWriteDisposition(BigQueryIO.
>>>>>>>>>>>>>>>>> Write.WriteDisposition.*WRITE_APPEND*)
>>>>>>>>>>>>>>>>>                         .withSchemaUpdateOptions(new
>>>>>>>>>>>>>>>>> HashSet<>(*asList*(BigQueryIO.Write.SchemaUpdateOption.
>>>>>>>>>>>>>>>>> *ALLOW_FIELD_ADDITION*)))
>>>>>>>>>>>>>>>>>                         .useBeamSchema());
>>>>>>>>>>>>>>>>>         pipeline.run();
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>

Re: RenameFields behaves differently in DirectRunner

Reply via email to