[ https://issues.apache.org/jira/browse/BEAM-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles reassigned BEAM-12725:
--------------------------------------

    Assignee: Christopher Cornwell

> BigQuery FILE_LOADS fails with ALLOW_FIELD_ADDITION set
> -------------------------------------------------------
>
>                 Key: BEAM-12725
>                 URL: https://issues.apache.org/jira/browse/BEAM-12725
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.29.0
>            Reporter: Christopher Cornwell
>            Assignee: Christopher Cornwell
>            Priority: P2
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> While running a job on Dataflow that writes to BigQuery using the
> `FILE_LOADS` write method I notice the following error in the
> `MultiPartitionsWriteTables` step:
>
> {code:java}
> {"errorResult":{"message":"Schema update options should only be specified with WRITE_APPEND disposition, or with WRITE_TRUNCATE disposition on a table partition.","reason":"invalid"},
>  "errors":[{"message":"Schema update options should only be specified with WRITE_APPEND disposition, or with WRITE_TRUNCATE disposition on a table partition.","reason":"invalid"}],
>  "state":"DONE"}
> {code}
>
> Here's the write configuration that I'm using:
> {code:java}
> BigQueryIO
>   .write()
>   .to(...)
>   .withSchema(...)
>   .withFormatFunction(...)
>   .withCreateDisposition(CREATE_IF_NEEDED)
>   .withWriteDisposition(WRITE_APPEND)
>   .withSchemaUpdateOptions(Collections.singleton(SchemaUpdateOption.ALLOW_FIELD_ADDITION))
>   .withTimePartitioning(
>     new TimePartitioning().setType("DAY").setRequirePartitionFilter(false).setField("ts"))
>   .withMethod(Method.FILE_LOADS)
>   .withTriggeringFrequency(Minutes.minutes(5).toStandardDuration)
>   .withAutoSharding()
>   .optimizedWrites()
> {code}
>
> I believe it is due to the fact that the schema update options are being
> passed to the `WriteTables` constructor for the temp tables
> [here|https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L610].
> It might be okay to just pass `null` there instead since I don't think we
> need the schema update options if we're always generating those temp tables
> from scratch, but I'm not sure if that will have other consequences.
> This is preventing any of the load jobs from completing, causing none of the
> data to ever make it to the BigQuery table.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)