Hi Lina, thanks for confirming that you are using WRITE_TRUNCATE - that's a bummer, as truncating isn't possible with the Storage Write API; it's append-only. Since this is a batch job, I imagine it could have a step that truncates the table first to overcome this limitation (e.g. via DML; see the sketch below). Bear in mind that the Storage Write API has two modes - at-least-once and exactly-once. For exactly-once I would recommend controlling the number of streams (withNumStorageWriteApiStreams), as you may deplete the quota quickly. The Storage Write API uses a cross-language transform - on Dataflow it will work right away - let me know how you run your pipeline.
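Roughly what I have in mind - an untested sketch, not your exact pipeline: the project/dataset/table names and schema are placeholders, and it assumes the google-cloud-bigquery client library is available where you launch the job:

import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryDisposition, WriteToBigQuery
from google.cloud import bigquery

project_id, dataset, table = "my-project", "my_dataset", "my_table"  # placeholders

# The Storage Write API is append-only, so truncate the target table
# up front with a DML statement before launching the pipeline.
bigquery.Client(project=project_id).query(
    f"TRUNCATE TABLE `{project_id}.{dataset}.{table}`"
).result()

with beam.Pipeline() as p:
    (
        p
        | "Create rows" >> beam.Create([{"id": 1, "name": "example"}])
        | "Write to BigQuery" >> WriteToBigQuery(
            table=f"{project_id}:{dataset}.{table}",
            schema="id:INTEGER,name:STRING",
            method=WriteToBigQuery.Method.STORAGE_WRITE_API,
            create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=BigQueryDisposition.WRITE_APPEND,
        )
    )

With the truncate done as a separate DML step, the Beam write itself stays a plain append, which is all the Storage Write API supports.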
Radek

On Wed, Mar 19, 2025 at 6:12 PM Lina Mårtensson <lina@camus.energy> wrote:

> Thanks Radek!
>
> I didn't realize that writing is done with a copy job - then I understand
> why we need to configure partitioning as well. And that all makes sense.
>
> We haven't tried the Storage Write API - it wasn't available for Python
> yet when we started doing this. I will take a look at it, see if it might
> work for us, and get back to you. Would using the Storage Write API allow
> us to write to a partitioned table without specifying so?
>
> I should mention as well that streaming the data would be out of the
> question, since we need to use mutating DML statements as well.
>
> Our call looks like this:
>
>     "Write to BigQuery" >> WriteToBigQuery(
>         project=self.project_id,
>         table=self.get_table_name,
>         dataset=self.dataset,
>         schema=self.get_schema,
>         schema_side_inputs=(beam.pvalue.AsDict(types_for_schema),),
>         create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
>         write_disposition=(
>             BigQueryDisposition.WRITE_TRUNCATE
>             if self.overwrite
>             else BigQueryDisposition.WRITE_APPEND
>         ),
>         additional_bq_parameters={
>             "schemaUpdateOptions": ["ALLOW_FIELD_ADDITION"],
>         },
>     )
>
> On Wed, Mar 19, 2025 at 3:12 AM Radek Stankiewicz <radosl...@google.com>
> wrote:
>
>> Hi Lina,
>>
>> There are multiple reasons why a copy job is used with a temporary table:
>> - you may be using dynamic destinations
>> - you are loading lots of data, probably with truncate
>> This way we ensure atomicity, as we can trigger a copy from multiple
>> temp tables into one final table.
>>
>> Can you confirm, or paste a snippet of, how you configured
>> apache_beam.io.gcp.bigquery.WriteToBigQuery?
>>
>> BigQuery doesn't allow copying non-partitioned tables into partitioned
>> tables - it's a BQ limitation.
>>
>> Have you tried other loading methods, e.g. the Storage Write API?
>>
>> Radek
>>
>> On Wed, Mar 19, 2025 at 5:42 AM Lina Mårtensson via user <
>> user@beam.apache.org> wrote:
>>
>>> Hi,
>>>
>>> We have, by now, a large set of different Beam jobs, all written in
>>> Python, that write to a set of BigQuery tables in a single dataset that
>>> all behave more or less the same way. These tables aren't partitioned
>>> at all, but going forward, we need them to be.
>>>
>>> I partitioned a single table to start with, and was very surprised to
>>> find that a Beam job that wrote to it couldn't do so:
>>>
>>>     Failed to copy Non partitioned table to Column partitioned table:
>>>     not supported.
>>>
>>> We have a bunch of pre-created tables, and I would've thought I could
>>> just keep writing to those without changing settings even after setting
>>> up partitioning on them. It doesn't seem to matter whether the
>>> create_disposition is CREATE_IF_NEEDED or CREATE_NEVER.
>>>
>>> It does work when I set additional_bq_parameters to add
>>> time_partitioning, but it would be a huge undertaking not only to update
>>> all of our currently running jobs across many projects, but also to make
>>> sure to synchronize these changes with updating the underlying BigQuery
>>> tables. And it doesn't seem like it should be necessary to specify this
>>> if we're not creating any new tables?
>>>
>>> Is there any way to just write the data we have to pre-created,
>>> partitioned tables without having to set time_partitioning in
>>> additional_bq_parameters, or is there some other recommended way to
>>> solve this problem?
>>>
>>> Thanks,
>>> -Lina
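[Editor's note] For reference, the time_partitioning workaround Lina says does work corresponds to passing timePartitioning through additional_bq_parameters. A rough sketch, not her actual code - the table, schema, and the "event_ts" field are placeholders:

import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryDisposition, WriteToBigQuery

# Sketch: declaring timePartitioning via additional_bq_parameters lets
# the load/copy path create and target a DAY-partitioned table, which
# avoids the "copy non-partitioned into partitioned" error above.
with beam.Pipeline() as p:
    (
        p
        | "Create rows" >> beam.Create([{"event_ts": "2025-03-19 00:00:00", "value": 1.0}])
        | "Write to BigQuery" >> WriteToBigQuery(
            table="my-project:my_dataset.my_table",  # placeholder
            schema="event_ts:TIMESTAMP,value:FLOAT",  # placeholder
            create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=BigQueryDisposition.WRITE_TRUNCATE,
            additional_bq_parameters={
                "timePartitioning": {"type": "DAY", "field": "event_ts"},
            },
        )
    )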