For small/medium writes, the sink loads directly into the destination table. For larger writes (your case), it loads into multiple temp tables and then runs a single copy job [1] that copies their contents into the final table. Afterwards, the sink cleans up those temp tables. My guess is that your pipeline is failing at the copy step. Note, as Reuven said in the other thread, that Dataflow retries "indefinitely for streaming", so your pipeline will keep running. You should still be able to see error messages in your logs, though.
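
In case it helps to compare against your setup, here is a minimal sketch of what that file-upload path looks like with the Java SDK. The destination table, schema fields, triggering frequency, and shard count below are placeholders, not taken from your pipeline. On an unbounded source this is the configuration that produces the beam_bq_job_TEMP_TABLE_LOAD_ jobs and temp tables you're seeing, followed by the copy into the final table:

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.WriteResult;
import org.joda.time.Duration;

class BigQueryFileLoadsSketch {

  // `rows` stands in for whatever PCollection<TableRow> your pipeline produces upstream.
  static WriteResult writeWithFileLoads(PCollection<TableRow> rows) {
    // Placeholder schema; replace with the schema of your destination table.
    TableSchema schema =
        new TableSchema()
            .setFields(
                Arrays.asList(
                    new TableFieldSchema().setName("id").setType("STRING"),
                    new TableFieldSchema().setName("payload").setType("STRING")));

    return rows.apply(
        "WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table") // placeholder destination
            .withSchema(schema)
            .withMethod(Method.FILE_LOADS) // the file-upload path discussed above
            // With an unbounded source, each triggering interval runs load jobs into
            // temp tables and then a copy job into the final table.
            .withTriggeringFrequency(Duration.standardMinutes(10))
            .withNumFileShards(100) // bounds the number of temp files/tables per firing
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));
  }
}

If your pipeline looks roughly like this, the load-to-temp-tables part is clearly working (you can see the data there), which again points at the copy step as the place to dig.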
As to why it's failing, we'd have to know more about your use case or see a stack trace. With these things, it's best to submit a support ticket so the engineers can investigate. In my experience, though, jobs that fail at the copy step usually do so because they try to copy partitioned columns, which BigQuery doesn't support (see the copy job limitations [2]).

[1] https://cloud.google.com/bigquery/docs/managing-tables#copy-table
[2] https://cloud.google.com/bigquery/docs/managing-tables#limitations_on_copying_tables

On Thu, Oct 3, 2024 at 11:56 PM [email protected] <[email protected]> wrote:

> Hey guys,
>
> Any help is appreciated. I'm using the BigQueryIO file upload method to
> load data to BQ. I don't see any errors or warnings, but I also don't see
> a SINGLE row inserted into the table either.
>
> The only thing I see is hundreds of load jobs like
> beam_bq_job_TEMP_TABLE_LOAD_.....
> and hundreds of temp tables created.
>
> Most jobs are done and I can see the data in the temp tables, but there
> is not a single row written to the final destination.
>
> I know there is no way to track row-level errors, but at least the
> runner/Beam API should give me some hint about what is wrong at any step.
> And there is zero documentation or examples about this either.
>
>
> Regards,
>
