For small/medium writes, BigQueryIO loads directly into the destination table.

For larger writes (your case), it first loads into multiple temp tables and
then performs a single copy job [1] that copies their contents into the final
table. Afterwards, the sink cleans up all of those temp tables.
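
For reference, this is roughly the sink configuration that takes the
temp-table + copy path on an unbounded (streaming) input. A minimal Java
sketch, with a hypothetical destination table and an upstream
PCollection<TableRow> called rows (not your exact setup, just an
illustration of the knobs involved):

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    static void writeToBigQuery(PCollection<TableRow> rows) {
      rows.apply(
          "WriteToBigQuery",
          BigQueryIO.writeTableRows()
              // Hypothetical destination; assumed to already exist.
              .to("my-project:my_dataset.my_table")
              // File loads rather than streaming inserts.
              .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
              // On an unbounded input, FILE_LOADS needs a triggering
              // frequency; each trigger produces the temp-table load jobs
              // (beam_bq_job_TEMP_TABLE_LOAD_...) followed by the copy job.
              .withTriggeringFrequency(Duration.standardMinutes(5))
              // Number of files (and hence temp tables) per trigger.
              .withNumFileShards(100)
              .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
              .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    }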
My guess is that your pipeline is failing at the copy step. Note what Reuven
said in the other thread: Dataflow will retry "indefinitely for
streaming", so your pipeline will keep running. You should still be able to
see error messages in your logs, though.

As to why it's failing, we'd have to know more about your use case or see a
stack trace. For issues like this, it's best to submit a support ticket so the
engineers can investigate. In my experience, though, jobs failing at the
copy step are usually trying to copy partitioned columns, which BigQuery
does not support (see the copy job limitations [2]).
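
If partitioning turns out to be the culprit, one thing worth checking is
whether the sink declares the same partitioning as the destination table,
e.g. via withTimePartitioning. A minimal sketch, with a hypothetical table
and partition column (the right fix depends on how your actual table is
partitioned):

    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TimePartitioning;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;

    static BigQueryIO.Write<TableRow> partitionedWrite() {
      // "event_ts" is a hypothetical TIMESTAMP column; match whatever the
      // destination table is actually partitioned on.
      return BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.my_table")   // hypothetical destination
          .withTimePartitioning(
              new TimePartitioning().setType("DAY").setField("event_ts"));
    }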

[1] https://cloud.google.com/bigquery/docs/managing-tables#copy-table
[2] https://cloud.google.com/bigquery/docs/managing-tables#limitations_on_copying_tables

On Thu, Oct 3, 2024 at 11:56 PM [email protected] <[email protected]> wrote:

> Hey guys,
>
> Any help is appreciated. I'm using the BigQueryIO file upload method to load
> data into BQ. I don't see any error or warning, but I also don't see a
> SINGLE row inserted into the table either.
>
> The only thing I see is hundreds of load jobs like
> beam_bq_job_TEMP_TABLE_LOAD_.....
> and hundreds of temp tables created.
>
> Most jobs are done and I can see the data in the temp tables, but there is
> not a single row written to the final destination.
>
> I know there is no way to track row-level errors, but at least the
> runner/Beam API should give me some hint about what is wrong at each step.
> And there is zero documentation or examples about this either.
>
>
> Regards,
>
>
