Yeah, I found those failed jobs, but none of them records why they failed, and
`bq show` gives me "job not found".
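
Maybe I need to qualify the job with its project and location; a sketch of what
I'd try, with the project, location, and job-id suffix as placeholders:

  # list recent jobs, including other users' jobs and failed ones
  bq ls -j -a -n 50 --project_id=<my-project>

  # a job id can be qualified with project and location, which should avoid the
  # "Not found" error when the job ran outside the default location
  bq show -j --format=prettyjson <my-project>:<location>.beam_bq_job_COPY_<suffix>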

On Thu, Oct 3, 2024 at 2:33 PM Ahmed Abualsaud <[email protected]>
wrote:

> I'd check your Dataflow worker logs and look for any messages about
> `beam_bq_job_COPY`
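>
> A sketch of how that might look with gcloud (not verified against your setup;
> the project and Dataflow job id below are placeholders):
>
>   gcloud logging read \
>     'resource.type="dataflow_step" AND resource.labels.job_id="<dataflow-job-id>" AND "beam_bq_job_COPY"' \
>     --project=<my-project> --limit=100 --freshness=1d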
>
> On Fri, Oct 4, 2024 at 12:31 AM [email protected] <[email protected]> wrote:
>
>> And interestingly, in the BigQuery UI I only see beam_bq_job_LOAD, not
>> beam_bq_job_COPY, but the job id did show up in the logs.
>>
>> On Thu, Oct 3, 2024 at 2:28 PM [email protected] <[email protected]> wrote:
>>
>>> Yes, I figured out the above from reading the source code again. I hope
>>> these steps can be documented somewhere in Beam.
>>> But I still cannot find the details for those jobs. For example,
>>> bq show -j --format=prettyjson --project_id=.... beam_bq_job_COPY_
>>> gives me
>>> BigQuery error in show operation: Not found: Job project-data
>>>
>>> On Thu, Oct 3, 2024 at 2:17 PM Ahmed Abualsaud via user <
>>> [email protected]> wrote:
>>>
>>>> For small/medium writes, it should load directly to the table.
>>>>
>>>> For larger writes (your case), it writes to multiple temp tables then
>>>> performs a single copy job [1] that copies their contents to the final
>>>> table. Afterwards, the sink will clean up all those temp tables.
>>>> My guess is your pipeline is failing at the copy step. Note what Reuven
>>>> said in the other thread that Dataflow will retry "indefinitely for
>>>> streaming", so your pipeline will continue running. You should be able to
>>>> see error messages in your logs though.
>>>>
>>>> As to why it's failing, we'd have to know more about your use case or
>>>> see a stack trace. With these things, it's best to submit a support ticket
>>>> so the engineers can investigate. From my experience though, jobs failing
>>>> at the copy step are usually because of trying to copy partitioned columns.
>>>> That isn't supported by BigQuery (see copy job limitations [2]).
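>>>>
>>>> As a quick check (just a sketch; project, dataset, and table names below
>>>> are placeholders), you could compare one of the temp tables with the
>>>> destination table and look at the timePartitioning / rangePartitioning
>>>> fields in each:
>>>>
>>>>   bq show --format=prettyjson <my-project>:<dataset>.<temp_table>
>>>>   bq show --format=prettyjson <my-project>:<dataset>.<destination_table>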
>>>>
>>>> [1] https://cloud.google.com/bigquery/docs/managing-tables#copy-table
>>>> [2]
>>>> https://cloud.google.com/bigquery/docs/managing-tables#limitations_on_copying_tables
>>>>
>>>> On Thu, Oct 3, 2024 at 11:56 PM [email protected] <[email protected]>
>>>> wrote:
>>>>
>>>>> Hey guys,
>>>>>
>>>>> Any help is appreciated. I'm using the BigQueryIO file upload method to
>>>>> load data to BQ. I don't see any error or warning, but I also don't see a
>>>>> SINGLE row inserted into the table either.
>>>>>
>>>>> The only thing I see is hundreds of load jobs like
>>>>> beam_bq_job_TEMP_TABLE_LOAD_.....
>>>>> and hundreds of temp tables created.
>>>>>
>>>>> Most jobs are done and I can see the data in the temp tables, but there is
>>>>> not a single row written to the final destination.
>>>>>
>>>>> I know there is no way to track row-level errors, but at least the
>>>>> runner/Beam API should give me some hint about what is going wrong at some
>>>>> step. And there is zero documentation/examples about this either.
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>>
