BigQueryIO.write() works in two phases: 1) Dataflow workers write the data
to files on GCS (in parallel), then 2) BigQuery is asked to load those
files. During the load phase the Dataflow workers have nothing to do, which
is why the job scales down.
These jobs are spending their time waiting for BigQuery to load the data.
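
For concreteness, here's a rough sketch of the kind of write that behaves
this way (the table name and dispositions are made up for illustration,
not taken from your job):

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;

PCollection<TableRow> rows = ...; // whatever your transforms produce

// Phase 1: workers shard these rows into files on GCS in parallel.
// Phase 2: the pipeline asks BigQuery to load those files and waits;
// the workers have nothing to do here, so autoscaling shrinks the pool.
rows.apply("WriteToBigQuery",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table") // hypothetical destination
        // FILE_LOADS is already the default for a bounded input; set
        // explicitly here only to make the two-phase mechanism visible.
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
        // Assume the table already exists, so no schema is needed.
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));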

As for why BigQuery is taking so long to load the data: you can look for
BigQuery job IDs in the Stackdriver logs, and then inspect those jobs in
the BigQuery UI. If it's taking *really* long, it's usually a quota issue,
i.e. your BigQuery load jobs are waiting for other BigQuery jobs to
complete before they even start.
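
If it's easier to check from code than from the UI, something like this
with the BigQuery Java client will show a job's state and any error (the
job ID below is a placeholder; substitute one found in the Stackdriver
logs):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;

BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
// Placeholder job ID - use one copied from the Stackdriver logs.
Job job = bigquery.getJob(JobId.of("beam_load_example_0001"));
if (job == null) {
  System.out.println("no such job");
} else {
  // PENDING usually means the job is queued behind other jobs (the
  // quota situation described above); RUNNING means BigQuery is
  // actively loading the staged files.
  System.out.println("state: " + job.getStatus().getState());
  System.out.println("error: " + job.getStatus().getError());
}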

On Sat, Mar 3, 2018 at 3:30 AM Andrew Jones <[email protected]>
wrote:

> Hi,
>
> We have a Dataflow job that loads data from GCS, does a bit of
> transformation, then writes to a number of BigQuery tables using
> DynamicDestinations.
>
> The same job runs fine on smaller data sets (~70 million records), but it
> is struggling when processing ~500 million records. Both jobs write to the
> same number of tables; the only difference is the number of records.
>
> Example job IDs include 2018-03-02_04_29_44-2181786949469858712 and
> 2018-03-02_08_46_28-4580218739500768796. They use BigQueryIO to write to
> BigQuery, with the BigQueryIO.Write.Method.FILE_LOADS method (the default
> for a bounded job). They successfully stage all their data to GCS, but
> then for some reason scale down the number of workers to 1 when processing
> the step WriteTOBigQuery/BatchLoads/ReifyResults and stay in that step for
> hours.
>
> In the logs we see many entries like this:
>
> Proposing dynamic split of work unit
> ...-7e07;2018-03-02_04_29_44-2181786949469858712;662185752552586455 at
> {"fractionConsumed":0.5}
> Rejecting split request because custom reader returned null residual
> source.
> And also occasionally this:
>
> Processing lull for PT24900.038S in state process of
> WriteTOBigQuery/BatchLoads/ReifyResults/ParDo(Anonymous)
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:170)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   ...
>
> The job does eventually seem to progress, but only after many hours. It
> then fails with this error, which may or may not be related (we're just
> starting to look into it):
>
> (94794e1a2c96f380): java.lang.RuntimeException:
> org.apache.beam.sdk.util.UserCodeException: java.io.IOException: Unable to
> patch table description: {datasetId=..., projectId=...,
> tableId=9c20908cc6e549b4a1e116af54bb8128_011249028ddcc5204885bff04ce2a725_00001_00000},
> aborting after 9 retries.
>
> We're not sure how to proceed, so any pointers would be appreciated.
>
> Thanks,
> Andrew
>
