File loads do not return per-row errors (unlike the storage API, which does). Dataflow will generally retry the entire file load on error (indefinitely for streaming and up to 3 times for batch). You can look at the logs to find the specific error; however, it can be tricky to associate it with a specific row.
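Since the load job will not tell you which row was bad, one possible workaround is to check rows yourself before they reach the BigQuery write and set the failures aside. The snippet below is only a rough sketch of that idea in plain Python: the schema, the field names, and the helpers `validate_row` and `split_rows` are all made up for illustration and are not part of Beam or BigQuery. In a real pipeline this logic would presumably sit in a DoFn with a dead-letter output ahead of the write, and it only catches simple type and missing-field problems, not everything a load job can reject.

```python
# Hypothetical schema for illustration only: field name -> (Python type, required).
SCHEMA = {
    "user_id": (int, True),
    "event": (str, True),
    "ts": (str, True),
    "score": (float, False),
}


def validate_row(row, schema=SCHEMA):
    """Return a list of problems found in one row (empty list means the row looks OK)."""
    problems = []
    for name, (expected_type, required) in schema.items():
        value = row.get(name)
        if value is None:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        # bool is a subclass of int in Python, so treat it as a mismatch explicitly.
        is_bool_mismatch = isinstance(value, bool) and expected_type is not bool
        # Accept an int where a float is expected.
        int_as_float = expected_type is float and isinstance(value, int)
        if is_bool_mismatch or not (isinstance(value, expected_type) or int_as_float):
            problems.append(
                f"{name}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    for name in row:
        if name not in schema:
            problems.append(f"unknown field: {name}")
    return problems


def split_rows(rows, schema=SCHEMA):
    """Split rows into (good, dead_letter); each dead-letter entry keeps the row and its errors."""
    good, dead_letter = [], []
    for row in rows:
        problems = validate_row(row, schema)
        if problems:
            dead_letter.append({"row": row, "errors": problems})
        else:
            good.append(row)
    return good, dead_letter
```

The good rows would continue on to the file load, and the dead-letter entries could be written somewhere cheap (for example, files in a bucket) so that each failure stays attached to its original row.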
Reuven

On Wed, Oct 2, 2024 at 1:08 PM [email protected] <[email protected]> wrote:

> Any best practice for error handling for file upload job?
>
> On Wed, Oct 2, 2024 at 1:04 PM [email protected] <[email protected]> wrote:
>
>> STORAGE_API_AT_LEAST_ONCE only saves dataflow engine cost, but the
>> storage api cost alone is too high for us, that's why we want to switch to
>> file upload
>>
>> On Wed, Oct 2, 2024 at 12:08 PM XQ Hu via user <[email protected]>
>> wrote:
>>
>>> Have you checked
>>> https://cloud.google.com/dataflow/docs/guides/write-to-bigquery?
>>>
>>> autosharding is generally recommended. If the cost is the concern, have
>>> you checked STORAGE_API_AT_LEAST_ONCE?
>>>
>>> On Wed, Oct 2, 2024 at 2:16 PM [email protected] <[email protected]>
>>> wrote:
>>>
>>>> We are trying to process over 150TB data(streaming unbound) per day and
>>>> save them to BQ and it looks like storage api is not economical enough for
>>>> us. I tried to use file upload but somehow it doesn't work and there are
>>>> not many documents for file upload method online. I have a few questions
>>>> regarding the file_upload method in streaming mode.
>>>> 1. How do I decide numOfFileShards? can I still rely on autosharding?
>>>> 2. I noticed the fileloads method requires much more memory, I'm not
>>>> sure if dataflow runner would keep all the data in memory before writing to
>>>> file? If so even one minute data is too much to be kept in memory and less
>>>> than one minute means would exceed the api quota. Is there a way to cap the
>>>> memory usage like write data to files before trigger file load job?
>>>> 3. I also noticed that if there is a file upload job failure, I don't
>>>> get the error message, so what can I do to handle the error, what is the
>>>> best practice in terms of error handling in file_upload method?
>>>>
>>>> Thanks!
>>>> Regards,
>>>> Siyuan
>>>>
>>>
