Have you checked https://cloud.google.com/dataflow/docs/guides/write-to-bigquery?

Autosharding is generally recommended. If cost is the concern, have you looked at STORAGE_API_AT_LEAST_ONCE? It keeps the Storage Write API path but relaxes exactly-once semantics, which is what makes it cheaper.
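For reference, a minimal sketch of that configuration with the Java SDK's BigQueryIO (the table name and the surrounding pipeline are placeholders, adjust to yours):

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.values.PCollection;

    // rows: your unbounded PCollection<TableRow> from the streaming source
    static void writeAtLeastOnce(PCollection<TableRow> rows) {
      rows.apply("WriteToBQ",
          BigQueryIO.writeTableRows()
              .to("my-project:my_dataset.my_table")  // placeholder table
              // at-least-once Storage Write API: skips the exactly-once
              // bookkeeping, which reduces cost and overhead
              .withMethod(BigQueryIO.Write.Method.STORAGE_API_AT_LEAST_ONCE)
              .withWriteDisposition(
                  BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    }

The trade-off is that duplicates are possible, so it fits best when the downstream can tolerate or deduplicate them.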
On Wed, Oct 2, 2024 at 2:16 PM [email protected] <[email protected]> wrote:

> We are trying to process over 150 TB of data (unbounded streaming) per
> day and save it to BQ, and it looks like the Storage API is not
> economical enough for us. I tried to use file upload but somehow it
> doesn't work, and there isn't much documentation online for the file
> upload method. I have a few questions regarding the FILE_LOADS method
> in streaming mode.
> 1. How do I decide numFileShards? Can I still rely on autosharding?
> 2. I noticed the FILE_LOADS method requires much more memory. I'm not
> sure if the Dataflow runner keeps all the data in memory before writing
> to files? If so, even one minute of data is too much to keep in memory,
> and less than one minute would exceed the API quota. Is there a way to
> cap the memory usage, like writing data to files before triggering the
> file load job?
> 3. I also noticed that when a file upload job fails, I don't get the
> error message. What can I do to handle the error, and what is the best
> practice for error handling with the FILE_LOADS method?
>
> Thanks!
> Regards,
> Siyuan
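To your FILE_LOADS questions: with an unbounded input the sink requires withTriggeringFrequency, and sharding comes from either withNumFileShards or withAutoSharding (set one, not both). A rough sketch, with placeholder values:

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    static void writeViaFileLoads(PCollection<TableRow> rows) {
      rows.apply("WriteViaFileLoads",
          BigQueryIO.writeTableRows()
              .to("my-project:my_dataset.my_table")  // placeholder table
              .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
              // required on unbounded input: how often to kick off load jobs
              .withTriggeringFrequency(Duration.standardMinutes(10))
              // fixed shard count; alternatively use .withAutoSharding()
              .withNumFileShards(100)
              .withWriteDisposition(
                  BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    }

My understanding is that rows are staged to files under the pipeline's temp location before each load job rather than being held in memory for the whole window, so the triggering frequency is the main knob for trading memory pressure against the load-job quota.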
