Thanks, so autosharding is recommended for the file upload method as well?

On Wed, Oct 2, 2024 at 12:20 PM Reuven Lax via user <[email protected]> wrote:
> All data is not kept in memory. However, if you have too few shards and
> writing the files is slow, data has to be held in memory while the file
> write is in progress.
>
> On Wed, Oct 2, 2024 at 11:16 AM [email protected] <[email protected]> wrote:
>
>> We are trying to process over 150 TB of data (streaming, unbounded) per
>> day and save it to BQ, and it looks like the Storage API is not economical
>> enough for us. I tried to use file upload, but somehow it doesn't work,
>> and there is not much documentation for the file upload method online. I
>> have a few questions regarding the FILE_LOADS method in streaming mode.
>> 1. How do I decide numFileShards? Can I still rely on autosharding?
>> 2. I noticed the file loads method requires much more memory. I'm not
>> sure whether the Dataflow runner keeps all the data in memory before
>> writing to files. If so, even one minute of data is too much to keep in
>> memory, and triggering more often than once a minute would exceed the API
>> quota. Is there a way to cap memory usage, e.g., write data to files
>> before triggering the file load job?
>> 3. I also noticed that when a file upload job fails, I don't get the
>> error message. What can I do to handle the error? What is the best
>> practice for error handling with the FILE_LOADS method?
>>
>> Thanks!
>> Regards,
>> Siyuan
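For readers landing on this thread: the knobs being discussed map onto Beam's Java BigQueryIO roughly as below. This is a hedged configuration sketch, not a tested pipeline; the project, dataset, table name, shard count, and triggering frequency are placeholder assumptions you would tune for your own quota and throughput.

```java
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;
import com.google.api.services.bigquery.model.TableRow;
import org.joda.time.Duration;

// `rows` is the unbounded streaming input from upstream transforms.
PCollection<TableRow> rows = /* ... upstream pipeline ... */ null;

WriteResult result =
    rows.apply(
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")  // placeholder table spec
            .withMethod(Method.FILE_LOADS)
            // In streaming mode FILE_LOADS needs a triggering frequency:
            // each trigger flushes the buffered files into a BigQuery load
            // job, so keep it coarse enough to stay under the load-job quota.
            .withTriggeringFrequency(Duration.standardMinutes(5))
            // Either pin the shard count explicitly (more shards means more
            // parallel file writers and less data buffered per worker)...
            .withNumFileShards(200)
            // ...or delete the line above and let the runner pick the shard
            // count dynamically (Dataflow streaming):
            // .withAutoSharding()
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));
```

Note that `withNumFileShards` and `withAutoSharding` are alternatives, not companions: set one or the other. On error handling, the `WriteResult` hooks for retrieving per-row failures are geared toward the streaming-insert and Storage API paths; with FILE_LOADS a failed load job generally surfaces as a job failure rather than a dead-letter collection, which matches the behavior described in the question.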
