We are trying to process over 150 TB of data per day (streaming, unbounded) and write it to BigQuery, and it looks like the Storage Write API is not economical enough for us. I tried the file loads method (FILE_LOADS), but somehow it doesn't work, and there is not much documentation for it online. I have a few questions about using FILE_LOADS in streaming mode; a rough sketch of what I am trying is included after the questions.

1. How do I decide numFileShards? Can I still rely on autosharding?
2. I noticed that FILE_LOADS requires much more memory, and I am not sure whether the Dataflow runner keeps all the data in memory before writing it to files. If so, even one minute of data is too much to keep in memory, and triggering more often than once a minute would exceed the API quota. Is there a way to cap memory usage, for example by writing the data to files before the load job is triggered?
3. I also noticed that when a load job fails, I don't get an error message. How can I handle the failure, and what is the best practice for error handling with the FILE_LOADS method?
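
For context, this is roughly the shape of what I tried (Beam Java). The project, dataset, table, shard count, and triggering frequency below are placeholders, so please read it as a sketch rather than my exact code:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// "events" is the unbounded PCollection of rows we are trying to persist.
static WriteResult writeEvents(PCollection<TableRow> events) {
  return events.apply("WriteToBigQuery",
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.events")  // placeholder table
          .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
          // In streaming, FILE_LOADS needs a triggering frequency plus either
          // an explicit shard count or auto-sharding (this is what question 1 asks about).
          .withTriggeringFrequency(Duration.standardMinutes(5))
          .withNumFileShards(100)              // or .withAutoSharding() instead of a fixed count?
          .withCreateDisposition(CreateDisposition.CREATE_NEVER)
          .withWriteDisposition(WriteDisposition.WRITE_APPEND));
}

I keep the returned WriteResult around, but I have not found anything on it that surfaces load job failures, which is what question 3 is about.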
Thanks! Regards, Siyuan
