Hi there, I imagine the answer might depend on the underlying runner, but simply put: can I write temporary files to local disk from within a pipeline? I'm currently using the DataflowRunner, if that's a helpful detail.
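For context, here's the kind of thing I have in mind inside a DoFn (the class name and the tool hand-off step are hypothetical; the point is just that I need a real path on the worker's local disk):

```python
import tempfile

import apache_beam as beam


class UsesScratchDisk(beam.DoFn):
    """Hypothetical DoFn that stages each element on worker-local disk."""

    def process(self, element):
        # NamedTemporaryFile writes to the worker's local filesystem and
        # deletes the file when the context manager exits.
        with tempfile.NamedTemporaryFile(mode="w", suffix=".txt") as scratch:
            scratch.write(element)
            scratch.flush()
            # ... hand scratch.name to some tool that expects a local
            # file path (hypothetical step) ...
            yield element
```

Is that kind of thing safe to do on Dataflow workers?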
Relatedly, how does Beam handle large files? Say my pipeline reads files from an object store like AWS S3 or GCP Cloud Storage. If a file is 10 GB and I read its contents, those contents will be held in memory, correct? As a somewhat contrived example, what would be the recommended approach if I wanted to read a set of large files, tar them, and upload the archive elsewhere?
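Here's a rough sketch of what I'm imagining (the bucket paths and the TarFiles DoFn are my own invention, not a pattern I've seen documented). Each file is streamed into the archive via addfile rather than read fully into memory, but I don't know whether this is idiomatic Beam:

```python
import posixpath
import tarfile

import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.io.filesystems import FileSystems


class TarFiles(beam.DoFn):
    """Hypothetical DoFn: streams matched files into one tar.gz archive."""

    def process(self, readable_files, output_path):
        # FileSystems.create returns a writable stream to the output path;
        # tarfile's "w|gz" mode writes sequentially, so no seeking is needed.
        with FileSystems.create(output_path) as sink:
            with tarfile.open(fileobj=sink, mode="w|gz") as archive:
                for f in readable_files:
                    info = tarfile.TarInfo(
                        name=posixpath.basename(f.metadata.path))
                    info.size = f.metadata.size_in_bytes
                    # addfile copies info.size bytes from the open stream
                    # chunk by chunk, instead of slurping the whole file.
                    archive.addfile(info, f.open())


with beam.Pipeline() as pipeline:
    (pipeline
     | fileio.MatchFiles("gs://my-bucket/inputs/*")  # hypothetical paths
     | fileio.ReadMatches()
     | beam.combiners.ToList()
     | beam.ParDo(TarFiles(), "gs://my-bucket/out/archive.tar.gz"))
```

Does collecting the matches with ToList and streaming them into a single archive like this make sense, or is there a better-supported pattern? Thanks! Evan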