Related to a previous thread about custom triggering on GlobalWindows [0],
are there general recommendations for controlling the size of output files from
FileIO.Write?

A general pattern I've seen in systems that need to batch individual
records to files is that they offer both a maximum file size and a maximum
latency. If you specify 1 GB and 1 minute respectively, the system would
create multiple 1 GB files per minute when throughput is high, and a single
smaller file per minute when throughput is below 1 GB/minute.
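
To make the size-or-latency pattern above concrete, here is a minimal sketch in plain Java. It is not Beam code and not part of the FileIO API: the class name `RolloverSketch`, the method `simulate`, and its parameters are all hypothetical, chosen only for illustration. It checks the latency bound only when a record arrives, whereas a real system would also need a timer to close idle files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical size-or-latency rollover policy; not part of Beam's FileIO API. */
final class RolloverSketch {

    /**
     * Replays records in arrival order and returns the byte size of each
     * file that would be closed under the two bounds.
     */
    static List<Long> simulate(long[] arrivalMillis, long[] recordBytes,
                               long maxBytes, long maxLatencyMillis) {
        List<Long> closedFiles = new ArrayList<>();
        long currentBytes = 0;
        long openedAt = -1; // -1 means no file is currently open

        for (int i = 0; i < arrivalMillis.length; i++) {
            // Latency bound: close the open file if it has been open too long.
            // Assumption: checked on record arrival only; no background timer.
            if (openedAt >= 0 && arrivalMillis[i] - openedAt >= maxLatencyMillis) {
                closedFiles.add(currentBytes);
                currentBytes = 0;
                openedAt = -1;
            }
            if (openedAt < 0) {
                openedAt = arrivalMillis[i]; // open a new file for this record
            }
            currentBytes += recordBytes[i];

            // Size bound: close as soon as the file reaches the maximum size.
            if (currentBytes >= maxBytes) {
                closedFiles.add(currentBytes);
                currentBytes = 0;
                openedAt = -1;
            }
        }
        if (openedAt >= 0) {
            closedFiles.add(currentBytes); // flush whatever is left at the end
        }
        return closedFiles;
    }
}
```

With a 1000-byte size bound and a 60-second latency bound, four 500-byte records arriving within a few milliseconds produce two full files, while two 100-byte records arriving 70 seconds apart produce two small files, one per latency interval.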

From the discussion in [0], it sounds like windowing and triggering
semantics are not sufficient to provide such guarantees. Bounded runners
are free to ignore triggers as being non-deterministic. Are there other
techniques I'm missing to limit file sizes, or is windowing on record
timestamp the only tool available that applies to both batch and streaming?

[0]
https://lists.apache.org/thread.html/7b583c73d55d13389a49a35dec2b42128d114361de3c1f0822d9ded4@%3Cuser.beam.apache.org%3E
