Hi everyone, I have some questions about how windowing, triggering, and panes work together, and how to ensure correctness throughout a pipeline.
Let's assume I have a very simple streaming pipeline that looks like:

Source -> CombineByKey (Sum) -> Sink

Given fixed windows of 1 hour, early firings every minute, and accumulating panes, this is pretty straightforward. However, it can get more complicated if we add steps after the CombineByKey (using the same windowing strategy). Say I want to buffer the results of the CombineByKey into batches of N elements. I can do this with the built-in GroupIntoBatches [1] transform, so my pipeline now looks like:

Source -> CombineByKey (Sum) -> GroupIntoBatches -> Sink

*This leads to my main question:* Is ordering preserved somehow here? That is, is it possible that the result from early firing F+1 now arrives BEFORE firing F (because it was re-ordered inside GroupIntoBatches)? The sink would then receive F+1 before F, leaving my resulting store with incorrect data (possibly forever, if F+1 was the final firing).

If ordering is not preserved, it seems I'd need to reintroduce my own ordering after GroupIntoBatches. GroupIntoBatches is just the example here; I imagine the same issue could arise after any GBK-type operation.

Am I thinking about this the correct way? Thanks!

[1] https://beam.apache.org/releases/javadoc/2.10.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html
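To make the concern concrete, here is a toy model in plain Python (not Beam code — the sinks and the firing tuples are invented for illustration). Each firing of a key carries a pane index acting as a version number, which is roughly the role Beam's PaneInfo index plays. A sink that overwrites unconditionally is corrupted by reordering, while a sink that only accepts a firing with a higher pane index than anything previously seen for that key is not:

```python
def naive_sink(store, firings):
    """Overwrite unconditionally: last arrival wins, even if it is stale."""
    for key, value, pane_index in firings:
        store[key] = value
    return store

def guarded_sink(store, seen, firings):
    """Accept a firing only if its pane index is newer than the last one
    seen for that key; stale (re-ordered) firings are dropped."""
    for key, value, pane_index in firings:
        if pane_index > seen.get(key, -1):
            seen[key] = pane_index
            store[key] = value
    return store

# Firings for key "a": pane 0 -> sum 5, then pane 1 (final) -> sum 12.
# Suppose a downstream batching step delivered them out of order:
reordered = [("a", 12, 1), ("a", 5, 0)]

print(naive_sink({}, reordered))        # {'a': 5}  -- stale value wins, forever
print(guarded_sink({}, {}, reordered))  # {'a': 12} -- final pane survives
```

This is only a sketch of the failure mode and one possible guard; whether you can rely on something like the pane index in practice depends on how the runner and the intermediate transforms propagate pane metadata.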
