Thanks for starting this thread Jan, I'm keen to hear thoughts and outcomes! I thought I would mention that answers to the questions posed here will help to unblock a 2.38.0 release blocker[1].
[1] https://issues.apache.org/jira/browse/BEAM-14064 On Thu, Mar 24, 2022 at 5:28 AM Jan Lukavský <je...@seznam.cz> wrote: > Hi, > > this is follow-up thread started from [1]. In the thread there is > mentioned multiple times that (in stateless ParDo), the output watermark is > allowed to advance only on bundle boundaries [2]. Essentially that would > mean that anything in between calls to @StartBundle and @FinishBundle would > be processed in single instant in (output) event-time. This makes perfect > sense. > > The issue is that it seems that not all runners actually implement this > behavior. FlinkRunner for instance does not have a "natural" concept of > bundles and those are created in a more ad-hoc way to adhere with the DoFn > life-cycle (see [3]). Watermark updates and elements are completely > interleaved without any synchronization with bundle "open" or "close". If > watermark updates are allowed to happen only on boundaries of bundles, then > this seems to break this contract. > > The question therefore is - should we consider FlinkRunner as > non-compliant with this aspect of the Apache Beam model or is this an > "optional" part that runners are free to implement at will? In the case of > the former, do we miss some @ValidatesRunner tests for this? > > Jan > > [1] https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db > > [2] https://lists.apache.org/thread/7foy455spg43xo77zhrs62gc1m383t50 > > [3] > https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L786 > > >