Thanks for starting this thread Jan, I'm keen to hear thoughts and
outcomes!  I thought I would mention that answers to the questions posed
here will help to unblock a 2.38.0 release blocker[1].

[1] https://issues.apache.org/jira/browse/BEAM-14064

On Thu, Mar 24, 2022 at 5:28 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hi,
>
> this is follow-up thread started from [1]. In the thread there is
> mentioned multiple times that (in stateless ParDo), the output watermark is
> allowed to advance only on bundle boundaries [2]. Essentially that would
> mean that anything in between calls to @StartBundle and @FinishBundle would
> be processed in single instant in (output) event-time. This makes perfect
> sense.
>
> The issue is that it seems that not all runners actually implement this
> behavior. FlinkRunner for instance does not have a "natural" concept of
> bundles and those are created in a more ad-hoc way to adhere with the DoFn
> life-cycle (see [3]). Watermark updates and elements are completely
> interleaved without any synchronization with bundle "open" or "close". If
> watermark updates are allowed to happen only on boundaries of bundles, then
> this seems to break this contract.
>
> The question therefore is - should we consider FlinkRunner as
> non-compliant with this aspect of the Apache Beam model or is this an
> "optional" part that runners are free to implement at will? In the case of
> the former, do we miss some @ValidatesRunner tests for this?
>
>  Jan
>
> [1] https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db
>
> [2] https://lists.apache.org/thread/7foy455spg43xo77zhrs62gc1m383t50
>
> [3]
> https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L786
>
>
>

Reply via email to