Greetings,

When playing around with the following simple event-time stream aggregation:

      SELECT
        TUMBLE_START(event_time, INTERVAL '1' DAY) as event_time,
        ...
      FROM input
      GROUP BY TUMBLE(event_time, INTERVAL '1' DAY), symbol

...to my surprise I found out that the tumbling window operator has no
effect on the watermarks of the resulting append stream - the watermarks of
the input stream are propagated as-is.

This seems to be a documented behavior
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#interaction-of-watermarks-and-windows
but
it's still very counter-intuitive to me and I couldn't find any explanation
of it.

My understanding of the watermarking is that MOST data is expected to
arrive with event time below the stream's watermark. Late events are either
discarded or should be handled as exceptional cases, e.g. via "allowed
lateness".

So in my aggregation above I was expecting the result watermark to be
offset by ~1 day from the input and be emitted only after a tumbling window
closes. Instead, with input watermarks propagated as-is ALL events in the
resulting stream end up being late in relation to the current watermark...
Doesn't this behavior ruin the composition, as downstream operators will be
discarding all late data?

I'd greatly appreciate if someone could shed light on this design decision.

Thanks,
Sergii

Reply via email to