Re: [DISCUSS] FLIP-XXX: Support Window Stagger in FlinkSQL and introduce KEY_BASED deterministic stagger

zihao chen Tue, 09 Jun 2026 06:03:40 -0700

Hi Martijn,

Thanks for the detailed feedback.


I would like to share a concrete use case that may help explain why
per-key window boundaries can be an intentional semantic choice rather
than merely a side effect of load smoothing.

In our production environment, we run a workload that performs windowed
aggregation per ad_id and forwards the aggregated results to a downstream
sample service.
With a regular tumbling window, all keys produce results at the same window
boundary, creating periodic traffic spikes. To smooth the output,
users would like different keys to have deterministic offsets derived
from the key hash.
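
As a minimal sketch of what "deterministic offsets derived from the key
hash" could look like, the Python snippet below assigns a timestamp to a
per-key tumbling window. The function names, the choice of CRC32 as the
stable hash, and the millisecond units are assumptions made purely for
illustration; they are not taken from the FLIP or from Flink's code.

```python
import zlib


def key_offset_ms(key: str, size_ms: int) -> int:
    """Hypothetical helper: map a key to a fixed offset in [0, size_ms)."""
    # CRC32 is stable across processes and restarts, whereas Python's
    # built-in hash() for strings is salted per process.
    return zlib.crc32(key.encode("utf-8")) % size_ms


def assign_window(key: str, ts_ms: int, size_ms: int):
    """Return (window_start, window_end) of the key's window containing ts_ms."""
    offset = key_offset_ms(key, size_ms)
    # Same arithmetic as a tumbling window with an offset, except that the
    # offset depends on the key instead of being one global constant.
    start = ts_ms - (ts_ms - offset) % size_ms
    return start, start + size_ms
```

Under these assumptions, the same (key, timestamp) pair always maps to the
same window, so the assignment would survive restarts and reprocessing.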

An important characteristic of this workload is that the downstream
consumer processes aggregates independently per key:
- No comparison between different keys is required;
- No cross-key joins are involved;
- No requirement exists for all keys to share identical window
boundaries.

In other words, for aggregations such as GROUP BY ad_id, aligned
boundaries across different keys are not a business requirement.

What users care about is that:
- Each key still has fixed-size, non-overlapping windows;
- Window assignment remains deterministic across restarts and
reprocessing;
- Aggregation results are distributed more evenly over time.

From the user's perspective, these properties are sufficient and
preserve the intended semantics of the workload.
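
To illustrate the smoothing effect, here is a rough sketch that counts how
many keys close their window at each instant, with and without a per-key
offset. The ad_id values, the 1-minute window size, and the CRC32-based
offset are all hypothetical choices for this example.

```python
import zlib
from collections import Counter

SIZE_MS = 60_000  # assumed 1-minute tumbling window


def window_end(key, ts_ms, size_ms, staggered):
    """End of the window containing ts_ms for this key."""
    # Offset 0 models a regular aligned window; otherwise use a stable key hash.
    offset = zlib.crc32(key.encode("utf-8")) % size_ms if staggered else 0
    start = ts_ms - (ts_ms - offset) % size_ms
    return start + size_ms


def emissions_per_instant(keys, ts_ms, size_ms, staggered):
    """Count how many keys would emit a result at each window-end instant."""
    return Counter(window_end(k, ts_ms, size_ms, staggered) for k in keys)


keys = [f"ad_{i}" for i in range(1000)]  # hypothetical ad_ids
aligned = emissions_per_instant(keys, 1_000_000, SIZE_MS, staggered=False)
staggered = emissions_per_instant(keys, 1_000_000, SIZE_MS, staggered=True)

# Aligned windows: a single instant carries the results of all keys.
print(len(aligned), max(aligned.values()))
# Staggered windows: window ends are spread over many distinct instants.
print(len(staggered), max(staggered.values()))
```

How evenly the load spreads depends on the hash and on the key
distribution, so this should be read as an illustration rather than a
measurement.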

Regarding implementation alternatives, a similar smoothing effect can
be achieved by delaying emissions through custom triggers. However,
this typically requires extending the window lifecycle (e.g. through
additional state retention and allowed lateness) to ensure state is not
cleaned up before delayed emission. In contrast, shifting window
boundaries preserves the normal window lifecycle and does not require
additional state retention.
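
The lifecycle difference can be shown with simple arithmetic. The sketch
below uses a deliberately simplified model, assumed only for illustration:
window state lives from the window start until its result is emitted, and
the delayed-emission approach postpones emission by a fixed per-key delay.
It is not a description of Flink's actual state cleanup logic.

```python
def state_interval_delayed(window_start_ms, size_ms, delay_ms):
    """Aligned window whose emission is delayed by delay_ms (hypothetical model)."""
    window_end = window_start_ms + size_ms
    # State must outlive the window end until the delayed emission happens.
    return window_start_ms, window_end + delay_ms


def state_interval_shifted(window_start_ms, size_ms, offset_ms):
    """Window whose boundaries are shifted by offset_ms (hypothetical model)."""
    start = window_start_ms + offset_ms
    # Emission and cleanup both happen at the shifted window end.
    return start, start + size_ms


size, shift = 60_000, 15_000
d_start, d_end = state_interval_delayed(0, size, shift)
s_start, s_end = state_interval_shifted(0, size, shift)
print(d_end - d_start)  # 75000: state lives for size + delay
print(s_end - s_start)  # 60000: state lives for exactly one window size
```

In both cases the result is emitted at the same instant (75000 ms here);
only the shifted-boundary variant keeps the per-window state lifetime equal
to the window size.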

Based on your feedback, I am inclined to narrow the proposal to the
deterministic KEY_BASED case only and remove RANDOM and NATURAL
from the scope.

The question would then become whether SQL should expose a
deterministic per-key window semantic that is useful for certain
workloads, rather than whether SQL should support non-deterministic
staggering strategies.

Looking forward to hearing your thoughts.

Best regards,
Zihao

Martijn Visser <[email protected]> wrote on Mon, Jun 8, 2026 at 20:28:

> Hi Zihao,
>
> Apologies for the late reply. My biggest concern is still around the
> non-deterministic behavior. For RANDOM and NATURAL that's exactly what
> they introduce, and I'm convinced we should be moving away from that,
> not adding more of it: non-determinism makes stream processing so much
> harder to reason about for a user. It causes all sorts of issues
> around reprocessing, it breaks changelog processing semantics, and it
> makes the entire NonDeterministicUpdate analysis (even more)
> complicated.
>
> KEY_BASED is deterministic so that argument doesn't apply, but it
> still changes the result set with per-key boundaries, and I haven't
> seen a concrete use case that actually wants that rather than it being
> a side effect of load smoothing. I also don't think the global-offset
> analogy holds here. A global offset shifts every window by the same
> constant, so all keys keep identical boundaries and the result stays
> deterministic and comparable across keys. A per-key stagger gives
> different keys different boundaries, which is precisely the property a
> global offset preserves and this does not. "Fixed-size and
> non-overlapping" holds per key in both cases, but that's not the part
> that's at stake.
>
> That's also why I'd rather see the "future work" you mentioned:
> stagger the emission while keeping the boundaries aligned, which keeps
> the results stable and is a real physical optimization.
>
> As a side note, FLINK-37655 proposes making WindowStagger pluggable at
> the DataStream level and only has a single watcher, which makes me
> question how much demand there is for this in the first place.
>
> Best regards,
> Martijn
>
> On Mon, Jun 1, 2026 at 10:27, zihao chen <[email protected]> wrote:
> >
> > Hi Martijn,
> >
> > I hope you are doing well.
> >
> > I wanted to follow up on the revised proposal for STAGGER_TUMBLE
> > that I shared last week. I am particularly interested to hear whether
> > this updated direction addresses the concerns you raised about mixing
> > physical concerns with logical semantics.
> >
> > Your feedback would be greatly appreciated. If you have any additional
> > thoughts or suggestions, I would be happy to incorporate them.
> >
> > Thank you very much for your time and guidance.
> >
> > Best regards,
> > Zihao
> >
> > Feng Jin <[email protected]> wrote on Mon, May 25, 2026 at 18:58:
> >
> > > Hi Zihao, Martijn,
> > >
> > >
> > > +1 for introducing a new window type, as this is not a change to the
> > > trigger mechanism itself, but rather a fundamental redefinition of how
> > > data is partitioned into windows.
> > >
> > >
> > > Best,
> > >
> > > Feng
> > >
> > >
> > >
> > >
> > >
> > > On Sat, May 23, 2026 at 12:07 PM zihao chen <[email protected]> wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > > Thanks for your insightful feedback and careful review.
> > > >
> > > > Your point about avoiding the mixing of physical concerns with
> > > > logical semantics makes perfect sense, and it prompted me to rethink
> > > > the design more thoroughly.
> > > >
> > > > I would like to share an updated direction below and see whether this
> > > > aligns better with your expectations.
> > > >
> > > > 1. Original Proposal: Withdrawn
> > > >
> > > > I initially proposed extending the existing TUMBLE window with an
> > > > optional STAGGER parameter, inspired by the existing DataStream
> > > > WindowStagger, which shifts window boundaries.
> > > >
> > > > However, I agree with your analysis that doing so in SQL would
> > > > silently break the deterministic alignment contract of TUMBLE.
> > > >
> > > > Therefore, I would like to withdraw this part of the proposal.
> > > >
> > > > 2. Hints and PTF: Deferred for Now
> > > >
> > > >    - Regarding Hints
> > > >
> > > > I agree that a hint is probably not the right abstraction here.
> > > > Staggering changes the resulting window boundaries, while hints in
> > > > Flink are generally treated as plan-intervention mechanisms that do
> > > > not alter query semantics.
> > > >
> > > > In addition, there is currently no precedent for window-related hints
> > > > in Flink SQL.
> > > >
> > > >
> > > >    - Regarding PTF (Process Table Functions)
> > > >
> > > > I agree that PTF could ultimately become a powerful extension point
> > > > for custom or user-defined windows.
> > > >
> > > > However, building a comprehensive PTF-based windowing framework is
> > > > itself a substantial design effort and likely deserves a dedicated
> > > > discussion.
> > > >
> > > > To keep the scope of this FLIP manageable, I would prefer to leave
> > > > PTF integration as future work for now.
> > > >
> > > > ------------------------------
> > > > 3. Revised Proposal: Introduce a New TVF, STAGGER_TUMBLE
> > > >
> > > > Since staggering fundamentally changes the window definition, I now
> > > > believe it should be treated as a logical semantic change rather than
> > > > a pure physical optimization.
> > > >
> > > > Therefore, instead of modifying TUMBLE, the cleaner approach would
> > > > be to introduce a separate TVF with an explicit contract:
> > > >
> > > > STAGGER_TUMBLE(
> > > >     TABLE data,
> > > >     DESCRIPTOR(timecol),
> > > >     size,
> > > >     stagger_strategy
> > > > )
> > > >
> > > > -- stagger_strategy:
> > > > --   'RANDOM'
> > > > --   'NATURAL'
> > > > --   'KEY_BASED'
> > > >
> > > > For KEY_BASED, the requirement of a keyed context (for example,
> > > > Window Aggregation with GROUP BY) would be validated at compile
> > > > time.
> > > >
> > > > Key properties of this approach:
> > > >
> > > >    - *Zero impact on TUMBLE*
> > > >
> > > >    The semantic contract of the existing TUMBLE TVF remains fully
> > > >    preserved.
> > > >
> > > >    - *Explicit semantics*
> > > >
> > > >    STAGGER_TUMBLE would define its own semantics explicitly,
> > > >    including that window boundaries may vary depending on the
> > > >    selected stagger strategy.
> > > >
> > > > ------------------------------
> > > > 4. Future Work
> > > >
> > > > A potentially cleaner long-term direction may be to separate:
> > > >
> > > >    - logical window boundary assignment, and
> > > >    - physical emission scheduling
> > > >
> > > > In other words, preserving perfectly aligned window boundaries while
> > > > staggering only the emission timing.
> > > >
> > > > That would constitute a true physical optimization without changing
> > > > query results.
> > > >
> > > > This could potentially evolve into an optional parameter such as
> > > > shift_window_boundary in STAGGER_TUMBLE, and can be explored in a
> > > > follow-up FLIP.
> > > > ------------------------------
> > > >
> > > > Does this revised direction address your core concerns?
> > > >
> > > > I would also greatly appreciate feedback from others on the mailing
> > > > list.
> > > >
> > > > If there is general consensus around this direction, I will update
> > > > the FLIP document accordingly. Otherwise, I am happy to continue
> > > > iterating on the design.
> > > >
> > > > Best regards,
> > > >
> > > > Zihao
> > > >
> > > > Martijn Visser <[email protected]> wrote on Thu, May 21, 2026 at 01:05:
> > > >
> > > > > Hi Zihao,
> > > > >
> > > > > Thanks for the FLIP. I am worried that the proposal is mixing physical
> > > > > concerns (the downstream bursts of data) into logical semantics. I
> > > > > think hints are a more natural escape hatch. I also think that
> > > > > KEY_BASED is not really a physical optimization anyway, since it
> > > > > shifts window_start / window_end values in the output and therefore
> > > > > changes the result set. That makes it a poor fit for both a TVF
> > > > > argument and a hint, and probably a better fit for a PTF where the
> > > > > user explicitly owns the boundary assignment function.
> > > > >
> > > > > Looking forward to your thoughts.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn
> > > > >
> > > > > On Wed, May 20, 2026 at 14:32, rocxing <[email protected]> wrote:
> > > > > >
> > > > > > Hi Zihao and all,
> > > > > >
> > > > > >
> > > > > > Thanks a lot for this practical proposal.
> > > > > > This is a valuable feature for Flink SQL users, and we have also
> > > > > > encountered exactly the same pain points in our production
> > > > > > environments.
> > > > > > Furthermore, the KEY_BASED deterministic stagger strategy is a good
> > > > > > way to eliminate non-determinism problems.
> > > > > >
> > > > > >
> > > > > > Best regards,
> > > > > > Pengxiang Wang
> > > > >
> > > >
> > >
>
