I'm delighted to see interest in developing support for
processing-time temporal joins.

The proposed implementation seems rather complex, and I'm not
convinced this complexity is justified/necessary. I'd like to outline
a simpler alternative that I think would satisfy the key objectives.

Key ideas:

1. Limit support to the HybridSource (or a derivative thereof). (E.g.,
I'm guessing the MySQL CDC Source could be reworked to be a hybrid
source.)
2. Have this HybridSource wait to begin emitting watermarks until it
has handled all events from the bounded sources. (I'm not sure how the
HybridSource handles this now; if this is an incompatible change, we
can find a way to deal with that.)
3. Instruct users to use an ingestion time watermarking strategy for
their unbounded source (the source the HybridSource handles last) if
they want to do something like a processing time temporal join.

One objection to this is the limitation of only supporting the
HybridSource -- what about cases where the user has a single source,
e.g., a Kafka topic? I'm suggesting the user would divide their
build-side stream into two parts -- a bounded component that is fully
ingested by the hybrid source before watermarking begins, followed by
an unbounded component.

I think this alternative handles use cases like processing-time
temporal join rather nicely, without requiring any changes to
watermarks or the core runtime.

David

On Thu, Jun 29, 2023 at 1:39 AM Martijn Visser <martijnvis...@apache.org> wrote:
>
> Hi Dong and Xuannan,
>
> I'm excited to see this FLIP. I think support for processing-time
> temporal joins is something that the Flink users will greatly benefit
> off. I specifically want to call-out that it's great to see the use
> cases that this enables. From a technical implementation perspective,
> I defer to the opinion of others with expertise on this topic.
>
> Best regards,
>
> Martijn
>
> On Sun, Jun 25, 2023 at 9:03 AM Xuannan Su <suxuanna...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Dong(cc'ed) and I are opening this thread to discuss our proposal to
> > enhance the watermark to properly support processing-time temporal
> > join, which has been documented in FLIP-326 [1].
> >
> > We want to support the use case where the records from the probe side
> > of the processing-time temporal join need to wait until the build side
> > finishes the snapshot phrase by enhancing the expressiveness of the
> > Watermark. Additionally, these changes lay the groundwork for
> > simplifying the DataStream APIs, eliminating the need for users to
> > explicitly differentiate between event-time and processing-time,
> > resulting in a more intuitive user experience.
> >
> > Please refer to the FLIP document for more details about the proposed
> > design and implementation. We welcome any feedback and opinions on
> > this proposal.
> >
> > Best regards,
> >
> > Dong and Xuannan
> >
> > [1] 
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-326%3A+Enhance+Watermark+to+Support+Processing-Time+Temporal+Join

Reply via email to