I'm delighted to see interest in developing support for processing-time temporal joins.
The proposed implementation seems rather complex, and I'm not convinced this complexity is justified/necessary. I'd like to outline a simpler alternative that I think would satisfy the key objectives. Key ideas: 1. Limit support to the HybridSource (or a derivative thereof). (E.g., I'm guessing the MySQL CDC Source could be reworked to be a hybrid source.) 2. Have this HybridSource wait to begin emitting watermarks until it has handled all events from the bounded sources. (I'm not sure how the HybridSource handles this now; if this is an incompatible change, we can find a way to deal with that.) 3. Instruct users to use an ingestion time watermarking strategy for their unbounded source (the source the HybridSource handles last) if they want to do something like a processing time temporal join. One objection to this is the limitation of only supporting the HybridSource -- what about cases where the user has a single source, e.g., a Kafka topic? I'm suggesting the user would divide their build-side stream into two parts -- a bounded component that is fully ingested by the hybrid source before watermarking begins, followed by an unbounded component. I think this alternative handles use cases like processing-time temporal join rather nicely, without requiring any changes to watermarks or the core runtime. David On Thu, Jun 29, 2023 at 1:39 AM Martijn Visser <martijnvis...@apache.org> wrote: > > Hi Dong and Xuannan, > > I'm excited to see this FLIP. I think support for processing-time > temporal joins is something that the Flink users will greatly benefit > off. I specifically want to call-out that it's great to see the use > cases that this enables. From a technical implementation perspective, > I defer to the opinion of others with expertise on this topic. > > Best regards, > > Martijn > > On Sun, Jun 25, 2023 at 9:03 AM Xuannan Su <suxuanna...@gmail.com> wrote: > > > > Hi all, > > > > Dong(cc'ed) and I are opening this thread to discuss our proposal to > > enhance the watermark to properly support processing-time temporal > > join, which has been documented in FLIP-326 [1]. > > > > We want to support the use case where the records from the probe side > > of the processing-time temporal join need to wait until the build side > > finishes the snapshot phrase by enhancing the expressiveness of the > > Watermark. Additionally, these changes lay the groundwork for > > simplifying the DataStream APIs, eliminating the need for users to > > explicitly differentiate between event-time and processing-time, > > resulting in a more intuitive user experience. > > > > Please refer to the FLIP document for more details about the proposed > > design and implementation. We welcome any feedback and opinions on > > this proposal. > > > > Best regards, > > > > Dong and Xuannan > > > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-326%3A+Enhance+Watermark+to+Support+Processing-Time+Temporal+Join