Hi Junrui,

Thanks for your question. A Source is able to send event-time watermarks. They can be sent via the SourceReaderContext, as mentioned in FLIP-467. An example is as follows:

    sourceReaderContext.emitWatermark(EventTimeExtension.EVENT_TIME_WATERMARK_DECLARATION.newWatermark(eventTime))
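For readers skimming the thread, here is a minimal sketch of where that call could sit inside a custom SourceReader. Only emitWatermark(...) and EventTimeExtension.EVENT_TIME_WATERMARK_DECLARATION.newWatermark(...) come from the FLIP; MyRecord, getTimestamp(), and the surrounding method are illustrative placeholders, not part of the proposed API:

    // Sketch only: MyRecord and getTimestamp() are hypothetical placeholders.
    private SourceReaderContext sourceReaderContext; // handed to the SourceReader at creation

    private void emitRecordWithEventTime(MyRecord record, ReaderOutput<MyRecord> output) {
        output.collect(record);                 // forward the record downstream
        long eventTime = record.getTimestamp(); // extract the event timestamp from the record
        // advance event time for this source reader:
        sourceReaderContext.emitWatermark(
            EventTimeExtension.EVENT_TIME_WATERMARK_DECLARATION.newWatermark(eventTime));
    }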
We will add this to the JavaDoc or the FLIP to clarify it. Thanks for the question!

Best,
Xu Huang

On Monday, January 6, 2025 at 11:01, Junrui Lee <jrlee....@gmail.com> wrote:

> Hi Xu,
>
> Thanks for your work. I noticed that in this FLIP, the event-time watermark is created and sent through a separate WatermarkGenerator. I would like to know if there is support for a Source to send event-time watermarks.
>
> Best,
> Junrui
>
> On Monday, January 6, 2025 at 10:31, Xu Huang <huangxu.wal...@gmail.com> wrote:
>
> > Hi Anil,
> >
> > Maybe the watermark alignment mechanism in DataStream V1 can solve your problem; it can align the watermarks of SourceSplits. Please refer to the documentation [1][2]. We don't provide this feature in this FLIP; it will be part of our future planning.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits
> > [2] https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment
> >
> > Best,
> > Xu Huang
> >
> > On Friday, January 3, 2025 at 22:41, Anil Dasari <dasaria...@myyahoo.com.invalid> wrote:
> >
> > > Hi Xu,
> > >
> > > Thanks for the response. I am currently using Spark Streaming to process data from Kafka in microbatches, writing each microbatch's data to a dedicated prefix in S3. Since Spark Streaming is lazy, it processes data only when a microbatch is created or triggered, leaving resources idle until then. This approach is not true streaming. To improve resource utilization and process data as it arrives, I am considering switching to Flink.
> > >
> > > Both Spark's Kafka source and Flink's Kafka source (with parallelism > 1) use multithreaded processing. However, Spark's Kafka source readers share a global microbatch epoch, ensuring a consistent view across readers. In contrast, Flink's Kafka source split readers do not share a global epoch or identifiers to divide the stream into chunks. This would require all split readers of a source to emit a special event with the same epoch at the same time.
> > >
> > > Thanks
> > >
> > > On Friday, January 3, 2025 at 06:21:34 AM PST, Xu Huang <huangxu.wal...@gmail.com> wrote:
> > >
> > > Hi Anil,
> > >
> > > I don't understand what you mean by Global Watermark. Are you trying to have all Sources emit a special event with the same epoch at the same time? Is there a specific use case behind this question?
> > >
> > > Happy new year!
> > >
> > > Best,
> > > Xu Huang
> > >
> > > On Friday, January 3, 2025 at 14:02, Anil Dasari <dasaria...@myyahoo.com.invalid> wrote:
> > >
> > > > Hello Xu,
> > > >
> > > > Happy new year. Thank you for FLIP-499 and FLIP-467.
> > > >
> > > > I tried to split/chunk streams based on fixed timestamp intervals and route them to the appropriate destination. A few months ago, I evaluated the following options and found that Flink currently lacks direct support for a global watermark or timer that can share consistent information (such as an epoch or identifier) across task nodes.
> > > >
> > > > 1. Windowing: While promising, this approach requires record-level checks for flushing, as window data isn't accessible throughout the pipeline.
> > > > 2. Window + Trigger: This method buffers events until the trigger interval is reached, impacting real-time processing since events are processed only when the trigger fires.
> > > > 3. Processing Time: Processing time is localized to each file writer, causing inconsistencies across task managers.
> > > > 4. Watermark: Watermarks are specific to each source task. Additionally, the initial watermark (before the first event) is not epoch-based, leading to further challenges.
> > > >
> > > > Would global watermarks address this use case? If not, could this use case align with any of the proposed FLIPs?
> > > >
> > > > Thanks in advance.
> > > >
> > > > On Thursday, January 2, 2025 at 09:06:31 PM PST, Xu Huang <huangxu.wal...@gmail.com> wrote:
> > > >
> > > > Hi Devs,
> > > >
> > > > Weijie Guo and I would like to initiate a discussion about FLIP-499: Support Event Time by Generalized Watermark in DataStream V2 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-499%3A+Support+Event+Time+by+Generalized+Watermark+in+DataStream+V2> [1].
> > > >
> > > > Event time is a fundamental feature of Flink that has been widely adopted. For instance, the Window operator can determine whether to trigger a window based on event time, and users can register timers using event time. FLIP-467 [2] introduces the Generalized Watermark in DataStream V2, enabling users to define specific events that can be emitted from a source or other operators, propagate along the streams, be received by downstream operators, and be aligned during propagation. Within this framework, the traditional (event-time) Watermark can be viewed as a special instance of the Generalized Watermark already provided by Flink.
> > > >
> > > > To make it easy for users to use event time in DataStream V2, this FLIP will implement an event time extension in DataStream V2 based on the Generalized Watermark.
> > > >
> > > > For more details, please refer to FLIP-499 [1]. We look forward to your feedback.
> > > >
> > > > Best,
> > > >
> > > > Xu Huang
> > > >
> > > > [1] https://cwiki.apache.org/confluence/x/pQz0Ew
> > > >
> > > > [2] https://cwiki.apache.org/confluence/x/oA6TEg
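For reference, the watermark alignment mechanism pointed to earlier in this thread (FLIP-217 and the generating_watermarks documentation) is configured on the DataStream V1 WatermarkStrategy. A rough sketch, in which MyEvent, getEventTime(), kafkaSource, and the alignment group name are placeholder names:

    // Align watermarks across sources/splits that share the same alignment group.
    WatermarkStrategy<MyEvent> strategy =
        WatermarkStrategy
            .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
            .withTimestampAssigner((event, recordTimestamp) -> event.getEventTime())
            // keep watermarks within 20 seconds of each other, re-checked every second:
            .withWatermarkAlignment("alignment-group", Duration.ofSeconds(20), Duration.ofSeconds(1));

    DataStream<MyEvent> stream = env.fromSource(kafkaSource, strategy, "kafka-source");

Splits whose watermarks run ahead of the group are paused until the slower ones catch up, which is the per-split behavior FLIP-217 added on top of source-level alignment.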