Hi Xu,Thank you.
Regards. Anil    On Sunday, January 5, 2025 at 06:31:15 PM PST, Xu Huang 
<huangxu.wal...@gmail.com> wrote:  
 
 Hi, Anil

Maybe the watermark alignment mechanism on DataStream V1 can solve your
problem, it can align the watermark of SourceSplit.
please refer to the documentation [1][2].
And we don't provide this feature on this FLIP, this feature will be in our
future planning.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment

Best,
Xu Huang

Anil Dasari <dasaria...@myyahoo.com.invalid> 于2025年1月3日周五 22:41写道:

>
> Hi Xu,Thanks for the response.I am currently using Spark Streaming to
> process data from Kafka in microbatches, writing each microbatch's data to
> a dedicated prefix in S3. Since Spark Streaming is lazy, it processes data
> only when a microbatch is created or triggered, leaving resources idle
> until then. This approach is not true streaming. To improve resource
> utilization and process data as it arrives, I am considering switching to
> Flink.
> Both Spark's Kafka source and Flink's Kafka source (with parallelism > 1)
> use multithreaded processing. However, Spark's Kafka source readers share a
> global microbatch epoch, ensuring a consistent view across readers. In
> contrast, Flink's Kafka source split readers do not share a global epoch or
> identifiers to divide the stream into chunks.
> It requires all split readers of a source should emit a special event
> which same epoch at the same time.
> Thanks
>
>    On Friday, January 3, 2025 at 06:21:34 AM PST, Xu Huang <
> huangxu.wal...@gmail.com> wrote:
>
>  Hi, Anil
>
> I don't understand what you mean by Global Watermark, are you trying to
> have all Sources emit a special event with the same epoch at the same time?
> Is there a specific user case for this question?
>
> Happy new year!
>
> Best,
> Xu Huang
>
> Anil Dasari <dasaria...@myyahoo.com.invalid> 于2025年1月3日周五 14:02写道:
>
> >  Hello XU,Happy new year. Thank you for FLIP-499 and FLIP-467.
> > I tried to split/chunk streams based by fixed timestamp intervals and
> > route them to the appropriate destination. A few months ago, I evaluated
> > the following options and found that Flink currently lacks direct support
> > for a global watermark or timer that can share consistent information
> (such
> > as an epoch or identifier) across task nodes.
> > 1. Windowing: While promising, this approach requires record-level checks
> > for flushing, as window data isn't accessible throughout the pipeline.
> > 2. Window + Trigger: This method buffers events until the trigger
> interval
> > is reached, impacting real-time processing since events are processed
> only
> > when the trigger fires.
> > 3. Processing Time: Processing time is localized to each file writer,
> > causing inconsistencies across task managers.
> > 4. Watermark: Watermarks are specific to each source task. Additionally,
> > the initial watermark (before the first event) is not epoch-based,
> leading
> > to further challenges.
> > Would global watermarks address this use case? If not, could this use
> case
> > align with any of the proposed FLIPs
> > Thanks in advance.
> >
> >    On Thursday, January 2, 2025 at 09:06:31 PM PST, Xu Huang <
> > huangxu.wal...@gmail.com> wrote:
> >
> >  Hi Devs,
> >
> > Weijie Guo and I would like to initiate a discussion about FLIP-499:
> > Support Event Time by Generalized Watermark in DataStream V2
> > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-499%3A+Support+Event+Time+by+Generalized+Watermark+in+DataStream+V2
> > >
> > [1].
> >
> > Event time is a fundamental feature of Flink that has been widely
> adopted.
> > For instance, the Window operator can determine whether to trigger a
> window
> > based on event time, and users can register timer using the event time.
> > FLIP-467[2] introduces the Generalized Watermark in DataStream V2,
> enabling
> > users to define specific events that can be emitted from a source or
> other
> > operators, propagate along the streams, received by downstream operators,
> > and aligned during propagation. Within this framework, the traditional
> > (event-time) Watermark can be viewed as a special instance of the
> > Generalized Watermark already provided by Flink.
> >
> > To make it easy for users to use event time in DataStream V2, this FLIP
> > will implement event time extension in DataStream V2 based on Generalized
> > Watermark.
> >
> > For more details, please refer to FLIP-499 [1]. We look forward to your
> > feedback.
> >
> > Best,
> >
> > Xu Huang
> >
> > [1] https://cwiki.apache.org/confluence/x/pQz0Ew
> >
> > [2] https://cwiki.apache.org/confluence/x/oA6TEg
> >
>
  

Reply via email to