Hi, @Hongshun

Thank you for your participation in the discussion.

> I wonder whether to replace the older event-time watermark strategy in
this FLIP or a later job?

The generic watermark we are introducing will support event time watermark
on DataStream V2 in the next job, which I am currently working on.
We will provide some basic features about event time such as Idleeness,
BoundedOutOfOrderness and more features will be implemented after 2.0. If
you are interested, feel free to discuss the details with us.

> You also add a new interface `Watermark`, howerver,
org.apache.flink.api.common.eventtime.Watermark has already existed, maybe
it will be confused to distinguish?

As for the naming, since we want DataStream V2 to have only Generalized
Watermark, and other watermarks are just one kind of Generalized Watermark,
so we use “Watermark”. Additionally, please note that the package names and
scopes (DataStream V1 or V2) are different between original event time
watermark and Generalized Watermark, which should help clarify potential
confusion.

Best,
Xu Huang

Hongshun Wang <loserwang1...@gmail.com> 于2024年12月29日周日 16:08写道:

> Hi Xu Hang,
> Sorry, it's late to join this. Thanks for your job, it make sense to me.  I
> have several questions:
>
> > With this abstraction, the original event-time watermark can be seen as a
> built-in use case of it.
>
> I wonder whether to replace the older event-time watermark strategy in this
> FLIP or a later job? In source operator's watermark alignment, it will
> suspend the reading of a source reader by
> org.apache.flink.api.connector.source.SourceReader#pauseOrResumeSplits, I
> don't see this content in this FLIP. I do hope to see the details to
> replace WatermarkStrategy's `withIdleness`, `withWatermarkAlignment`,
> `forBoundedOutOfOrderness`, etc.
>
>
> You also add a new interface `Watermark`, howerver,
> org.apache.flink.api.common.eventtime.Watermark has already existed, maybe
> it will be confused to distinguish?
>
> Best
> Hongshun
>
> On Thu, Dec 19, 2024 at 3:22 PM Xu Huang <huangxu.wal...@gmail.com> wrote:
>
> > Hi Yunfeng, The watermark identifiers are case-sensitive, and we cannot
> > distinguish which connectors are internal. Therefore, the naming strategy
> > could be "CONNECTOR_KAFKA_IDLE." I have added the naming strategy to
> FLIP,
> > PTAL. Thanks again! Best, Xu Huang
> >
> > Yunfeng Zhou <flink.zhouyunf...@gmail.com> 于2024年12月19日周四 14:30写道:
> >
> > > Hi Xu Huang,
> > >
> > > “INTERNAL_RUNTIME_BACKLOG” sounds good to me. But I’m not sure whether
> it
> > > is proper to treat connector watermarks as internal, or which
> connectors
> > > should be regarded as internal. I’m okay with it to add the naming
> > strategy
> > > to the FLIP now before we reached an agreement on the details here, to
> > > hopefully involve more discussions from other people.
> > >
> > > Besides, your suggestion also reminds me that the FLIP might need to
> > > clarify whether the identifiers are case-sensitive.
> > >
> > > Best,
> > > Yunfeng
> > >
> > > > 2024年12月19日 11:54,Xu Huang <huangxu.wal...@gmail.com> 写道:
> > > >
> > > > Hi,@Yunfeng
> > > >> add to the FLIP a proposal to the naming convention of identifiers
> > > >
> > > > Thank you for your suggestion!  It's a great solution to remind users
> > > that
> > > > the identifier should be unique.
> > > > In my opinion, the Flink internal identifiers could be named as
> > > > "INTERNAL_MODULE_XXX," such as "INTERNAL_RUNTIME_BACKLOG" or
> > > > "INTERNAL_CONNECTOR_KAFKA_IDLE."
> > > >
> > > > What do you think? I will incorporate this naming strategy into the
> > FLIP
> > > if
> > > > you're okay with it.
> > > >
> > > > Thanks for your suggestion again!
> > > >
> > > > Best,
> > > > Xu Huang
> > > >
> > > > Yunfeng Zhou <flink.zhouyunf...@gmail.com> 于2024年12月18日周三 19:10写道:
> > > >
> > > >> Hi Xu Huang,
> > > >>
> > > >> I noticed your discussions mentioned that the identifier of
> watermarks
> > > >> needs to be global across the whole job. Therefore, would it be
> better
> > > to
> > > >> add to the FLIP a proposal to the naming convention of identifiers?
> > For
> > > >> example, the identifiers are encouraged to be named with its holding
> > > >> module/connector as a prefix, like "runtime-backlog”. It is not a
> > > >> compulsory requirement, and is more like a suggestion.
> > > >>
> > > >> Best,
> > > >> Yufneng
> > > >>
> > > >>> 2024年12月17日 11:13,Xu Huang <huangxu.wal...@gmail.com> 写道:
> > > >>>
> > > >>> Thank you for participating in the discussion.
> > > >>>
> > > >>> @jrlee....@gmail.com <jrlee....@gmail.com>
> > > >>>> If I have a large number of generalized watermarks that need to be
> > > >>> created, where should they be declared?
> > > >>>
> > > >>> In the current design, the generalized watermark should be declared
> > > only
> > > >>> once in the first user-defined function or source reader that needs
> > to
> > > >>> create and send it downstream.
> > > >>> Furthermore, all watermark identifiers must remain unique within a
> > > single
> > > >>> application. Once the upstream operator declares a generalized
> > > watermark,
> > > >>> the downstream operator will be aware of it and will be able to
> > process
> > > >> it
> > > >>> accordingly.
> > > >>>
> > > >>> Best, Xu Huang
> > > >>>
> > > >>> Junrui Lee <jrlee....@gmail.com> 于2024年12月16日周一 19:40写道:
> > > >>>
> > > >>>> Hi Xu Huang,
> > > >>>>
> > > >>>> Thanks for the proposal!
> > > >>>>
> > > >>>> I have a question: If I have a large number of generalized
> > watermarks
> > > >> that
> > > >>>> need to be created, where should they be declared? Should they be
> > > >> declared
> > > >>>> only once in a single Source, or in all operators that need to
> send,
> > > >>>> receive, and process them?
> > > >>>>
> > > >>>> Best regards,
> > > >>>>
> > > >>>> Junrui
> > > >>>>
> > > >>>> Xu Huang <huangxu.wal...@gmail.com> 于2024年12月10日周二 15:21写道:
> > > >>>>
> > > >>>>> Hi Devs,
> > > >>>>>
> > > >>>>> Jeyhun Karimov, Weijie Guo and I would like to initiate a
> > discussion
> > > >>>> about
> > > >>>>> FLIP-467: Introduce Generalized Watermarks [1].
> > > >>>>>
> > > >>>>> Based on our findings, we recognize the need for specific events
> > that
> > > >>>>> require propagation and alignment across streams, functioning
> > > similarly
> > > >>>> to
> > > >>>>> watermarks. An example of this is the IsProcessingBacklog event
> > > >> proposed
> > > >>>> in
> > > >>>>> FLIP-309 [2].
> > > >>>>>
> > > >>>>>
> > > >>>>> This has inspired us to create a more generalized watermark
> > framework
> > > >>>> that
> > > >>>>> transcends traditional event time semantics. The generalized
> > > watermark
> > > >>>>> framework allows users to define a variety of events that can be
> > > >> emitted
> > > >>>>> from the source or other operators, propagate through the
> streams,
> > > and
> > > >> be
> > > >>>>> received by downstream operators with aligned properties. With
> this
> > > >>>>> abstraction, users and developers can design specialized events
> > > >> according
> > > >>>>> to their needs, such as EventTime watermark or idleness watermark
> > > >> status.
> > > >>>>>
> > > >>>>>
> > > >>>>> Note that this feature only worked for DataStream V2.
> > > >>>>>
> > > >>>>> For more details, please refer to FLIP-467 [1]. We look forward
> to
> > > your
> > > >>>>> feedback.
> > > >>>>>
> > > >>>>>
> > > >>>>> Best,
> > > >>>>>
> > > >>>>> Jeyhun Karimov, Weijie Guo and Xu Huang
> > > >>>>>
> > > >>>>> [1]
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-467%3A+Introduce+Generalized+Watermarks
> > > >>>>>
> > > >>>>> [2]
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-309%3A+Support+using+larger+checkpointing+interval+when+source+is+processing+backlog
> > > >>>>>
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>

Reply via email to