Hi Xu Hang,
Sorry, it's late to join this. Thanks for your job, it make sense to me.  I
have several questions:

> With this abstraction, the original event-time watermark can be seen as a
built-in use case of it.

I wonder whether to replace the older event-time watermark strategy in this
FLIP or a later job? In source operator's watermark alignment, it will
suspend the reading of a source reader by
org.apache.flink.api.connector.source.SourceReader#pauseOrResumeSplits, I
don't see this content in this FLIP. I do hope to see the details to
replace WatermarkStrategy's `withIdleness`, `withWatermarkAlignment`,
`forBoundedOutOfOrderness`, etc.


You also add a new interface `Watermark`, howerver,
org.apache.flink.api.common.eventtime.Watermark has already existed, maybe
it will be confused to distinguish?

Best
Hongshun

On Thu, Dec 19, 2024 at 3:22 PM Xu Huang <huangxu.wal...@gmail.com> wrote:

> Hi Yunfeng, The watermark identifiers are case-sensitive, and we cannot
> distinguish which connectors are internal. Therefore, the naming strategy
> could be "CONNECTOR_KAFKA_IDLE." I have added the naming strategy to FLIP,
> PTAL. Thanks again! Best, Xu Huang
>
> Yunfeng Zhou <flink.zhouyunf...@gmail.com> 于2024年12月19日周四 14:30写道:
>
> > Hi Xu Huang,
> >
> > “INTERNAL_RUNTIME_BACKLOG” sounds good to me. But I’m not sure whether it
> > is proper to treat connector watermarks as internal, or which connectors
> > should be regarded as internal. I’m okay with it to add the naming
> strategy
> > to the FLIP now before we reached an agreement on the details here, to
> > hopefully involve more discussions from other people.
> >
> > Besides, your suggestion also reminds me that the FLIP might need to
> > clarify whether the identifiers are case-sensitive.
> >
> > Best,
> > Yunfeng
> >
> > > 2024年12月19日 11:54,Xu Huang <huangxu.wal...@gmail.com> 写道:
> > >
> > > Hi,@Yunfeng
> > >> add to the FLIP a proposal to the naming convention of identifiers
> > >
> > > Thank you for your suggestion!  It's a great solution to remind users
> > that
> > > the identifier should be unique.
> > > In my opinion, the Flink internal identifiers could be named as
> > > "INTERNAL_MODULE_XXX," such as "INTERNAL_RUNTIME_BACKLOG" or
> > > "INTERNAL_CONNECTOR_KAFKA_IDLE."
> > >
> > > What do you think? I will incorporate this naming strategy into the
> FLIP
> > if
> > > you're okay with it.
> > >
> > > Thanks for your suggestion again!
> > >
> > > Best,
> > > Xu Huang
> > >
> > > Yunfeng Zhou <flink.zhouyunf...@gmail.com> 于2024年12月18日周三 19:10写道:
> > >
> > >> Hi Xu Huang,
> > >>
> > >> I noticed your discussions mentioned that the identifier of watermarks
> > >> needs to be global across the whole job. Therefore, would it be better
> > to
> > >> add to the FLIP a proposal to the naming convention of identifiers?
> For
> > >> example, the identifiers are encouraged to be named with its holding
> > >> module/connector as a prefix, like "runtime-backlog”. It is not a
> > >> compulsory requirement, and is more like a suggestion.
> > >>
> > >> Best,
> > >> Yufneng
> > >>
> > >>> 2024年12月17日 11:13,Xu Huang <huangxu.wal...@gmail.com> 写道:
> > >>>
> > >>> Thank you for participating in the discussion.
> > >>>
> > >>> @jrlee....@gmail.com <jrlee....@gmail.com>
> > >>>> If I have a large number of generalized watermarks that need to be
> > >>> created, where should they be declared?
> > >>>
> > >>> In the current design, the generalized watermark should be declared
> > only
> > >>> once in the first user-defined function or source reader that needs
> to
> > >>> create and send it downstream.
> > >>> Furthermore, all watermark identifiers must remain unique within a
> > single
> > >>> application. Once the upstream operator declares a generalized
> > watermark,
> > >>> the downstream operator will be aware of it and will be able to
> process
> > >> it
> > >>> accordingly.
> > >>>
> > >>> Best, Xu Huang
> > >>>
> > >>> Junrui Lee <jrlee....@gmail.com> 于2024年12月16日周一 19:40写道:
> > >>>
> > >>>> Hi Xu Huang,
> > >>>>
> > >>>> Thanks for the proposal!
> > >>>>
> > >>>> I have a question: If I have a large number of generalized
> watermarks
> > >> that
> > >>>> need to be created, where should they be declared? Should they be
> > >> declared
> > >>>> only once in a single Source, or in all operators that need to send,
> > >>>> receive, and process them?
> > >>>>
> > >>>> Best regards,
> > >>>>
> > >>>> Junrui
> > >>>>
> > >>>> Xu Huang <huangxu.wal...@gmail.com> 于2024年12月10日周二 15:21写道:
> > >>>>
> > >>>>> Hi Devs,
> > >>>>>
> > >>>>> Jeyhun Karimov, Weijie Guo and I would like to initiate a
> discussion
> > >>>> about
> > >>>>> FLIP-467: Introduce Generalized Watermarks [1].
> > >>>>>
> > >>>>> Based on our findings, we recognize the need for specific events
> that
> > >>>>> require propagation and alignment across streams, functioning
> > similarly
> > >>>> to
> > >>>>> watermarks. An example of this is the IsProcessingBacklog event
> > >> proposed
> > >>>> in
> > >>>>> FLIP-309 [2].
> > >>>>>
> > >>>>>
> > >>>>> This has inspired us to create a more generalized watermark
> framework
> > >>>> that
> > >>>>> transcends traditional event time semantics. The generalized
> > watermark
> > >>>>> framework allows users to define a variety of events that can be
> > >> emitted
> > >>>>> from the source or other operators, propagate through the streams,
> > and
> > >> be
> > >>>>> received by downstream operators with aligned properties. With this
> > >>>>> abstraction, users and developers can design specialized events
> > >> according
> > >>>>> to their needs, such as EventTime watermark or idleness watermark
> > >> status.
> > >>>>>
> > >>>>>
> > >>>>> Note that this feature only worked for DataStream V2.
> > >>>>>
> > >>>>> For more details, please refer to FLIP-467 [1]. We look forward to
> > your
> > >>>>> feedback.
> > >>>>>
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> Jeyhun Karimov, Weijie Guo and Xu Huang
> > >>>>>
> > >>>>> [1]
> > >>>>>
> > >>>>>
> > >>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-467%3A+Introduce+Generalized+Watermarks
> > >>>>>
> > >>>>> [2]
> > >>>>>
> > >>>>>
> > >>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-309%3A+Support+using+larger+checkpointing+interval+when+source+is+processing+backlog
> > >>>>>
> > >>>>
> > >>
> > >>
> >
> >
>

Reply via email to