Hi Xu Hang, Sorry, it's late to join this. Thanks for your job, it make sense to me. I have several questions:
> With this abstraction, the original event-time watermark can be seen as a built-in use case of it. I wonder whether to replace the older event-time watermark strategy in this FLIP or a later job? In source operator's watermark alignment, it will suspend the reading of a source reader by org.apache.flink.api.connector.source.SourceReader#pauseOrResumeSplits, I don't see this content in this FLIP. I do hope to see the details to replace WatermarkStrategy's `withIdleness`, `withWatermarkAlignment`, `forBoundedOutOfOrderness`, etc. You also add a new interface `Watermark`, howerver, org.apache.flink.api.common.eventtime.Watermark has already existed, maybe it will be confused to distinguish? Best Hongshun On Thu, Dec 19, 2024 at 3:22 PM Xu Huang <huangxu.wal...@gmail.com> wrote: > Hi Yunfeng, The watermark identifiers are case-sensitive, and we cannot > distinguish which connectors are internal. Therefore, the naming strategy > could be "CONNECTOR_KAFKA_IDLE." I have added the naming strategy to FLIP, > PTAL. Thanks again! Best, Xu Huang > > Yunfeng Zhou <flink.zhouyunf...@gmail.com> 于2024年12月19日周四 14:30写道: > > > Hi Xu Huang, > > > > “INTERNAL_RUNTIME_BACKLOG” sounds good to me. But I’m not sure whether it > > is proper to treat connector watermarks as internal, or which connectors > > should be regarded as internal. I’m okay with it to add the naming > strategy > > to the FLIP now before we reached an agreement on the details here, to > > hopefully involve more discussions from other people. > > > > Besides, your suggestion also reminds me that the FLIP might need to > > clarify whether the identifiers are case-sensitive. > > > > Best, > > Yunfeng > > > > > 2024年12月19日 11:54,Xu Huang <huangxu.wal...@gmail.com> 写道: > > > > > > Hi,@Yunfeng > > >> add to the FLIP a proposal to the naming convention of identifiers > > > > > > Thank you for your suggestion! It's a great solution to remind users > > that > > > the identifier should be unique. > > > In my opinion, the Flink internal identifiers could be named as > > > "INTERNAL_MODULE_XXX," such as "INTERNAL_RUNTIME_BACKLOG" or > > > "INTERNAL_CONNECTOR_KAFKA_IDLE." > > > > > > What do you think? I will incorporate this naming strategy into the > FLIP > > if > > > you're okay with it. > > > > > > Thanks for your suggestion again! > > > > > > Best, > > > Xu Huang > > > > > > Yunfeng Zhou <flink.zhouyunf...@gmail.com> 于2024年12月18日周三 19:10写道: > > > > > >> Hi Xu Huang, > > >> > > >> I noticed your discussions mentioned that the identifier of watermarks > > >> needs to be global across the whole job. Therefore, would it be better > > to > > >> add to the FLIP a proposal to the naming convention of identifiers? > For > > >> example, the identifiers are encouraged to be named with its holding > > >> module/connector as a prefix, like "runtime-backlog”. It is not a > > >> compulsory requirement, and is more like a suggestion. > > >> > > >> Best, > > >> Yufneng > > >> > > >>> 2024年12月17日 11:13,Xu Huang <huangxu.wal...@gmail.com> 写道: > > >>> > > >>> Thank you for participating in the discussion. > > >>> > > >>> @jrlee....@gmail.com <jrlee....@gmail.com> > > >>>> If I have a large number of generalized watermarks that need to be > > >>> created, where should they be declared? > > >>> > > >>> In the current design, the generalized watermark should be declared > > only > > >>> once in the first user-defined function or source reader that needs > to > > >>> create and send it downstream. > > >>> Furthermore, all watermark identifiers must remain unique within a > > single > > >>> application. Once the upstream operator declares a generalized > > watermark, > > >>> the downstream operator will be aware of it and will be able to > process > > >> it > > >>> accordingly. > > >>> > > >>> Best, Xu Huang > > >>> > > >>> Junrui Lee <jrlee....@gmail.com> 于2024年12月16日周一 19:40写道: > > >>> > > >>>> Hi Xu Huang, > > >>>> > > >>>> Thanks for the proposal! > > >>>> > > >>>> I have a question: If I have a large number of generalized > watermarks > > >> that > > >>>> need to be created, where should they be declared? Should they be > > >> declared > > >>>> only once in a single Source, or in all operators that need to send, > > >>>> receive, and process them? > > >>>> > > >>>> Best regards, > > >>>> > > >>>> Junrui > > >>>> > > >>>> Xu Huang <huangxu.wal...@gmail.com> 于2024年12月10日周二 15:21写道: > > >>>> > > >>>>> Hi Devs, > > >>>>> > > >>>>> Jeyhun Karimov, Weijie Guo and I would like to initiate a > discussion > > >>>> about > > >>>>> FLIP-467: Introduce Generalized Watermarks [1]. > > >>>>> > > >>>>> Based on our findings, we recognize the need for specific events > that > > >>>>> require propagation and alignment across streams, functioning > > similarly > > >>>> to > > >>>>> watermarks. An example of this is the IsProcessingBacklog event > > >> proposed > > >>>> in > > >>>>> FLIP-309 [2]. > > >>>>> > > >>>>> > > >>>>> This has inspired us to create a more generalized watermark > framework > > >>>> that > > >>>>> transcends traditional event time semantics. The generalized > > watermark > > >>>>> framework allows users to define a variety of events that can be > > >> emitted > > >>>>> from the source or other operators, propagate through the streams, > > and > > >> be > > >>>>> received by downstream operators with aligned properties. With this > > >>>>> abstraction, users and developers can design specialized events > > >> according > > >>>>> to their needs, such as EventTime watermark or idleness watermark > > >> status. > > >>>>> > > >>>>> > > >>>>> Note that this feature only worked for DataStream V2. > > >>>>> > > >>>>> For more details, please refer to FLIP-467 [1]. We look forward to > > your > > >>>>> feedback. > > >>>>> > > >>>>> > > >>>>> Best, > > >>>>> > > >>>>> Jeyhun Karimov, Weijie Guo and Xu Huang > > >>>>> > > >>>>> [1] > > >>>>> > > >>>>> > > >>>> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-467%3A+Introduce+Generalized+Watermarks > > >>>>> > > >>>>> [2] > > >>>>> > > >>>>> > > >>>> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-309%3A+Support+using+larger+checkpointing+interval+when+source+is+processing+backlog > > >>>>> > > >>>> > > >> > > >> > > > > >