Hi Weijie,

Thanks for driving the work! There are indeed many pain points in the current DataStream API, which are challenging to resolve with its existing design. It is a great opportunity to propose a new DataStream API that tackles these issues. I like the way we've divided the FLIP into multiple sub-FLIPs; the roadmap is clear and comprehensible. +1 for the umbrella FLIP. I am eager to see the sub-FLIPs!

Best regards,
Xuannan

On Wed, Jan 24, 2024 at 8:55 PM Wencong Liu <liuwencle...@163.com> wrote:
>
> Hi Weijie,
>
> Thank you for the effort you've put into the DataStream API! By reorganizing
> and redesigning the DataStream API, as well as addressing some of the
> unreasonable designs within it, we can enhance the efficiency of job
> development for developers. It also allows developers to design more
> flexible Flink jobs to meet business requirements.
>
> I have conducted a comprehensive review of the DataStream API design in
> versions 1.18 and 1.19. I found quite a few functional defects in the
> DataStream API, such as the lack of corresponding APIs in batch processing
> scenarios. In the upcoming 1.20 version, I will further improve the
> DataStream API in batch computing scenarios.
>
> The issues existing in the old DataStream API (which can be referred to as
> V1) can be addressed from a design perspective in the initial version of V2.
> I hope to also have the opportunity to participate in the development of
> DataStream V2 and make my contribution.
>
> Regarding FLIP-408, I have a question: The Processing TimerService is
> currently defined as one of the basic primitives, partly because it's
> understood that you have to choose between processing time and event time.
> The other part of the reason is that it needs to work based on the task's
> mailbox thread model to avoid concurrency issues. Could you clarify the
> second part of the reason?
>
> Best,
> Wencong Liu
>
> At 2023-12-26 14:42:20, "weijie guo" <guoweijieres...@gmail.com> wrote:
> >Hi devs,
> >
> >I'd like to start a discussion about FLIP-408: [Umbrella] Introduce
> >DataStream API V2 [1].
> >
> >The DataStream API is one of the two main APIs that Flink provides for
> >writing data processing programs. As an API that was introduced
> >practically since day 1 of the project and has evolved for nearly
> >a decade, we are observing more and more problems with it. Improvements
> >on these problems require significant breaking changes, which makes
> >in-place refactoring impractical. Therefore, we propose to introduce a
> >new set of APIs, the DataStream API V2, to gradually replace the
> >original DataStream API.
> >
> >The proposal to introduce a whole new set of APIs is complex and includes
> >massive changes. We are planning to break it down into multiple
> >sub-FLIPs for incremental discussion. This FLIP is only used as an
> >umbrella, mainly focusing on motivation, goals, and overall planning.
> >That is to say, more design and implementation details will be
> >discussed in other FLIPs.
> >
> >It's hard to imagine the detailed design of the new API if
> >we're just talking about this umbrella FLIP, and we probably won't be
> >able to give an opinion on it. Therefore, I have prepared two
> >sub-FLIPs [2][3] at the same time, and the discussion of them will be
> >posted later in separate threads.
> >
> >Looking forward to hearing from you, thanks!
> >
> >Best regards,
> >
> >Weijie
> >
> >[1]
> >https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2
> >
> >[2]
> >https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
> >
> >[3]
> >https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2