I'd propose downgrading "Refactor the API modules" to TBD. The original proposal assumed that we would be allowed to introduce in-place, API-breaking changes in release 2.0. Now that a migration period has been introduced, and we no longer plan in-place changes or removals for DataStream (or for the APIs in `flink-core`), we need to re-evaluate whether it is still feasible to do things like moving API classes to different modules / packages or turning concrete API classes into interfaces.
Best,
Xintong

On Mon, Jul 17, 2023 at 1:10 AM Yun Tang <myas...@live.com> wrote:
> I agree that we could downgrade "Eager state declaration" to a nice-to-have feature.
>
> As for the deprecation of "queryable state", could we rename the item to deprecating the "current implementation of queryable state"? The ability to query internal state is actually very useful for debugging, and could open up more possibilities for extending Flink SQL to work more like a database.
>
> As Yuan replied in the previous email [1], the current implementation of queryable state has many design problems. However, I don't want users to feel that this feature cannot be done well; maybe we can redesign it. As far as I know, RisingWave already supports queryable state with a better user experience [2].
>
> [1] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m
> [2] https://syntaxbug.com/06a3e7c554/
>
> Best
> Yun Tang
> ________________________________
> From: Xintong Song <tonysong...@gmail.com>
> Sent: Friday, July 14, 2023 13:51
> To: dev@flink.apache.org <dev@flink.apache.org>
> Subject: Re: [VOTE] Release 2.0 must-have work items
>
> Thanks for the support, Yu.
>
> We will have the guideline ready before DataSet is removed. We are currently prioritizing work that needs to be done before the 1.18 feature freeze, and will soon get back to working on the guideline. We expect to have it ready before or soon after the 1.18 release, which will definitely be before DataSet is removed in 2.0.
>
> Best,
>
> Xintong
>
> On Fri, Jul 14, 2023 at 1:06 PM Yu Li <car...@gmail.com> wrote:
> > It's great to see the discussion about what we need to improve, from the user perspective, to switch (completely) from the DataSet API to the DataStream API. I feel that these improvements would happen faster (only) when we seriously prepare to remove the DataSet APIs with a target release, just like what we are doing now.
> > And the same applies to the SinkV1-related discussions (smile).
> >
> > I support Xintong's opinion on keeping "Remove the DataSet APIs" as a must-have item. Meanwhile, I support Yuxia's opinion that we should explicitly tell our users how to migrate their existing DataSet API based applications afterwards, meaning that the guideline Xintong mentioned is a must-have (rather than best effort) before removing the DataSet APIs.
> >
> > Best Regards,
> > Yu
> >
> > On Wed, 12 Jul 2023 at 14:00, yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> > > Thanks Xintong for the clarification. A guideline to help users migrate from DataSet to DataStream will definitely be helpful.
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- Original Message -----
> > > From: "Xintong Song" <tonysong...@gmail.com>
> > > To: "dev" <dev@flink.apache.org>
> > > Sent: Wednesday, July 12, 2023 11:40:12 AM
> > > Subject: Re: [VOTE] Release 2.0 must-have work items
> > >
> > > @Yuxia,
> > >
> > > We are aware of the issue you mentioned. Actually, I don't think the DataStream API can cover everything in the DataSet API in exactly the same way, because the fundamental model, concepts and primitives of the two sets of APIs are completely different. Many of the DataSet APIs, especially those that access the full data set at once, do not fit the DataStream concepts at all. I think what's important is that users can achieve the same functionality, even if they may need to code it differently.
> > >
> > > We have gone through all the existing DataSet APIs and categorized them into three kinds:
> > > - APIs that are well supported by the DataStream API as is, e.g., map, reduce on a grouped data set, etc.
> > > - APIs that can be achieved with the DataStream API as is, but at a price (programming complexity or computational efficiency), e.g., reduce on a full data set, sort partition, etc.
> > >   Admittedly, there is room for improvement here. We may keep improving these in the DataStream API, or we can concentrate on supporting them better in the new ProcessFunction API. Either way, I don't think we should block the retirement of the DataSet API on them.
> > > - There are also a few APIs that cannot be supported by the DataStream API as is, unless users write their own custom operators from the ground up. Only left/rightOuterJoin and combineGroup fall into this category. I think combineGroup is probably not a problem, because it is more like a variant of reduceGroup that allows the framework to execute more efficiently. As for the outer joins, depending on how badly they are needed, they can be supported by emitting the non-joined entries when a window join is triggered.
> > >
> > > We are also planning to draft a guideline to help users migrate from DataSet to DataStream, which should demonstrate how users can achieve things like sort-partition with the DataStream API.
> > >
> > > Last but not least, I'd like to point out that the decision to deprecate and eventually remove the DataSet API was approved in FLIP-131 [1], and all the prerequisites mentioned in the FLIP have been completed.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
> > >
> > > On Wed, Jul 12, 2023 at 10:20 AM Jingsong Li <jingsongl...@gmail.com> wrote:
> > > > +1 to Leonard, Galen and Jing.
> > > >
> > > > About Source and Sink: we're still missing quite a bit of work, including functionality, ease of use and bug fixes, and I'm not sure we'll be completely done by 2.0. Until that's done, we won't be in a position to clean up the old APIs.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Wed, Jul 12, 2023 at 9:41 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> > > > > Hi, Xintong.
> > > > > Sorry to interrupt the vote. I just found an email [1] about the DataSet API on the flink-user-zh mailing list, and from what I have observed, I don't think it's an isolated case.
> > > > >
> > > > > Removing DataSet is a must-have item for release 2.0. But as the user's email asks: if we remove DataSet, how can users implement Sort/PartitionBy, etc. as they did with DataSet?
> > > > > Will we provide similar APIs in DataStream, or some other alternative, before we remove DataSet?
> > > > > Btw, as far as I can see, DataStream is missing many of the APIs needed to replace DataSet. I think it may well take a lot of effort to fully cover the missing APIs.
> > > > >
> > > > > [1] https://lists.apache.org/thread/syjmt8f74gh8ok3z4lhgt95zl4dzn168
> > > > >
> > > > > Best regards,
> > > > > Yuxia
> > > > >
> > > > > ----- Original Message -----
> > > > > From: "Jing Ge" <j...@ververica.com.INVALID>
> > > > > To: "dev" <dev@flink.apache.org>
> > > > > Sent: Wednesday, July 12, 2023 1:23:40 AM
> > > > > Subject: Re: [VOTE] Release 2.0 must-have work items
> > > > >
> > > > > Agree with what Leonard said. There are actually more issues w.r.t. the new Source and SinkV2 [1].
> > > > >
> > > > > Speaking of must-have vs. nice-to-have, I think it depends on the priority. If removing them has higher priority, we should keep the related tasks as must-have and make sure enough effort is put into solving those issues, so that we are able to remove those APIs.
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > > [1] https://lists.apache.org/thread/90qc9nrlzf0vbvg92klzp9ftxxc43nbk
> > > > >
> > > > > On Tue, Jul 11, 2023 at 10:26 AM Leonard Xu <xbjt...@gmail.com> wrote:
> > > > > > Thanks Xintong for driving this great work! But I have to give my -1 (binding) here:
> > > > > >
> > > > > > -1 to marking the "deprecate SourceFunction/SinkFunction/SinkV1" item as a must-have for release 2.0.
> > > > > >
> > > > > > I do a lot of connector work in the community, and I have two insights from past experience:
> > > > > >
> > > > > > 1. Many developers have reported that it is very difficult to migrate from SourceFunction to the new Source [1], which makes migrating existing connectors after SourceFunction is deprecated very hard. Some developers (e.g., Flavio Pompermaier) reported that they gave up on the migration because it was too complicated, and I believe theirs is not an isolated case. This means that, before deprecating the SourceFunction-related interfaces, community contributors need to reduce the migration cost before the migration work starts.
> > > > > >
> > > > > > 2. IIRC, SinkV2 currently cannot cover all the functionality of SinkFunction, as described in FLIP-287 [2]. This means there is no migration path once SinkFunction/SinkV1 are deprecated, so we cannot mark the related SinkFunction/SinkV1 interfaces as deprecated in 1.18.
> > > > > >
> > > > > > Based on these two observations, I think we should not mark these items as must-have for 2.0. Maintaining two sets of source/sink interfaces is not a concern for me; users can choose which interface to implement according to their capacity and needs.
> > > > > >
> > > > > > Btw, some 2.0 work items are marked as must-have, but no contributor has claimed them yet. I think this is a risk and hope the release managers will pay attention to it.
> > > > > >
> > > > > > Thank you, all RMs, for your work, and sorry again for interrupting the vote.
> > > > > >
> > > > > > Best,
> > > > > > Leonard
> > > > > >
> > > > > > [1] https://lists.apache.org/thread/sqq26s9rorynr4vx4nhxz3fmmxpgtdqp
> > > > > > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
> > > > > >
> > > > > > > On Jul 11, 2023, at 4:11 PM, Yuan Mei <yuanmei.w...@gmail.com> wrote:
> > > > > > >
> > > > > > > On second thought, I think "Eager State Declaration" is probably not a must-have.
> > > > > > >
> > > > > > > I was originally thinking of it as a prerequisite for "state querying for disaggregated state management".
> > > > > > >
> > > > > > > Since disaggregated state management itself is not a must-have, neither is "Eager State Declaration". We can downgrade it to "nice to have" if there are no objections.
> > > > > > >
> > > > > > > Best
> > > > > > >
> > > > > > > Yuan
> > > > > > >
> > > > > > > On Mon, Jul 10, 2023 at 7:02 PM Jing Ge <j...@ververica.com.invalid> wrote:
> > > > > > > > +1
> > > > > > > >
> > > > > > > > On Mon, Jul 10, 2023 at 12:52 PM Yu Li <car...@gmail.com> wrote:
> > > > > > > > > +1 (binding)
> > > > > > > > >
> > > > > > > > > Thanks for driving this, and great to see us moving forward.
> > > > > > > > >
> > > > > > > > > Best Regards,
> > > > > > > > > Yu
> > > > > > > > >
> > > > > > > > > On Mon, 10 Jul 2023 at 11:59, Feng Wang <wangfeng...@gmail.com> wrote:
> > > > > > > > > > +1
> > > > > > > > > > Thanks for driving this; looking forward to the next stage of Flink.
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 7, 2023 at 5:31 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I'd like to start the VOTE for the must-have work items for release 2.0 [1]. The corresponding discussion thread is [2].
> > > > > > > > > > >
> > > > > > > > > > > Please note that once the vote is approved, any changes to the must-have items (adding / removing must-have items, changing the priority) require another vote. Assigning contributors / reviewers, updating descriptions / progress, and changes to nice-to-have items do not require another vote.
> > > > > > > > > > >
> > > > > > > > > > > The vote will be open until at least July 12, following the consensus voting process. Votes of PMC members are binding.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > >
> > > > > > > > > > > Xintong
> > > > > > > > > > >
> > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> > > > > > > > > > >
> > > > > > > > > > > [2] https://lists.apache.org/thread/l3dkdypyrovd3txzodn07lgdwtwvhgk4
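
Xintong's July 12 reply suggests that left/right outer joins, which the DataStream API does not offer directly, could be supported by emitting the non-joined entries when a window join is triggered. The Java sketch below illustrates that idea without any Flink dependency: it models the contents of one fired window as two in-memory lists and produces left-outer-join output for them. It is an assumption, not something stated in the thread, that this per-window logic would live in a `CoGroupFunction` passed to `DataStream#coGroup(...).where(...).equalTo(...).window(...).apply(...)`; the class, record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Flink-free sketch: models what a windowed coGroup could do
// when one window fires, in order to emit left-outer-join results.
class WindowedLeftOuterJoin {

    // Illustrative keyed record; not a Flink type.
    record Row(String key, String value) {}

    // Joins the contents of ONE fired window. Each left row is emitted once per
    // matching right row, or once with a "null" partner if no right row in the
    // same window shares its key.
    static List<String> joinWindow(List<Row> left, List<Row> right) {
        // Index the right side by key, similar to the per-key groups a coGroup sees.
        Map<String, List<Row>> rightByKey = new LinkedHashMap<>();
        for (Row r : right) {
            rightByKey.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }

        List<String> out = new ArrayList<>();
        for (Row l : left) {
            List<Row> matches = rightByKey.getOrDefault(l.key(), List.of());
            if (matches.isEmpty()) {
                // Non-joined entry: emitted only because the window has fired,
                // so (ignoring allowed lateness) no more right rows can arrive for it.
                out.add(l.key() + ":" + l.value() + "|null");
            } else {
                for (Row r : matches) {
                    out.add(l.key() + ":" + l.value() + "|" + r.value());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> orders = List.of(new Row("a", "o1"), new Row("b", "o2"));
        List<Row> payments = List.of(new Row("a", "p1"));
        System.out.println(joinWindow(orders, payments));
    }
}
```

Running `main` prints `[a:o1|p1, b:o2|null]`: order `o2` has no payment in the window, so it is emitted with a `null` partner once the window fires. Waiting for the window trigger is what makes this safe, since before that point a matching right-side row might still arrive.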