Thanks all for the discussion. The wiki has been updated as discussed. I'm starting a vote now.
Best,

Xintong

On Wed, Jul 5, 2023 at 9:52 AM Xintong Song <tonysong...@gmail.com> wrote:

> Hi ConradJam,
>
> I think Chesnay has already put his name as the Contributor for the two
> tasks you listed. Maybe you can reach out to him to see if you can
> collaborate on this.
>
> In general, I don't think contributing to a release 2.0 issue is much
> different from contributing to a regular issue. We haven't yet created JIRA
> tickets for all the listed tasks because many of them need further
> discussions and / or FLIPs to decide whether and how they should be
> performed.
>
> Best,
>
> Xintong
>
> On Mon, Jul 3, 2023 at 10:37 PM ConradJam <jam.gz...@gmail.com> wrote:
>
>> Hi Community:
>> I see some tasks in the 2.0 list that haven't been assigned yet. I want
>> to take the initiative to take on some tasks that I can complete. How do I
>> apply to the community for this part of the task? I am interested in the
>> following parts of FLINK-32377 [1]. Do I need to create the issues myself
>> and assign them to myself?
>>
>> - the current timestamp, which is problematic w.r.t. caching and testing,
>>   while providing no value.
>> - Remove JarRequestBody#programArgs in favor of #programArgsList.
>>
>> [1] FLINK-32377 <https://issues.apache.org/jira/browse/FLINK-32377>
>>
>> On Fri, Jun 30, 2023 at 00:53, Teoh, Hong <lian...@amazon.co.uk.invalid> wrote:
>>
>>> Thanks Xintong for driving the effort.
>>>
>>> I'd add a +1 to reworking configs, as suggested by @Jark and @Chesnay,
>>> especially the types. We have various configs that encode Time /
>>> MemorySize that are Long instead!
>>>
>>> Regards,
>>> Hong
>>>
>>>> On 29 Jun 2023, at 16:19, Yuan Mei <yuanmei.w...@gmail.com> wrote:
>>>>
>>>> Thanks for driving this effort, Xintong!
>>>>
>>>> To Chesnay
>>>>> I'm curious as to why the "Disaggregated State Management" item is
>>>>> marked as a must-have; will it require changes that break something?
>>>>> What prevents it from being added in 2.1?
>>>>
>>>> As to "Disaggregated State Management":
>>>>
>>>> We plan to provide a new type of state backend to support DFS as primary
>>>> storage. To achieve this, we need at least two kinds of changes (not
>>>> entirely sure yet, since we are still in the design and prototype phase):
>>>>
>>>> 1. State backend change
>>>> 2. State access change
>>>>
>>>> Not all of the related interfaces are `@Internal`. Some of them, like
>>>> `StateBackend`, are `@PublicEvolving`.
>>>> So, you are right in the sense that "Disaggregated State Management"
>>>> itself probably does not need to be a "Must Have".
>>>>
>>>> But I was hoping that the changes related to public APIs can be finalized
>>>> and merged in Flink 2.0 (I will fix the wiki accordingly).
>>>>
>>>> I also agree with Jark that 2.0 is a good chance to rework the default
>>>> values of configurations.
>>>>
>>>> Best
>>>> Yuan
>>>>
>>>> On Thu, Jun 29, 2023 at 8:43 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>
>>>>> Something else configuration-related is that there are a bunch of
>>>>> options where the type isn't quite correct (e.g., a String where it
>>>>> could be an enum, a string where it should be an int or something).
>>>>> Could do a pass over those as well.
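To make the config-typing point from Hong and Chesnay concrete, here is a minimal, self-contained Java sketch. It is not Flink's `ConfigOption` API: the `Option` class, the `example.*` keys, and the `RestartMode` enum are all hypothetical names made up for illustration. It only contrasts an option declared as a bare `Long` (unit known only from documentation) with options declared as `Duration` and as an enum, where bad values fail at read time.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class TypedOptionSketch {
    /** Hypothetical typed option: a key, a parser, and a default. Not a Flink class. */
    static final class Option<T> {
        final String key;
        final Function<String, T> parser;
        final T defaultValue;

        Option(String key, Function<String, T> parser, T defaultValue) {
            this.key = key;
            this.parser = parser;
            this.defaultValue = defaultValue;
        }

        /** Returns the parsed value, or the default when the key is absent. */
        T read(Map<String, String> conf) {
            String raw = conf.get(key);
            return raw == null ? defaultValue : parser.apply(raw.trim());
        }
    }

    /** Hypothetical enum replacing a free-form String option. */
    enum RestartMode { NONE, FIXED_DELAY }

    // Loosely typed: the caller has to know from the docs that this is milliseconds.
    static final Option<Long> TIMEOUT_AS_LONG =
            new Option<>("example.timeout-ms", Long::parseLong, 10_000L);

    // Typed: the unit is part of the value (ISO-8601, e.g. "PT30S").
    static final Option<Duration> TIMEOUT =
            new Option<>("example.timeout", Duration::parse, Duration.ofSeconds(10));

    // Typed: only the enum constants are accepted; anything else throws.
    static final Option<RestartMode> RESTART_MODE =
            new Option<>(
                    "example.restart-mode",
                    s -> RestartMode.valueOf(s.toUpperCase(Locale.ROOT).replace('-', '_')),
                    RestartMode.NONE);

    public static void main(String[] args) {
        Map<String, String> conf =
                Map.of("example.timeout", "PT30S", "example.restart-mode", "fixed-delay");
        System.out.println(TIMEOUT.read(conf));
        System.out.println(RESTART_MODE.read(conf));
        System.out.println(TIMEOUT_AS_LONG.read(conf)); // key absent, falls back to default
    }
}
```

Changing an option's declared type this way alters what values users may write, which is presumably why the thread treats such a pass as something to bundle into a major release.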
>>>>>
>>>>> On 29/06/2023 13:50, Jark Wu wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I think one more thing we need to consider doing in 2.0 is changing the
>>>>>> default values of configurations to improve the out-of-box user
>>>>>> experience.
>>>>>>
>>>>>> Currently, in order to run a Flink job, users may need to set a bunch
>>>>>> of configurations, such as minibatch, checkpoint interval,
>>>>>> exactly-once, incremental-checkpoint, etc. It's very verbose and hard
>>>>>> to use for beginners. Most of them can have a universally applicable
>>>>>> value. Because changing the default value is a breaking change, I think
>>>>>> it's worth considering changing them in 2.0.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>> On Wed, 28 Jun 2023 at 14:10, Sergey Nuyanzin <snuyan...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Chesnay
>>>>>>>
>>>>>>>> "Move Calcite rules from Scala to Java": I would hope that this would
>>>>>>>> be an entirely internal change, and could thus be an incremental
>>>>>>>> process independent of major releases.
>>>>>>>> What is the actual scale of this item; how much are we actually
>>>>>>>> re-writing?
>>>>>>>
>>>>>>> Thanks for asking.
>>>>>>> Yes, you're right, that should be an internal change.
>>>>>>> Yeah, I was also thinking about an incremental change (rule by rule,
>>>>>>> or a reasonably small group of rules at a time).
>>>>>>> And yes, this could be an activity independent of the major release.
>>>>>>>
>>>>>>> The problem is actually with children of RelOptRule.
>>>>>>> Currently I see 60+ such rules (in Scala) using the mentioned
>>>>>>> deprecated API.
>>>>>>> There are also children of ConverterRule (50+) which do not have such
>>>>>>> issues.
>>>>>>> Maybe it could be considered as the next step to have all the rules in
>>>>>>> Java.
>>>>>>>
>>>>>>> On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Alex & Gyula,
>>>>>>>>
>>>>>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
>>>>>>>>> Introduce an API deprecation process" thread [1]?
>>>>>>>>
>>>>>>>> Yes, I meant the FLIP-321 discussion. I just noticed I pasted the
>>>>>>>> wrong url in my previous email. Sorry for the mistake.
>>>>>>>>
>>>>>>>>> I am also curious to know if the rationale behind this new API has
>>>>>>>>> been previously discussed on the mailing list. Do we have a list of
>>>>>>>>> shortcomings in the current DataStream API that it tries to resolve?
>>>>>>>>> How does the current ProcessFunction functionality fit into the
>>>>>>>>> picture? Will it be kept as is or subsumed by new API?
>>>>>>>>
>>>>>>>>> I don't think we should create a replacement for the DataStream API
>>>>>>>>> unless we have a very good reason to do so and with a proper
>>>>>>>>> discussion about this as Alex said.
>>>>>>>>
>>>>>>>> The ProcessFunction API which is targeting to replace DataStream API
>>>>>>>> is still a proposal, not a decision. Sorry for the confusion, I
>>>>>>>> should have been more careful with my words, not giving the
>>>>>>>> impression that this is something we'll do anyway.
>>>>>>>>
>>>>>>>> There will be a FLIP describing the motivations and designs in
>>>>>>>> detail, for the community to discuss and vote on. We are still
>>>>>>>> working on it. TBH, this is not trivial and we would need more time
>>>>>>>> on it.
>>>>>>>>
>>>>>>>> Just to quickly share some background:
>>>>>>>>
>>>>>>>> - We see quite some problems with the current DataStream APIs
>>>>>>>>    - Users are working with concrete classes rather than interfaces,
>>>>>>>>      which means
>>>>>>>>       - Users can access methods that are designed to be used by
>>>>>>>>         internal classes, even though they are annotated with
>>>>>>>>         `@Internal`. E.g., `DataStream#getTransformation`.
>>>>>>>>       - Changes to the non-API implementations (e.g.,
>>>>>>>>         `Transformation`) would affect the API classes (e.g.,
>>>>>>>>         `DataStream`), which makes it hard to provide binary
>>>>>>>>         compatibility.
>>>>>>>>    - Internal classes are used as parameter / return-value of public
>>>>>>>>      APIs. E.g., while `AbstractStreamOperator` is PublicEvolving,
>>>>>>>>      `StreamTask`, which is returned from
>>>>>>>>      `AbstractStreamOperator#getContainingTask`, is Internal.
>>>>>>>>    - In many cases, users are asked to extend the API classes, rather
>>>>>>>>      than implementing interfaces. E.g., `AbstractStreamOperator`.
>>>>>>>>       - Any changes to the base classes, even the internal part, may
>>>>>>>>         affect the behavior of the user-provided sub-classes
>>>>>>>>       - Users can override the behavior of the base classes
>>>>>>>>    - The API module `flink-streaming-java` contains non-API classes,
>>>>>>>>      and depends on internal modules such as `flink-runtime`, which
>>>>>>>>      means
>>>>>>>>       - Changes to the internal modules may affect the API modules,
>>>>>>>>         which requires users to re-build their applications upon
>>>>>>>>         upgrading
>>>>>>>>       - The artifact users need for building their application is
>>>>>>>>         larger than necessary.
>>>>>>>>    - We probably should not expose operators (e.g.,
>>>>>>>>      `AbstractStreamOperator`) to users. Functions should be enough
>>>>>>>>      for users to define their data processing logic. Exposing
>>>>>>>>      operator-level concepts (e.g., mailbox thread model, checkpoint
>>>>>>>>      barrier alignment, etc.) is unnecessary, and compatibility
>>>>>>>>      considerations then limit how much the exposed mechanisms can
>>>>>>>>      be improved.
>>>>>>>>    - The current DataStream API seems to be a mixture of many things,
>>>>>>>>      making it hard to understand, especially for newcomers. It might
>>>>>>>>      be better to re-organize it into several parts (the taxonomy
>>>>>>>>      below is just an example; we are still working on this):
>>>>>>>>       - The most fundamental stateful stream processing: streams,
>>>>>>>>         partitions / key, process functions, state, timeline-service
>>>>>>>>       - An extension for common batch-streaming unified functions:
>>>>>>>>         map, flatmap, filter, agg, reduce, join, etc.
>>>>>>>>       - An extension for windowing supports: window, triggering
>>>>>>>>       - An extension for event-time supports: event time, watermark
>>>>>>>>       - The extensions are like short-cuts / sugars, without which
>>>>>>>>         users can probably still achieve the same behavior by working
>>>>>>>>         with the fundamental APIs, but it would be a lot easier with
>>>>>>>>         the extensions
>>>>>>>> - The original plan was to do in-place refactors / changes on the
>>>>>>>>   DataStream API. Some related items are listed in this doc [2]
>>>>>>>>   attached to the kick-off email [3]. Not all of the above issues
>>>>>>>>   are listed there, because at that time we had not yet looked into
>>>>>>>>   this as deeply as we have now.
>>>>>>>> - We proposed this as a new API rather than in-place refactors in the
>>>>>>>>   2.0 work item list, because we realized the changes might be too
>>>>>>>>   big for an in-place change. First having a new API and then
>>>>>>>>   gradually retiring the old one would help users migrate smoothly
>>>>>>>>   between them.
>>>>>>>>
>>>>>>>> A thorough discussion is definitely needed once the FLIP is out. And
>>>>>>>> of course it's possible that the FLIP might be rejected. Given that
>>>>>>>> we are planning for release 2.0, I just feel it would be better to
>>>>>>>> bring this up early, even though the concrete plan is not yet ready.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Xintong
>>>>>>>>
>>>>>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
>>>>>>>> [2] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
>>>>>>>> [3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
>>>>>>>>
>>>>>>>> On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hey!
>>>>>>>>>
>>>>>>>>> I share the same concerns mentioned above regarding the
>>>>>>>>> "ProcessFunction API".
>>>>>>>>>
>>>>>>>>> I don't think we should create a replacement for the DataStream API
>>>>>>>>> unless we have a very good reason to do so and with a proper
>>>>>>>>> discussion about this as Alex said.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Gyula
>>>>>>>>>
>>>>>>>>> On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov <alexander.fedu...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Xintong,
>>>>>>>>>>
>>>>>>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
>>>>>>>>>> Introduce an API deprecation process" thread [1]?
>>>>>>>>>>
>>>>>>>>>> I am also curious to know if the rationale behind this new API has
>>>>>>>>>> been previously discussed on the mailing list. Do we have a list of
>>>>>>>>>> shortcomings in the current DataStream API that it tries to
>>>>>>>>>> resolve? How does the current ProcessFunction functionality fit
>>>>>>>>>> into the picture? Will it be kept as is or subsumed by new API?
>>>>>>>>>>
>>>>>>>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>> On Mon, 26 Jun 2023 at 14:33, Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>> The ProcessFunction API item is giving me the most headaches
>>>>>>>>>>>> because it's very unclear what it actually entails; like is it an
>>>>>>>>>>>> entirely separate API to DataStream (sounds like it is!) or an
>>>>>>>>>>>> extension of DataStream. How much will it share the internals
>>>>>>>>>>>> with DataStream etc.; how does it relate to the Table API (w.r.t.
>>>>>>>>>>>> switching APIs / what Table API uses underneath).
>>>>>>>>>>>
>>>>>>>>>>> I totally understand your confusion. We started planning this
>>>>>>>>>>> after kicking off the release 2.0, so there's still a lot to be
>>>>>>>>>>> explored and the plan keeps changing.
>>>>>>>>>>>
>>>>>>>>>>> - In the beginning, we planned to do an in-place refactor of
>>>>>>>>>>>   DataStream API, until the API migration period is proposed.
>>>>>>>>>>> - Then we want to make it an entirely separate API to DataStream,
>>>>>>>>>>>   and listed as a must-have for release 2.0 so that we can remove
>>>>>>>>>>>   DataStream once it's ready.
>>>>>>>>>>> - However, depending on the outcome of the API compatibility
>>>>>>>>>>>   discussion [1], we may not be able to remove DataStream in 2.0
>>>>>>>>>>>   anyway, which means we might need to re-evaluate the necessity
>>>>>>>>>>>   of this item for 2.0.
>>>>>>>>>>>
>>>>>>>>>>> I'd say we wait a bit longer for the compatibility discussion [1]
>>>>>>>>>>> and decide the priority for this item afterwards.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Xintong
>>>>>>>>>>>
>>>>>>>>>>> [1] https://lists.apache.org/list.html?dev@flink.apache.org
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> by-and-large I'm quite happy with the list of items.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm curious as to why the "Disaggregated State Management" item
>>>>>>>>>>>> is marked as a must-have; will it require changes that break
>>>>>>>>>>>> something? What prevents it from being added in 2.1?
>>>>>>>>>>>>
>>>>>>>>>>>> We may want to update the Java 17 item to "Make Java 17 the
>>>>>>>>>>>> default, drop Java 8/11". Maybe even split it into a must-have
>>>>>>>>>>>> "Drop Java 8" and a nice-to-have "Drop Java 11"?
>>>>>>>>>>>>
>>>>>>>>>>>> "Move Calcite rules from Scala to Java": I would hope that this
>>>>>>>>>>>> would be an entirely internal change, and could thus be an
>>>>>>>>>>>> incremental process independent of major releases.
>>>>>>>>>>>> What is the actual scale of this item; how much are we actually
>>>>>>>>>>>> re-writing?
>>>>>>>>>>>>
>>>>>>>>>>>> "Add MetricGroup#getLogicalScope": I'd raise this to a must-have;
>>>>>>>>>>>> I think I marked it down as nice-to-have only because it depends
>>>>>>>>>>>> on another item.
>>>>>>>>>>>>
>>>>>>>>>>>> The ProcessFunction API item is giving me the most headaches
>>>>>>>>>>>> because it's very unclear what it actually entails; like is it an
>>>>>>>>>>>> entirely separate API to DataStream (sounds like it is!) or an
>>>>>>>>>>>> extension of DataStream. How much will it share the internals
>>>>>>>>>>>> with DataStream etc.; how does it relate to the Table API (w.r.t.
>>>>>>>>>>>> switching APIs / what Table API uses underneath).
>>>>>>>>>>>>
>>>>>>>>>>>> There are a few items I added as ideas which don't have a
>>>>>>>>>>>> priority yet; would love to get some feedback on those.
>>>>>>>>>>>>
>>>>>>>>>>>> On 21/06/2023 08:41, Xintong Song wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>
>>>>>>>>>>>> As previously discussed in [1], we had been collecting work item
>>>>>>>>>>>> proposals for the 2.0 release until June 15th, on the wiki page
>>>>>>>>>>>> [2].
>>>>>>>>>>>>
>>>>>>>>>>>> - As we have passed the due date, I'd like to kindly remind
>>>>>>>>>>>>   everyone *not to add / remove items directly on the wiki page*.
>>>>>>>>>>>>   If needed, please post in this thread or reach out to the
>>>>>>>>>>>>   release managers instead.
>>>>>>>>>>>> - I've reached out to some folks for clarifications about their
>>>>>>>>>>>>   proposals. Some of them mentioned that they cannot yet tell
>>>>>>>>>>>>   whether we should do an item or not, and would need more time /
>>>>>>>>>>>>   discussions to make the decision. So I added a new symbol for
>>>>>>>>>>>>   items whose priorities are `TBD`.
>>>>>>>>>>>>
>>>>>>>>>>>> Now it's time to collaboratively decide a minimum set of
>>>>>>>>>>>> must-have items. I've gone through the entire list of proposed
>>>>>>>>>>>> items, and found that most of them make a lot of sense. So I
>>>>>>>>>>>> think an online sync might not be necessary for this. I'd like to
>>>>>>>>>>>> go with this DISCUSS thread, where everyone can comment on how
>>>>>>>>>>>> they think the list can be improved, followed by a VOTE to
>>>>>>>>>>>> formally make the decision.
>>>>>>>>>>>>
>>>>>>>>>>>> Any feedback and opinions, including but not limited to the
>>>>>>>>>>>> following aspects, will be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> - Important items that are missing from the list
>>>>>>>>>>>> - Concerns regarding the listed items or their priorities
>>>>>>>>>>>>
>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Xintong
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
>>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Sergey
>>
>> --
>> Best
>>
>> ConradJam
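
Xintong's background notes in this thread describe a possible layering in which a small fundamental API (process functions) sits underneath extensions such as map, flatmap and filter, with the extensions being "short-cuts / sugars" over the fundamentals. The following is a minimal Java sketch of that idea only. It is not the proposed Flink API, which per the thread was still to be written up as a FLIP. The names `ProcessFunction`, `Collector`, `map`, `filter` and `run` are hypothetical and defined locally, and the in-memory `run` driver merely stands in for a runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class LayeredApiSketch {
    /** Hypothetical output handle passed to user functions. */
    interface Collector<T> {
        void collect(T value);
    }

    /** Hypothetical fundamental primitive: one record in, zero or more records out. */
    interface ProcessFunction<IN, OUT> {
        void process(IN value, Collector<OUT> out);
    }

    /** Extension-layer sugar: map, expressed purely via the primitive. */
    static <IN, OUT> ProcessFunction<IN, OUT> map(Function<IN, OUT> f) {
        return (value, out) -> out.collect(f.apply(value));
    }

    /** Extension-layer sugar: filter, expressed purely via the primitive. */
    static <T> ProcessFunction<T, T> filter(Predicate<T> p) {
        return (value, out) -> {
            if (p.test(value)) {
                out.collect(value);
            }
        };
    }

    /** Tiny in-memory driver; an assumption standing in for a real runtime. */
    static <IN, OUT> List<OUT> run(List<IN> input, ProcessFunction<IN, OUT> fn) {
        List<OUT> result = new ArrayList<>();
        for (IN value : input) {
            fn.process(value, result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> doubled =
                run(List.of(1, 2, 3), LayeredApiSketch.<Integer, Integer>map(x -> x * 2));
        List<Integer> multiplesOfFour =
                run(doubled, LayeredApiSketch.<Integer>filter(x -> x % 4 == 0));
        System.out.println(doubled);
        System.out.println(multiplesOfFour);
    }
}
```

Because users only implement interfaces here and never extend a base class, changes to the driver cannot alter user code through inheritance, which is one of the concerns the background notes raise about `AbstractStreamOperator`.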