Hi,

Speaking of "Move Calcite rules from Scala to Java", I was wondering whether this thread is the right place to discuss it. AFAIK, the Flink community has decided to abandon Scala, which I guess is why we want to move those Calcite rules from Scala to Java. On the other hand, new Scala code will still be added while developing new features [1]. Do we have any thoughts on the strategy for Scala code?
Best regards,
Jing

[1] https://lists.apache.org/thread/tnygl4n3q1fx75cl2vclc78j8mrxmz6y

On Mon, Jul 3, 2023 at 10:31 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks all for the discussion.
>
> IIUC, we need to make the following changes. Please correct me if I get it
> wrong.
>
> 1. Disaggregated State Management - Clarify that only the public API
>    related part is must-have for 2.0.
>
> 2. Java version support - Split it into 3 items: a) make java 17 the
>    default (must-have), b) drop java 8 (must-have), and c) drop java 11
>    (nice-to-have)
>
> 3. Add MetricGroup#getLogicalScope - Should be promoted to must-have
>
> 4. ProcessFunction API - Should be downgrade to nice-to-have
>
> 5. Configuration - Add an item "revisit all config option types and default
>    values", which IIUC should also be a must-have
>
> There seems to be no changes needed for "Move Calcite rules from Scala to
> Java" as it's already nice-to-have.
>
> If there's no objections, I'll update the wiki page accordingly, and start
> a VOTE in the next couple of days.
>
> Best,
>
> Xintong
>
> On Fri, Jun 30, 2023 at 12:53 AM Teoh, Hong <lian...@amazon.co.uk.invalid> wrote:
>
> > Thanks Xintong for driving the effort.
> >
> > I’d add a +1 to reworking configs, as suggested by @Jark and @Chesnay,
> > especially the types. We have various configs that encode Time / MemorySize
> > that are Long instead!
> >
> > Regards,
> > Hong
> >
> > > On 29 Jun 2023, at 16:19, Yuan Mei <yuanmei.w...@gmail.com> wrote:
> > >
> > > Thanks for driving this effort, Xintong!
> > >
> > > To Chesnay
> > >> I'm curious as to why the "Disaggregated State Management" item is
> > >> marked as a must-have; will it require changes that break something?
> > >> What prevents it from being added in 2.1?
> > >
> > > As to "Disaggregated State Management".
> > >
> > > We plan to provide a new type of state backend to support DFS as primary
> > > storage.
> > > To achieve this, we at least need to include two parts of amends (not
> > > entirely sure yet, since we are still in the designing and prototype phase)
> > >
> > > 1. Statebackend Change
> > > 2. State Access Change
> > >
> > > Not all of the interfaces related are `@Internal`. Some of the interfaces
> > > like `StateBackend` is `@PublicEvolving`
> > > So, you are right in the sense that "Disaggregated State Management" itself
> > > probably does not need to be a "Must Have"
> > >
> > > But I was hoping changes that related to public APIs can be finalized and
> > > merged in Flink 2.0 (I will fix the wiki accordingly).
> > >
> > > I also agree with Jark that 2.0 is a good chance to rework the default
> > > value of configurations.
> > >
> > > Best
> > > Yuan
> > >
> > > On Thu, Jun 29, 2023 at 8:43 PM Chesnay Schepler <ches...@apache.org> wrote:
> > >
> > >> Something else configuration-related is that there are a bunch of
> > >> options where the type isn't quite correct (e.g., a String where it
> > >> could be an enum, a string where it should be an int or something).
> > >> Could do a pass over those as well.
> > >>
> > >> On 29/06/2023 13:50, Jark Wu wrote:
> > >>> Hi,
> > >>>
> > >>> I think one more thing we need to consider to do in 2.0 is changing the
> > >>> default value of configuration to improve out-of-box user experience.
> > >>>
> > >>> Currently, in order to run a Flink job, users may need to set
> > >>> a bunch of configurations, such as minibatch, checkpoint interval,
> > >>> exactly-once, incremental-checkpoint, etc. It's very verbose and hard
> > >>> to use for beginners.
> > >>> Most of them can have a universally applicable value. Because changing the
> > >>> default value is a breaking change.
> > >>> I think It's worth considering changing them in 2.0.
> > >>>
> > >>> What do you think?
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>> On Wed, 28 Jun 2023 at 14:10, Sergey Nuyanzin <snuyan...@gmail.com> wrote:
> > >>>
> > >>>> Hi Chesnay
> > >>>>
> > >>>>> "Move Calcite rules from Scala to Java": I would hope that this would be
> > >>>>> an entirely internal change, and could thus be an incremental process
> > >>>>> independent of major releases.
> > >>>>> What is the actual scale of this item; how much are we actually
> > >>>>> re-writing?
> > >>>>
> > >>>> Thanks for asking
> > >>>> yes, you're right, that should be internal change.
> > >>>> Yeah I was also thinking about incremental change (rule by rule or
> > >>>> reasonable small group of rules).
> > >>>> And yes, this could be an independent (on major release) activity
> > >>>>
> > >>>> The problem is actually for children of RelOptRule.
> > >>>> Currently I see 60+ such rules (in Scala) using the mentioned deprecated
> > >>>> api.
> > >>>> There are also children of ConverterRule (50+) which do not have such
> > >>>> issues.
> > >>>> Maybe it could be considered as the next step to have all the rules in
> > >>>> Java.
> > >>>>
> > >>>> On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <tonysong...@gmail.com> wrote:
> > >>>>
> > >>>>> Hi Alex & Gyula,
> > >>>>>
> > >>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> > >>>>>> Introduce an API deprecation process" thread [1]?
> > >>>>>
> > >>>>> Yes, I meant the FLIP-321 discussion. I just noticed I pasted the wrong
> > >>>>> url in my previous email. Sorry for the mistake.
> > >>>>>
> > >>>>>> I am also curious to know if the rationale behind this new API has been
> > >>>>>> previously discussed on the mailing list. Do we have a list of
> > >>>>>> shortcomings in the current DataStream API that it tries to resolve?
> > >>>>>> How does the current ProcessFunction functionality fit into the
> > >>>>>> picture? Will it be kept as is or subsumed by new API?
> > >>>>>
> > >>>>>> I don't think we should create a replacement for the DataStream API
> > >>>>>> unless we have a very good reason to do so and with a proper discussion
> > >>>>>> about this as Alex said.
> > >>>>>
> > >>>>> The ProcessFunction API which is targeting to replace DataStream API is
> > >>>>> still a proposal, not a decision. Sorry for the confusion, I should have
> > >>>>> been more careful with my words, not giving the impression that this is
> > >>>>> something we'll do anyway.
> > >>>>>
> > >>>>> There will be a FLIP describing the motivations and designs in detail, for
> > >>>>> the community to discuss and vote on. We are still working on it. TBH, this
> > >>>>> is not trivial and we would need more time on it.
> > >>>>>
> > >>>>> Just to quickly share some backgrounds:
> > >>>>>
> > >>>>> - We see quite some problems with the current DataStream APIs
> > >>>>>    - Users are working with concrete classes rather than interfaces,
> > >>>>>      which means
> > >>>>>       - Users can access methods that are designed to be used by internal
> > >>>>>         classes, even though they are annotated with `@Internal`. E.g.,
> > >>>>>         `DataStream#getTransformation`.
> > >>>>>       - Changes to the non-API implementations (e.g., `Transformation`)
> > >>>>>         would affect the API classes (e.g., `DataStream`), which makes it
> > >>>>>         hard to provide binary compatibility.
> > >>>>>    - Internal classes are used as parameter / return-value of public
> > >>>>>      APIs. E.g., while `AbstractStreamOperator` is PublicEvolving,
> > >>>>>      `StreamTask` which returns from
> > >>>>>      `AbstractStreamOperator#getContainingTask` is Internal.
> > >>>>>    - In many cases, users are asked to extend the API classes, rather
> > >>>>>      than implementing interfaces. E.g., `AbstractStreamOperator`.
> > >>>>>       - Any changes to the base classes, even the internal part, may
> > >>>>>         affect the behavior of the user-provided sub-classes
> > >>>>>       - Users can override the behavior of the base classes
> > >>>>>    - The API module `flink-streaming-java` contains non-API classes, and
> > >>>>>      depends on internal modules such as `flink-runtime`, which means
> > >>>>>       - Changes to the internal modules may affect the API modules, which
> > >>>>>         requires users to re-build their applications upon upgrading
> > >>>>>       - The artifact user needs for building their application larger
> > >>>>>         than necessary.
> > >>>>>    - We probably should not expose operators (e.g.,
> > >>>>>      `AbstractStreamOperator`) to users. Functions should be enough for
> > >>>>>      users to define their data processing logics. Exposing operator-level
> > >>>>>      concepts (e.g., mailbox thread model, checkpoint barrier alignment,
> > >>>>>      etc.) is unnecessary and limits the improvement regarding such exposed
> > >>>>>      mechanisms with compatibility considerations.
> > >>>>>    - The current DataStream API seems to be a mixture of many things,
> > >>>>>      making it hard to understand especially for newcomers. It might be
> > >>>>>      better to re-organize it into several parts: (the taxonomy below are
> > >>>>>      just an example of the, we are still working on this)
> > >>>>>       - The most fundamental stateful stream processing: streams,
> > >>>>>         partitions / key, process functions, state, timeline-service
> > >>>>>       - An extension for common batch-streaming unified functions: map,
> > >>>>>         flatmap, filter, agg, reduce, join, etc.
> > >>>>>       - An extension for windowing supports: window, triggering
> > >>>>>       - An extension for event-time supports: event time, watermark
> > >>>>>       - The extensions are like short-cuts / sugars, without which users
> > >>>>>         can probably still achieve the same behavior by working with the
> > >>>>>         fundamental APIs, but would be a lot easier with the extensions
> > >>>>> - The original plan was to do in-place refactors / changes on
> > >>>>>   DataStream API. Some related items are listed in this doc [2] attached
> > >>>>>   to the kicking off email [3]. Not all of the above issues are listed,
> > >>>>>   because we haven't looked into this as deeply as now by that time.
> > >>>>> - We proposed this as a new API rather than in-place refactors in the
> > >>>>>   2.0 work item list, because we realized the changes might be too big
> > >>>>>   for an in-place change. First having a new API then gradually retiring
> > >>>>>   the old one would help users to smoothly migrate between them.
> > >>>>>
> > >>>>> A thorough discussion is definitely needed once the FLIP is out. And of
> > >>>>> course it's possible that the FLIP might be rejected. Given that we are
> > >>>>> planning for release 2.0, I just feel it would be better to bring this up
> > >>>>> early even the concrete plan is not yet ready,
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> Xintong
> > >>>>>
> > >>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> > >>>>> [2] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
> > >>>>> [3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
> > >>>>>
> > >>>>> On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org> wrote:
> > >>>>>
> > >>>>>> Hey!
> > >>>>>>
> > >>>>>> I share the same concerns mentioned above regarding the
> > >>>>>> "ProcessFunction API".
> > >>>>>>
> > >>>>>> I don't think we should create a replacement for the DataStream API
> > >>>>>> unless we have a very good reason to do so and with a proper discussion
> > >>>>>> about this as Alex said.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Gyula
> > >>>>>>
> > >>>>>> On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov <alexander.fedu...@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Hi Xintong,
> > >>>>>>>
> > >>>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> > >>>>>>> Introduce an API deprecation process" thread [1]?
> > >>>>>>>
> > >>>>>>> I am also curious to know if the rationale behind this new API has been
> > >>>>>>> previously discussed on the mailing list. Do we have a list of
> > >>>>>>> shortcomings in the current DataStream API that it tries to resolve?
> > >>>>>>> How does the current ProcessFunction functionality fit into the
> > >>>>>>> picture? Will it be kept as is or subsumed by new API?
> > >>>>>>>
> > >>>>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Alex
> > >>>>>>>
> > >>>>>>> On Mon, 26 Jun 2023 at 14:33, Xintong Song <tonysong...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>>> The ProcessFunction API item is giving me the most headaches because
> > >>>>>>>>> it's very unclear what it actually entails; like is it an entirely
> > >>>>>>>>> separate API to DataStream (sounds like it is!) or an extension of
> > >>>>>>>>> DataStream. How much will it share the internals with DataStream
> > >>>>>>>>> etc.; how does it relate to the Table API (w.r.t. switching APIs /
> > >>>>>>>>> what Table API uses underneath).
> > >>>>>>>> I totally understand your confusion. We started planning this after
> > >>>>>>>> kicking off the release 2.0, so there's still a lot to be explored and
> > >>>>>>>> the plan keeps changing.
> > >>>>>>>>
> > >>>>>>>> - In the beginning, we planned to do an in-place refactor of
> > >>>>>>>>   DataStream API, until the API migration period is proposed.
> > >>>>>>>> - Then we want to make it an entirely separate API to DataStream, and
> > >>>>>>>>   listed as a must-have for release 2.0 so that we can remove
> > >>>>>>>>   DataStream once it's ready.
> > >>>>>>>> - However, depending on the outcome of the API compatibility
> > >>>>>>>>   discussion [1], we may not be able to remove DataStream in 2.0
> > >>>>>>>>   anyway, which means we might need to re-evaluate the necessity of
> > >>>>>>>>   this item for 2.0.
> > >>>>>>>>
> > >>>>>>>> I'd say we wait a bit longer for the compatibility discussion [1] and
> > >>>>>>>> decide the priority for this item afterwards.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>>
> > >>>>>>>> Xintong
> > >>>>>>>>
> > >>>>>>>> [1] https://lists.apache.org/list.html?dev@flink.apache.org
> > >>>>>>>>
> > >>>>>>>> On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler <ches...@apache.org> wrote:
> > >>>>>>>>
> > >>>>>>>>> by-and-large I'm quite happy with the list of items.
> > >>>>>>>>>
> > >>>>>>>>> I'm curious as to why the "Disaggregated State Management" item is
> > >>>>>>>>> marked as a must-have; will it require changes that break something?
> > >>>>>>>>> What prevents it from being added in 2.1?
> > >>>>>>>>>
> > >>>>>>>>> We may want to update the Java 17 item to "Make Java 17 the default,
> > >>>>>>>>> drop Java 8/11".
> > >>>>>>>>> Maybe even split it into a must-have "Drop Java 8" and a
> > >>>>>>>>> nice-to-have "Drop Java 11"?
> > >>>>>>>>>
> > >>>>>>>>> "Move Calcite rules from Scala to Java": I would hope that this would
> > >>>>>>>>> be an entirely internal change, and could thus be an incremental
> > >>>>>>>>> process independent of major releases.
> > >>>>>>>>> What is the actual scale of this item; how much are we actually
> > >>>>>>>>> re-writing?
> > >>>>>>>>>
> > >>>>>>>>> "Add MetricGroup#getLogicalScope": I'd raise this to a must-have; i
> > >>>>>>>>> think I marked it down as nice-to-have only because it depends on
> > >>>>>>>>> another item.
> > >>>>>>>>>
> > >>>>>>>>> The ProcessFunction API item is giving me the most headaches because
> > >>>>>>>>> it's very unclear what it actually entails; like is it an entirely
> > >>>>>>>>> separate API to DataStream (sounds like it is!) or an extension of
> > >>>>>>>>> DataStream. How much will it share the internals with DataStream
> > >>>>>>>>> etc.; how does it relate to the Table API (w.r.t. switching APIs /
> > >>>>>>>>> what Table API uses underneath).
> > >>>>>>>>>
> > >>>>>>>>> There are a few items I added as ideas which don't have a priority
> > >>>>>>>>> yet; would love to get some feedback on those.
> > >>>>>>>>>
> > >>>>>>>>> On 21/06/2023 08:41, Xintong Song wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi devs,
> > >>>>>>>>>
> > >>>>>>>>> As previously discussed in [1], we had been collecting work item
> > >>>>>>>>> proposals for the 2.0 release until June 15th, on the wiki page [2].
> > >>>>>>>>>
> > >>>>>>>>> - As we have passed the due date, I'd like to kindly remind everyone
> > >>>>>>>>>   *not to add / remove items directly on the wiki page*.
> > >>>>>>>>>   If needed, please post in this thread or reach out to the release
> > >>>>>>>>>   managers instead.
> > >>>>>>>>> - I've reached out to some folks for clarifications about their
> > >>>>>>>>>   proposals. Some of them mentioned that they can not yet tell
> > >>>>>>>>>   whether we should do an item or not, and would need more time /
> > >>>>>>>>>   discussions to make the decision. So I added a new symbol for items
> > >>>>>>>>>   whose priorities are `TBD`.
> > >>>>>>>>>
> > >>>>>>>>> Now it's time to collaboratively decide a minimum set of must-have
> > >>>>>>>>> items. I've gone through the entire list of proposed items, and found
> > >>>>>>>>> most of them make quite much sense. So I think an online sync might
> > >>>>>>>>> not be necessary for this. I'd like to go with this DISCUSS thread,
> > >>>>>>>>> where everyone can comment on how they think the list can be
> > >>>>>>>>> improved, followed by a VOTE to formally make the decision.
> > >>>>>>>>>
> > >>>>>>>>> Any feedback and opinions, including but not limited to the following
> > >>>>>>>>> aspects, will be appreciated.
> > >>>>>>>>>
> > >>>>>>>>> - Important items that are missing from the list
> > >>>>>>>>> - Concerns regarding the listed items or their priorities
> > >>>>>>>>>
> > >>>>>>>>> Looking forward to your feedback.
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> Xintong
> > >>>>>>>>>
> > >>>>>>>>> [1] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
> > >>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> > >>>>
> > >>>> --
> > >>>> Best regards,
> > >>>> Sergey