Thanks for driving this effort, Xintong!

To Chesnay
> I'm curious as to why the "Disaggregated State Management" item is
> marked as a must-have; will it require changes that break something?
> What prevents it from being added in 2.1?

Regarding "Disaggregated State Management":

We plan to provide a new type of state backend that supports DFS as primary
storage.
To achieve this, we need at least two sets of changes (this is not final,
since we are still in the design and prototyping phase):

1. State backend changes
2. State access changes

Not all of the related interfaces are `@Internal`. Some of them, such as
`StateBackend`, are `@PublicEvolving`.
So you are right in the sense that "Disaggregated State Management" itself
probably does not need to be a "Must Have".

However, I was hoping that the changes related to public APIs could be
finalized and merged in Flink 2.0 (I will fix the wiki accordingly).

I also agree with Jark that 2.0 is a good opportunity to rework the default
values of configuration options.

Best
Yuan


On Thu, Jun 29, 2023 at 8:43 PM Chesnay Schepler <ches...@apache.org> wrote:

> Something else configuration-related is that there are a bunch of
> options where the type isn't quite correct (e.g., a String where it
> could be an enum, a string where it should be an int or something).
> Could do a pass over those as well.
>
> On 29/06/2023 13:50, Jark Wu wrote:
> > Hi,
> >
> > I think one more thing we need to consider to do in 2.0 is changing the
> > default value of configuration to improve out-of-box user experience.
> >
> > Currently, in order to run a Flink job, users may need to set
> > a bunch of configurations, such as minibatch, checkpoint interval,
> > exactly-once,
> > incremental-checkpoint, etc. It's very verbose and hard to use for
> > beginners.
> > Most of them can have a universally applicable value. Because changing
> > the default value is a breaking change, I think it's worth considering
> > changing them in 2.0.
> >
> > What do you think?
> >
> > Best,
> > Jark
> >
> >
> > On Wed, 28 Jun 2023 at 14:10, Sergey Nuyanzin <snuyan...@gmail.com>
> wrote:
> >
> >> Hi Chesnay
> >>
> >>> "Move Calcite rules from Scala to Java": I would hope that this would
> be
> >>> an entirely internal change, and could thus be an incremental process
> >>> independent of major releases.
> >>> What is the actual scale of this item; how much are we actually
> >> re-writing?
> >>
> >> Thanks for asking
> >> yes, you're right, that should be internal change.
> >> Yeah I was also thinking about incremental change (rule by rule or
> >> reasonable small group of rules).
> >> And yes, this could be an independent (on major release) activity
> >>
> >> The problem is actually for children of RelOptRule.
> >> Currently I see 60+ such rules (in Scala) using the mentioned deprecated
> >> api.
> >> There are also children of ConverterRule (50+) which do not have such
> >> issues.
> >> Maybe it could be considered as the next step to have all the rules in
> >> Java.
> >>
> >> On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <tonysong...@gmail.com>
> >> wrote:
> >>
> >>> Hi Alex & Gyula,
> >>>
> >>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> >> Introduce
> >>>> an API deprecation process" thread [1]?
> >>>>
> >>> Yes, I meant the FLIP-321 discussion. I just noticed I pasted the wrong
> >> url
> >>> in my previous email. Sorry for the mistake.
> >>>
> >>> I am also curious to know if the rationale behind this new API has been
> >>>> previously discussed on the mailing list. Do we have a list of
> >>> shortcomings
> >>>> in the current DataStream API that it tries to resolve? How does the
> >>>> current ProcessFunction functionality fit into the picture? Will it be
> >>> kept
> >>>> as is or subsumed by new API?
> >>>>
> >>> I don't think we should create a replacement for the DataStream API
> >> unless
> >>>> we have a very good reason to do so and with a proper discussion about
> >>> this
> >>>> as Alex said.
> >>>
> >>> The ProcessFunction API which is targeting to replace DataStream API is
> >>> still a proposal, not a decision. Sorry for the confusion, I should
> have
> >>> been more careful with my words, not giving the impression that this is
> >>> something we'll do anyway.
> >>>
> >>> There will be a FLIP describing the motivations and designs in detail,
> >> for
> >>> the community to discuss and vote on. We are still working on it. TBH,
> >> this
> >>> is not trivial and we would need more time on it.
> >>>
> >>> Just to quickly share some backgrounds:
> >>>
> >>>     - We see quite some problems with the current DataStream APIs
> >>>        - Users are working with concrete classes rather than
> interfaces,
> >>>        which means
> >>>        - Users can access methods that are designed to be used by
> internal
> >>>           classes, even though they are annotated with `@Internal`.
> E.g.,
> >>>           `DataStream#getTransformation`.
> >>>           - Changes to the non-API implementations (e.g.,
> >> `Transformation`)
> >>>           would affect the API classes (e.g., `DataStream`), which
> >>> makes it hard to
> >>>           provide binary compatibility.
> >>>        - Internal classes are used as parameter / return-value of
> public
> >>>        APIs. E.g., while `AbstractStreamOperator` is PublicEvolving,
> >>> `StreamTask`
> >>>        which returns from `AbstractStreamOperator#getContainingTask` is
> >>> Internal.
> >>>        - In many cases, users are asked to extend the API classes,
> rather
> >>>        than implementing interfaces. E.g., `AbstractStreamOperator`.
> >>>           - Any changes to the base classes, even the internal part,
> may
> >>>           affect the behavior of the user-provided sub-classes
> >>>           - Users can override the behavior of the base classes
> >>>        - The API module `flink-streaming-java` contains non-API
> classes,
> >> and
> >>>        depends on internal modules such as `flink-runtime`, which means
> >>>        - Changes to the internal modules may affect the API modules,
> which
> >>>           requires users to re-build their applications upon upgrading
> >>>           - The artifact users need for building their applications is
> >>>           larger than necessary.
> >>>        - We probably should not expose operators (e.g.,
> >>>        `AbstractStreamOperator`) to users. Functions should be enough
> >>> for users to
> >>>        define their data processing logics. Exposing operator-level
> >> concepts
> >>>        (e.g., mailbox thread model, checkpoint barrier alignment,
> etc.) is
> >>>        unnecessary and limits the improvement regarding such exposed
> >>> mechanisms
> >>>        with compatibility considerations.
> >>>        - The current DataStream API seems to be a mixture of many things,
> >>>        making it hard to understand, especially for newcomers. It might be
> >>>        better to re-organize it into several parts (the taxonomy below is
> >>>        just an example; we are still working on this):
> >>>           - The most fundamental stateful stream processing: streams,
> >>>           partitions / key, process functions, state, timeline-service
> >>>           - An extension for common batch-streaming unified functions:
> >> map,
> >>>           flatmap, filter, agg, reduce, join, etc.
> >>>           - An extension for windowing supports:  window, triggering
> >>>           - An extension for event-time supports: event time, watermark
> >>>           - The extensions are like short-cuts / sugars, without which
> >> users
> >>>           can probably still achieve the same behavior by working with
> the
> >>>           fundamental APIs, but would be a lot easier with the
> extensions
> >>>        - The original plan was to do in-place refactors / changes on
> >>>     DataStream API. Some related items are listed in this doc [2]
> attached
> >>> to
> >>>     the kicking off email [3]. Not all of the above issues are listed,
> >>> because
> >>>     we haven't looked into this as deeply as now  by that time.
> >>>     - We proposed this as a new API rather than in-place refactors in
> the
> >>>     2.0 work item list, because we realized the changes might be too
> big
> >>> for an
> >>>     in-place change. First having a new API then gradually retiring the
> >> old
> >>> one
> >>>     would help users to smoothly migrate between them.
> >>>
> >>> A thorough discussion is definitely needed once the FLIP is out. And of
> >>> course it's possible that the FLIP might be rejected. Given that we are
> >>> planning for release 2.0, I just feel it would be better to bring this
> up
> >>> early even the concrete plan is not yet ready,
> >>>
> >>> Best,
> >>>
> >>> Xintong
> >>>
> >>>
> >>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> >>> [2]
> >>>
> >>>
> >>
> https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
> >>> [3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
> >>>
> >>> On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org> wrote:
> >>>
> >>>> Hey!
> >>>>
> >>>> I share the same concerns mentioned above regarding the
> >> "ProcessFunction
> >>>> API".
> >>>>
> >>>> I don't think we should create a replacement for the DataStream API
> >>> unless
> >>>> we have a very good reason to do so and with a proper discussion about
> >>> this
> >>>> as Alex said.
> >>>>
> >>>> Cheers,
> >>>> Gyula
> >>>>
> >>>> On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov <
> >>>> alexander.fedu...@gmail.com> wrote:
> >>>>
> >>>>> Hi Xintong,
> >>>>>
> >>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> >>>> Introduce
> >>>>> an API deprecation process" thread [1]?
> >>>>>
> >>>>> I am also curious to know if the rationale behind this new API has
> >> been
> >>>>> previously discussed on the mailing list. Do we have a list of
> >>>> shortcomings
> >>>>> in the current DataStream API that it tries to resolve? How does the
> >>>>> current ProcessFunction functionality fit into the picture? Will it
> >> be
> >>>> kept
> >>>>> as is or subsumed by new API?
> >>>>>
> >>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> >>>>>
> >>>>> Best,
> >>>>> Alex
> >>>>>
> >>>>> On Mon, 26 Jun 2023 at 14:33, Xintong Song <tonysong...@gmail.com>
> >>>> wrote:
> >>>>>>> The ProcessFunction API item is giving me the most headaches
> >>> because
> >>>>> it's
> >>>>>>> very unclear what it actually entails; like is it an entirely
> >>>> separate
> >>>>>> API
> >>>>>>> to DataStream (sounds like it is!) or an extension of DataStream.
> >>> How
> >>>>>> much
> >>>>>>> will it share the internals with DataStream etc.; how does it
> >>> relate
> >>>> to
> >>>>>> the
> >>>>>>> Table API (w.r.t. switching APIs / what Table API uses
> >> underneath).
> >>>>>> I totally understand your confusion. We started planning this after
> >>>>> kicking
> >>>>>> off the release 2.0, so there's still a lot to be explored and the
> >>> plan
> >>>>>> keeps changing.
> >>>>>>
> >>>>>>
> >>>>>>     - In the beginning, we planned to do an in-place refactor of
> >>>>> DataStream
> >>>>>>     API, until the API migration period is proposed.
> >>>>>>     - Then we want to make it an entirely separate API to
> >> DataStream,
> >>>> and
> >>>>>>     listed as a must-have for release 2.0 so that we can remove
> >>>> DataStream
> >>>>>> once
> >>>>>>     it's ready.
> >>>>>>     - However, depending on the outcome of the API compatibility
> >>>>> discussion
> >>>>>>     [1], we may not be able to remove DataStream in 2.0 anyway,
> >> which
> >>>>> means
> >>>>>> we
> >>>>>>     might need to re-evaluate the necessity of this item for 2.0.
> >>>>>>
> >>>>>> I'd say we wait a bit longer for the compatibility discussion [1]
> >> and
> >>>>>> decide the priority for this item afterwards.
> >>>>>>
> >>>>>>
> >>>>>> Best,
> >>>>>>
> >>>>>> Xintong
> >>>>>>
> >>>>>>
> >>>>>> [1] https://lists.apache.org/list.html?dev@flink.apache.org
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler <
> >> ches...@apache.org
> >>>>>> wrote:
> >>>>>>
> >>>>>>> by-and-large I'm quite happy with the list of items.
> >>>>>>>
> >>>>>>> I'm curious as to why the "Disaggregated State Management" item
> >> is
> >>>>> marked
> >>>>>>> as a must-have; will it require changes that break something?
> >> What
> >>>>>> prevents
> >>>>>>> it from being added in 2.1?
> >>>>>>>
> >>>>>>> We may want to update the Java 17 item to "Make Java 17 the
> >>> default,
> >>>>> drop
> >>>>>>> Java 8/11". Maybe even split it into a must-have "Drop Java 8"
> >> and
> >>> a
> >>>>>>> nice-to-have "Drop Java 11"?
> >>>>>>>
> >>>>>>> "Move Calcite rules from Scala to Java": I would hope that this
> >>> would
> >>>>> be
> >>>>>>> an entirely internal change, and could thus be an incremental
> >>> process
> >>>>>>> independent of major releases.
> >>>>>>> What is the actual scale of this item; how much are we actually
> >>>>>> re-writing?
> >>>>>>> "Add MetricGroup#getLogicalScope": I'd raise this to a
> >> must-have; i
> >>>>> think
> >>>>>>> I marked it down as nice-to-have only because it depends on
> >> another
> >>>>> item.
> >>>>>>> The ProcessFunction API item is giving me the most headaches
> >>> because
> >>>>> it's
> >>>>>>> very unclear what it actually entails; like is it an entirely
> >>>> separate
> >>>>>> API
> >>>>>>> to DataStream (sounds like it is!) or an extension of DataStream.
> >>> How
> >>>>>> much
> >>>>>>> will it share the internals with DataStream etc.; how does it
> >>> relate
> >>>> to
> >>>>>> the
> >>>>>>> Table API (w.r.t. switching APIs / what Table API uses
> >> underneath).
> >>>>>>> There are a few items I added as ideas which don't have a
> >> priority
> >>>> yet;
> >>>>>>> would love to get some feedback on those.
> >>>>>>>
> >>>>>>> On 21/06/2023 08:41, Xintong Song wrote:
> >>>>>>>
> >>>>>>> Hi devs,
> >>>>>>>
> >>>>>>> As previously discussed in [1], we had been collecting work item
> >>>>>> proposals
> >>>>>>> for the 2.0 release until June 15th, on the wiki page [2].
> >>>>>>>
> >>>>>>>     - As we have passed the due date, I'd like to kindly remind
> >>>> everyone
> >>>>>> *not
> >>>>>>>     to add / remove items directly on the wiki page*. If needed,
> >>>> please
> >>>>>> post
> >>>>>>>     in this thread or reach out to the release managers instead.
> >>>>>>>     - I've reached out to some folks for clarifications about
> >> their
> >>>>>>>     proposals. Some of them mentioned that they can not yet tell
> >>>> whether
> >>>>>> we
> >>>>>>>     should do an item or not, and would need more time /
> >> discussions
> >>>> to
> >>>>>> make
> >>>>>>>     the decision. So I added a new symbol for items whose
> >> priorities
> >>>> are
> >>>>>> `TBD`.
> >>>>>>> Now it's time to collaboratively decide a minimum set of
> >> must-have
> >>>>> items.
> >>>>>>> I've gone through the entire list of proposed items, and found
> >> most
> >>>> of
> >>>>>> them
> >>>>>>> make quite much sense. So I think an online sync might not be
> >>>> necessary
> >>>>>> for
> >>>>>>> this. I'd like to go with this DISCUSS thread, where everyone can
> >>>>> comment
> >>>>>>> on how they think the list can be improved, followed by a VOTE to
> >>>>>> formally
> >>>>>>> make the decision.
> >>>>>>>
> >>>>>>> Any feedback and opinions, including but not limited to the
> >>> following
> >>>>>>> aspects, will be appreciated.
> >>>>>>>
> >>>>>>>     - Important items that are missing from the list
> >>>>>>>     - Concerns regarding the listed items or their priorities
> >>>>>>>
> >>>>>>> Looking forward to your feedback.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Xintong
> >>>>>>>
> >>>>>>>
> >>>>>>> [1]
> >>
> https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
> >>>>>>> [2]
> >> https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> >>>>>>>
> >>>>>>>
> >>
> >> --
> >> Best regards,
> >> Sergey
> >>
>
>
