Nice to see this discussion!

If we make Flink upgrades easy, why would users not want to upgrade? I
think most hesitation today stems from the fact that X.Y.0 releases
tend to break downstream in one way or another due to unexpected
upstream changes. If so, it would be nice if we could address that
problem. For sure, if there were a choice between expending finite
resources on supporting more releases vs. reducing compatibility
issues in the next release, I would vote for the latter.

There is another reason why some users are stuck on old releases that
cannot be solved by the community: Heavily customized forks of Flink
are hard to maintain and naturally defer upgrades as long as possible.
I believe the solution to that is in user land (and not a community
burden): contribute changes back to Flink, participate in the
community to ensure robust extension hooks are provided etc.

Cheers,
Thomas

On Mon, Nov 15, 2021 at 3:03 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
> Hi everyone,
>
> I want to second Stephan's points. I think that supporting LTS versions
> fights symptoms rather than addressing the underlying problem, which is
> incompatible or behaviour-changing changes between versions. I believe if we
> can address this problem, then our Flink users will probably upgrade more
> frequently.
>
> Making upgrades as smooth as possible probably won't help all our users, but
> we as a community also have to think about the costs of maintaining more
> Flink versions or supporting certain versions for a longer period of time. One
> cost is that we would need more complex processes to ensure that older
> Flink versions/LTS versions receive all the required bug fixes. I think this is
> already hard enough when only supporting the last two releases. Moreover,
> in my experience, backporting fixes is usually not a problem if
> there are no merge conflicts. However, the older the Flink code is, the
> more likely it is that there are merge conflicts. In the worst case, issues
> need to be fixed in a completely different way than they are solved in
> master.
>
> Additionally, we have to consider that longer support periods will mean
> that we need to maintain our CI infrastructure and tooling for the same period
> as well. To give you an example, we would probably still have to operate
> Travis if we had an LTS version. Given our existing problems with CI, this
> would add a lot more problems to the heap. Moreover, supporting more
> versions would add more load to our CI infrastructure.
>
> All in all, I would expect that longer support periods would slow us down
> on other things. One of these things could be making upgrades as smooth
> as possible so that more Flink users upgrade more frequently.
>
> @Tison: Concerning the problem that an older version contains a bug fix
> that is not yet contained in the latest release of a newer version, I think
> this is a valid problem. Right now, our users have to check via JIRA
> whether their problems are solved in the latest patch release of the newer
> version (e.g. when upgrading from 1.13.y to 1.14.x). Faster and more frequent
> releases could mitigate the problem. Faster, lower-overhead releases
> could also allow us to release all versions in sync.
>
> Cheers,
> Till
>
> On Sat, Nov 13, 2021 at 1:40 AM tison <wander4...@gmail.com> wrote:
>
> > Hi Stephan,
> >
> > Thanks for your email! I agree with your points about compatibility and
> > that the upgrade experience should be smooth.
> >
> > The problem raised by Piotr is that, even if we officially support the two
> > latest versions, many users stay on early, end-of-support versions.
> > So the downside "no one ends up using the other versions" doesn't
> > hold. I see that a number of companies like to test the latest version
> > but not deploy it to production if it's not robust; 1.9 and 1.10 are less
> > used.
> >
> > If a non-LTS version resolves users' (urgent) issues and the release itself
> > is robust, they will upgrade to that version. We have tried to support
> > Java 9, and so have other projects.
> >
> > I'm in favour of keeping the two latest supported versions and am willing
> > to work on improving compatibility to help users migrate. But if I had to
> > choose between 4 supported versions and the LTS option, I'd prefer the LTS
> > option.
> >
> > Here are several issues I ran into when trying to support a series of
> > versions:
> >
> > 1. Cherry-pick overhead grows, obviously.
> > 2. Cherry-pick ownership is unclear. I think committers play the role of
> > "reveal all the details", and this makes (1) worse.
> > 3. Version policy is harder. Generally, a user wants to upgrade from
> > 1.14.3 (latest 1.14.x) to 1.15.2 (latest 1.15.x) without regression.
> > Given 1 and 2, cherry-picks won't always get merged in time, and it's
> > possible that when 1.14.3 is released, 1.15.2 is still being prepared. We
> > have to do some simultaneous releases across 4 versions. And because there
> > are 4 versions, such simultaneous releases will be needed more often.
> >
> > If a company wants to maintain an early version, that's welcome, and it
> > can do all the releases on its own, just like those who maintain Java 1.6.
> > That's fine, but it is not under committer/PMC maintenance.
> >
> > Additional input: 1.5, 1.7, 1.8, 1.11, and 1.13 are especially loved.
> >
> > Best,
> > tison.
> >
> >
> > On Sat, Nov 13, 2021 at 2:35 AM Stephan Ewen <ewenstep...@gmail.com> wrote:
> >
> > > I am of a somewhat different opinion here: I don't think LTS is the best
> > > way to go.
> > >
> > > To my understanding, the big issue faced by users is that an upgrade is
> > > simply too hard. Too many things change in subtle ways; you cannot just
> > > take the previous application and configuration and expect it to run the
> > > same after the upgrade. If that were much easier, users would be able to
> > > upgrade more easily and more frequently (maybe not every month, but every
> > > six months or so).
> > >
> > > In contrast, LTS is more about how long one provides patches and
> > > releases. The upgrade problem stays the same: between LTS versions,
> > > upgrades should still be smooth. To make LTS-to-LTS upgrades smooth, we
> > > need to solve the same issue as making upgrades smooth from individual
> > > version to individual version, unless we expect non-linear upgrade paths
> > > with migration tools, which I am not convinced we should do. That seems
> > > to be the opposite of where the industry is moving (upgrade fast and
> > > frequently by updating images).
> > >
> > > The big downside of LTS versions is that almost no one ends up using the
> > > other versions, so we get little feedback on features. We will end up
> > > having a feature in Flink for three releases and still barely anyone will
> > > have used it, so we will lack the confidence to turn it on by default.
> > > I also see the risk that the community ends up taking compatibility with
> > > non-LTS releases less seriously, because "it is not an LTS version
> > > anyway".
> > >
> > >
> > > We could look at making upgrades smoother by starting to address the
> > > issues listed below.
> > > I think we need to do that anyway, because that is what I hear users
> > > bringing up more and more. If after that we still feel there is a
> > > problem, then let's revisit this issue.
> > >   - API compatibility (signatures and behavior)
> > >   - Make it possible to pin the SQL semantics of a query across releases
> > > (FLIP-190 works on this)
> > >   - It must be possible to use the same configs as before in a new Flink
> > > version and keep the same behavior that way
> > >   - The REST API must be stable (signatures and semantics)
> > >   - Make it possible to mix different client/cluster versions (stable
> > > serializations for JobGraph, etc.)
> > >
> > > The fact that we officially define two supported versions, while many
> > > committers like to backport things to one more version, is something we
> > > can certainly look at, to bring some consistency there.
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Fri, Nov 12, 2021 at 9:17 AM Martijn Visser <mart...@ververica.com>
> > > wrote:
> > >
> > > > Thanks for bringing this up for discussion, Piotr. On the one hand, I
> > > > think it's a good idea, for the reasons you've mentioned. On the other
> > > > hand, having an LTS version will remove an incentive for some users to
> > > > upgrade, which will result in fewer Flink users testing new features,
> > > > because they wait for the next LTS version to upgrade. I can see that
> > > > particularly happening for large enterprises. Another downside I can
> > > > imagine is that upgrading from one LTS version to another LTS version
> > > > will become more complicated, because more changes will have been made
> > > > between those versions.
> > > >
> > > > Related to my second remark, would/could introducing an LTS version
> > > > also trigger a follow-up discussion about potentially introducing
> > > > breaking changes in the next LTS version, like a Flink 2.0 [1]?
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-3957
> > > >
> > > > On Fri, 12 Nov 2021 at 08:59, Fabian Paul <fabianp...@ververica.com>
> > > > wrote:
> > > >
> > > > > Thanks for bringing up this topic, Piotr.
> > > > > I also think we should try to decouple our release cycles from our
> > > > > support plans. Currently we are very limited by this approach,
> > > > > because faster release cycles also result in faster deprecation of
> > > > > versions.
> > > > >
> > > > >
> > > > > Therefore I am also in favour of option 2, where we can align the
> > > > > next LTS version with our development speed. Option 1, I think, can
> > > > > easily lead to confusion when the number of supported releases
> > > > > constantly changes.
> > > > >
> > > > > Best,
> > > > > Fabian
> > > > >
> > > > >
> > > >
> > >
> >
