Hi Vincent!

On Wed, Mar 31, 2021 at 12:11:32PM +0200, Vincent Bernat wrote:
> It's a bit annoying that fixes reach a LTS version before the non-LTS
> one. The upgrade scenario is one annoyance, but if there is a
> regression, you also impact far more users.

I know, this is also why I'm quite a bit irritated by this.

> You could tag releases in
> git (with -preX if needed) when preparing the releases and then issue
> the release with a few days apart.

In practice the tag serves no purpose, but that leads to the same
principle as leaving some fixes pending in the -next branch.

> Users of older versions will have
> less frequent releases in case regressions are spotted, but I think
> that's the general expectation: if you are running older releases it's
> because you don't have time to upgrade and it's good enough for you.

I definitely agree with this and that's also how I'm using LTS versions
of various software and why we try to put more care into LTS versions here.

> For example:
>  - 2.3, monthly release or when there is a big regression
>  - 2.2, 3 days after 2.3
>  - 2.0, 3 days after 2.2, skip one out of two releases
>  - 1.8, 3 days after 2.0, skip one out of four releases
> 
> So, you have a 2.3.9. At the same time, you tag 2.2.12-pre1 (to be
> released in 3 working days if everything is fine) and you skip 2.0
> and 1.8 this time because they were released to match 2.3.8. Next time,
> you'll have a 2.0.22-pre1 but no 1.8.30-pre1 yet.

This will not work. I tried this when I was maintaining kernels, and the
reality is that users who stumble on a bug want their fix. And worse,
their stability expectations when running on older releases make them
even more impatient, because 1) older releases *are* expected to be
reliable, 2) they're deployed on sensitive machines, where the business
is, and 3) it's expected there are very few pending fixes so for them
there's no justification for delaying the fix they're waiting for.

> If for some reason, there is an important regression in 2.3.9 you want
> to address, you release a 2.3.10 and a 2.2.12-pre2, still no 2.0.22-pre1
> nor 1.8.30-pre1. Hopefully, no more regressions spotted, you tag 2.2.12
> on top of 2.2.12-pre2 and issue a release.

The thing is, the -pre releases will just be tags of no use at all.
Maintenance branches collect fixes all the time and either you're on a
release or you're following -git. And quite frankly, most stable users
are on a point release because by definition that's what they need. What
I'd like to do is to maintain a small delay between versions, but there
is no need to maintain particularly long delays past the next LTS.

What needs to be particularly protected is the set of LTS versions as a
whole. More users are affected by a 2.2 breakage than by a 2.0 breakage,
and the risk is the same for each of them. So instead we should make sure
that all versions starting from the first LTS past the latest branch will
be slightly delayed. But there's no need to further enforce a delay
between them.

What this means is that when issuing a 2.3 release, we can wait a bit
before issuing the 2.2, and then once 2.2 is emitted, most of the
potential damage is already done, so there's no reason for keeping older
ones on hold as it can only force their users to live with known bugs.

And when the latest branch is an LTS (like in a few months once 2.4 is
out), we'd emit 2.4 and 2.3 together, then wait a bit and emit 2.2 and
the other ones. This maintains the principle that the LTS before the
latest branch should be very stable.
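
To make the rule above concrete, here is a minimal sketch in Python. It
is only an illustration, not an existing release tool: the Branch type,
the release_offsets() function, the 3-day delay and the LTS flags given
to each branch in the usage example are all assumptions chosen for the
example.

```python
from typing import Dict, List, NamedTuple


class Branch(NamedTuple):
    """Hypothetical description of a maintained branch."""
    name: str
    lts: bool


def release_offsets(branches: List[Branch], delay_days: int = 3) -> Dict[str, int]:
    """Return the release day offset for each branch.

    `branches` must be ordered from newest to oldest. The latest branch
    goes out on day 0. The first LTS found after it, and every branch
    older than that LTS, go out together after `delay_days` (an assumed
    value standing for "wait a bit").
    """
    offsets: Dict[str, int] = {}
    delayed = False
    for index, branch in enumerate(branches):
        # The first LTS past the latest branch starts the delayed group.
        if index > 0 and branch.lts:
            delayed = True
        offsets[branch.name] = delay_days if delayed else 0
    return offsets


if __name__ == "__main__":
    # LTS flags below are assumptions made for the illustration.
    today = [Branch("2.3", False), Branch("2.2", True),
             Branch("2.0", True), Branch("1.8", True)]
    print(release_offsets(today))

    # Once 2.4 is out as an LTS, 2.4 and 2.3 go together, then the rest.
    after_24 = [Branch("2.4", True)] + today
    print(release_offsets(after_24))
```

With these assumed inputs, only the latest branch (plus any non-LTS
branch sitting right behind it) is released immediately, and there is a
single delay rather than one per older branch.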

With this said, there remains the problem I mentioned of late fixes that
are discovered during this grace period. The tricky ones can wait
in the -next branch, but the other ones should be integrated, otherwise
the nasty effect is that users think "let's not upgrade to this one but
wait for the next one so that I do not have to schedule another update
later and that I collect all fixes at once". But if we integrate
sensitive fixes in 2.2 that were not yet in a released 2.3, those
upgrading will face some breakage.

On the kernel side, Greg solved all this by issuing all versions very
frequently: as long as you produce updates faster than users are
willing to deploy them, they can choose what to do. It just requires
a bandwidth that we don't have :-/ Some weeks several of us work full
time on backports and tests! Right now we've reached a point where
backports can prevent us from working on mainline, and where this lack
of time increases the risk of regressions, and the regressions require
more backport time.

I think that the real problem arises when a version becomes generally
available in distros, since distro users are often the ones with the
least autonomy when it comes to rolling back. When you build from
sources, you're more at ease. So a nice solution would probably be to
add an idle period between a stable release and its appearance in
distros so that it really gets some initial deployment before becoming
generally available. And I know that some users complain when they do
not immediately see their binary package, but that's something we can
easily explain and document. We could even indicate a level of confidence
in the announcement messages. It has the merit of respecting the principle
of least surprise for everyone in the chain, including those like you
and me involved in the release cycle and who did not necessarily plan
to stop all activities to work on yet-another-release because the
long-awaited fix-of-the-month broke something and its own fix broke
something else.

With this said, given that 2.2 is still having problems with the
freq_ctr issue, I don't want to postpone it too much, or more and more
users will deploy it and get trapped with constantly increasing counters
for the next 22 days.

Willy
