> On 3 Nov 2021, at 15:03, Thomas Deutschmann <whi...@gentoo.org> wrote:
>
> Hi,
>
> it is currently not possible to smoothly run a world upgrade on a 4 months 
> old system which doesn't even have a complicated package list:
> [snip]
>
> This is not about finding solution to upgrade the system (in this case it was 
> enough to force PYTHON_TARGETS=python3_8 for portage). This is about raising 
> awareness that Gentoo is a rolling distribution and that we guarantee users 
> to be able to upgrade their system when they do world upgrades just once a 
> year (remember: in my case the last world upgrade is just 4 months old!). If 
> they cannot upgrade their system without manual intervention, we failed to do 
> our job.
>
> Situations like this will disqualify Gentoo for any professional environment 
> like this will break automatic upgrades and you cannot roll individual fixes 
> for each possible situation via CFM tools like Salt, Ansible, Puppet or Chef.
>
> It would be very appreciated if everyone will pay more attention to this in 
> future. We can do better. In most cases we can avoid problems like this by 
> keeping older ebuilds around much longer for certain key packages to help 
> with upgrades.


I agree wholeheartedly with this and thank you for raising it.

## Remark on some previous discussion

First, let me just mention that I think it's been on some of our minds but we 
need to go a bit further with formalising matters. It was brought up at the end 
of the September 2021 council meeting as a footnote:
```
[21:16:56] <@sam_> I'd like to consider "upgrade lifcycles" at some point but I 
don't have notes ready for now. Mainly just about formalising efforts to 
support upgrades for X period and to try document a procedure for e.g. new EAPI 
versions and bootstrap packages not having new EAPIs for a while, and such.
[21:17:09] <@sam_> So, no, not right now, but I'd welcome any thoughts 
post-meeting while I consider it more
[21:17:33] <@sam_> The gist is to have a checklist so that we don't "get 
excited" like with EAPI 8 and end up making upgrades hard for people
[21:17:43] <@sam_> I think the GLEP we recently approved helps with that
```

I've also started working on some notes on possible improvements:
https://wiki.gentoo.org/wiki/User:Sam/TODO#Improving_upgrades
(I'm mentioning all of this here because it's easy to lose track of e.g.
council meeting references on a topic; now they're easy to find in this
thread.)

## Summary of the two common cases

Now, in terms of the common issues regarding upgrades, I think we have two (to
be clear, I'm not trying to "fix your problem", just bringing to bear some of
the support experience I've had from #gentoo and so on):

1) World upgrades which can't complete due to new EAPIs (one's Portage lacks 
support for e.g. EAPI 8 and hence cannot read ebuilds)

I'm open to broader measures on the use of new EAPIs in ~arch / stable
(e.g. the first Portage supporting EAPI N should sit in ~arch for 4/6/???
months before any ebuilds use it?), but I think this is a drastic measure we
might be able to avoid. Let's keep it in mind in case we do need it though.

My general thinking on this is that it doesn't matter _too much_(?) as long as
one can upgrade Portage without hassle. A lot of our users seem to know to try
upgrading Portage if they can't upgrade their system due to new EAPIs, but
they then fall down due to cryptic errors (see my next point). We could also
improve the "unknown EAPI" error if necessary to make this clearer.
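
To make the "clearer unknown EAPI error" idea concrete, here's a minimal
sketch of the kind of hint I have in mind. This is NOT Portage's actual code
or message format: the function name, the set of supported EAPIs and the
example package are all hypothetical.

```python
# Hypothetical sketch: not Portage's real implementation or wording.
# Assumed: the EAPIs an older, not-yet-upgraded Portage understands.
SUPPORTED_EAPIS = {"5", "6", "7"}


def explain_unsupported_eapi(package, eapi, supported=SUPPORTED_EAPIS):
    """Return a human-readable hint if `eapi` is unsupported, else None."""
    if eapi in supported:
        return None
    newest = max(supported, key=int)
    return (
        f"{package} uses EAPI {eapi}, but this Portage only supports up to "
        f"EAPI {newest}. Upgrade Portage first, e.g.: "
        "emerge --oneshot sys-apps/portage"
    )


# Illustrative package name only.
print(explain_unsupported_eapi("app-misc/example-1.0", "8"))
```

The point is just that the message names the cause (an old Portage) and the
next step, rather than leaving the user to guess.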

TL;DR: A more drastic option is available, but my hope is that we can avoid
any direct action on 1) if we deal with the next point, 2).

2) Portage often can't upgrade itself when there are "pending global
PYTHON_TARGETS changes", e.g. when we change the default value of
PYTHON_TARGETS in the profiles (such as from Python 3.8 to Python 3.9)

This one is far trickier. I've started documenting common hacks/methods at 
https://wiki.gentoo.org/wiki/User:Sam/Portage_help/Upgrading_Portage#Solution
which has been rather useful in #gentoo and on the forums (it's been nice to 
see links on those and other similar pages pop up on /r/gentoo).

Portage is written in Python and has dependencies in Python. A lot of them are
optional (which is why, in the wiki page I linked to, I suggest running
emerge --sync and then turning off USE=rsync-verify temporarily to reduce
dependencies), but I don't think this is particularly comforting to a user who
just wants to upgrade Portage. They don't necessarily realise they need to
toggle one or *several* flags on Portage to make it work.
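
As an illustration of the kind of flag-toggling involved, here's a small
sketch that builds a temporary package.use line for Portage. It combines the
two workarounds mentioned in this thread (pinning PYTHON_TARGETS to
python3_8 and disabling rsync-verify); the helper name is made up, and the
right target and flags will depend on the system in question.

```python
def portage_use_override(python_target, disabled_flags=("rsync-verify",)):
    """Build a package.use line pinning Portage to one Python target.

    Hypothetical helper for illustration; the target and the flags to
    disable are assumptions that depend on the system being upgraded.
    """
    flags = [f"-{flag}" for flag in disabled_flags]
    # USE_EXPAND flags are written as <variable>_<value> in lowercase.
    flags.append(f"python_targets_{python_target}")
    return "sys-apps/portage " + " ".join(flags)


print(portage_use_override("python3_8"))
```

The resulting line would go in a file under /etc/portage/package.use/ and be
removed again once Portage itself has been upgraded.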

dilfridge has been advocating for some time that we look at some form of a
"static Portage" copy (possibly vendoring/bundling all Python dependencies) to
completely decouple the Portage ebuilds from the Python eclasses, leaving only
the need for a (modern) Python 3 interpreter.

[I've filed a bug for this here: https://bugs.gentoo.org/821511].

I really feel like this is one of the big things we need to tackle. Upgrading 
Portage unlocks newer
EAPIs and allows us to even discuss world upgrades.

(Using an older Portage to try to upgrade world with any non-trivial @world
set, i.e. the chosen, user-specified packages, is likely to be a fool's
errand -- folks have already said that if _anything_ is using a new EAPI, it's
going to affect some users and result in confusing errors.)

## Solutions

* News item when a new EAPI is released, explaining how to upgrade Portage in
an emergency, i.e. when Portage can't be upgraded the normal way.

We can describe the steps at
https://wiki.gentoo.org/wiki/Project:Portage/Fixing_broken_portage

This would also flag to users that they should upgrade Portage 
sooner-rather-than-later even if they aren't
currently willing/able to fully upgrade the rest of their system.

* We may want to include a 'rescue-portage' script on the system which 
downloads the latest Portage (would need
to use a symlink or something to reliably get the latest version).

* Investigate reducing Portage's dependencies.

* Mitigate PYTHON_TARGETS profile change impact:
** I don't love this idea, but one possible measure is that we always have two
PYTHON_TARGETS set at all times (this would double build times for a fair
number of packages).
** Or we do this just for Portage and its dependencies.
** Or we have a new portage-minimal ebuild (to simplify matters) which always
has some/all targets enabled and has few/no Python dependencies.

[Note that in the past, we weren't consistent about putting out news items for 
this change. We're doing
that now at least.

The matter has got a bit worse because of Python upstream's release cycle 
changing.]

* Implement at least a 4-6 month(?) delay on using a new EAPI after a version
of Portage first supports it (with the timer resetting once that Portage hits
stable too).

I wasn't sure about this at first, but actually, the PYTHON_TARGETS stuff 
_should_ be
fine for the most part as long as we make sure the tree is mostly/entirely 
ready before
flipping the switch.

[This could actually help with a fair number of the problems (other than
"general upgrade issues" like conflicts), except when a new EAPI comes along
with a targets change, and if we're looking to support upgrades over a year or
two years, those are probably going to coincide.]
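
To show how the delay proposal above would play out, here's a rough sketch of
the arithmetic: the timer starts when a Portage supporting the new EAPI first
lands in ~arch and resets when that Portage goes stable. The function names
and the dates in the example are made up, and the month arithmetic is
deliberately simplified.

```python
from datetime import date


def add_months(start, months):
    """Shift a date forward by whole months.

    Simplification: the day is clamped to 28 so the result is always valid.
    """
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    return date(year, month, min(start.day, 28))


def earliest_eapi_use(testing_release, stable_release=None, delay_months=6):
    """Earliest date ebuilds may use the new EAPI under the proposed delay.

    The timer starts at the first ~arch release of a supporting Portage
    and resets if/when that Portage is stabilised.
    """
    start = stable_release or testing_release
    return add_months(start, delay_months)


# Hypothetical dates, for illustration only.
print(earliest_eapi_use(date(2021, 7, 1)))
print(earliest_eapi_use(date(2021, 7, 1), date(2021, 9, 15), delay_months=4))
```

Note how stabilisation pushes the date out again, which is why a long support
window makes it likely that a PYTHON_TARGETS change lands inside it.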

## TL;DR

I don't think we can avoid thinking about Portage's entanglement / relationship
with PYTHON_TARGETS. Banning immediate use of new EAPIs will not magically
make it easy to upgrade Portage itself.

But the combination of a new EAPI + PYTHON_TARGETS changes in profiles
is pretty lethal.

I've got a few ideas above and I hope we can discuss some of them or, even
better, that someone has other proposals.

best,
sam

