On Thu, Oct 26, 2017 at 11:11 AM, Thorsten Leemhuis
<regressi...@leemhuis.info> wrote:
>
> All that afaics doesn't matter. If a new kernel breaks things for people
> (that especially includes people that do *not* update their userland)
> then it's a kernel regression, even if the root of the problem is in
> userland. Linus (CCed) said that often enough (I really should sit down
> and collect his mails on this from the web and put them in one
> document).

Thorsten is very much correct.

People should basically always feel like they can update their kernel
and simply not have to worry about it.

I refuse to introduce "you can only update the kernel if you also
update that other program" kind of limitations. If the kernel used to
work for you, the rule is that it continues to work for you.

There have been exceptions, but they are few and far between, and they
generally have some major and fundamental reasons for having happened,
that were basically entirely unavoidable, and people _tried_hard_ to
avoid them. Maybe we can't practically support the hardware any more
after it is decades old and nobody uses it with modern kernels any
more. Maybe there's a serious security issue with how we did things,
and people actually depended on that fundamentally broken model. Maybe
there was some fundamental other breakage that just _had_ to have a
flag day for very core and fundamental reasons.

And notice that this is very much about *breaking* people's
environments.

Behavioral changes happen, and maybe we don't even support some
feature any more. There's a number of fields in /proc/<pid>/stat that
are printed out as zeroes, simply because they don't even *exist* in
the kernel any more, or because showing them was a mistake (typically
an information leak). But the numbers got replaced by zeroes, so that
the code that used to parse the fields still works. The user might not
see everything they used to see, and so behavior is clearly different,
but things still _work_, even if they might no longer show sensitive
(or no longer relevant) information.

But if something actually breaks, then the change must get fixed or
reverted. And it gets fixed in the *kernel*. Not by saying "well, fix
your user space then". It was a kernel change that exposed the
problem, it needs to be the kernel that corrects for it, because we
have an "upgrade in place" model. We don't have an "upgrade with new
user space" model.

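The /proc/<pid>/stat point above can be sketched in a few lines. This is
only an illustration under stated assumptions: parse_stat() is a
hypothetical user-space parser, the sample lines are synthetic and much
shorter than a real stat line, and treating the third field after the
command name as the "retired" one is chosen purely for the example. It
shows why printing a zero keeps a positional parser working, while
dropping the field would not.

```python
def parse_stat(line: str) -> dict:
    """Parse a (shortened, synthetic) /proc/<pid>/stat-style line by position."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split around the last ')' rather than on whitespace alone.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    rest = tail.split()
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": rest[0],
        "ppid": int(rest[1]),
        "pgrp": int(rest[2]),      # hypothetical "retired" field in this sketch
        "session": int(rest[3]),
    }


# Older kernel: the field carries a real value.
old = parse_stat("1234 (my prog) S 1 77 1234")

# Newer kernel, field retired but still printed as 0: positions are stable,
# so the parser keeps working and the later field is still read correctly.
zeroed = parse_stat("1234 (my prog) S 1 0 1234")

# If the field were removed outright, every later field would shift left
# and this parser would fail (or, on a longer line, silently misread).
try:
    parse_stat("1234 (my prog) S 1 1234")
    removed_ok = True
except IndexError:
    removed_ok = False
```

With the zeroed line the parser reports a different value for the
retired field but reads everything after it correctly. That is the
difference between a behavioral change and a breakage.
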
And I seriously will refuse to take code from people who do not
understand and honor this very simple rule.

This rule is also not going to change.

And yes, I realize that the kernel is "special" in this respect. I'm
proud of it.

I have seen, and can point to, lots of projects that go "We need to
break that use case in order to make progress" or "you relied on
undocumented behavior, it sucks to be you" or "there's a better way to
do what you want to do, and you have to change to that new better
way", and I simply don't think that's acceptable outside of very early
alpha releases that have experimental users that know what they signed
up for. The kernel hasn't been in that situation for the last two
decades.

We do API breakage _inside_ the kernel all the time. We will fix
internal problems by saying "you now need to do XYZ", but then it's
about internal kernel APIs, and the people who do that then also
obviously have to fix up all the in-kernel users of that API. Nobody
can say "I now broke the API you used, and now _you_ need to fix it
up". Whoever broke something gets to fix it too.

And we simply do not break user space.

                Linus