Hi Paul,

On Thu, Sep 09, 2021 at 11:00:26AM +0200, Salvatore Bonaccorso wrote:
> Hi Paul,
>
> On Thu, Sep 09, 2021 at 09:37:02AM +0200, Paul Gevers wrote:
> > Package: src:linux
> > Version: 5.10.46-4
> > X-Debbugs-CC: debian...@lists.debian.org, a...@debian.org
> > Severity: serious
> > Justification: data loss
> >
> > Hi,
> >
> > As discussed over IRC, here is the bug report for one of the hanging
> > arm64 hosts we have for ci.debian.net.
> >
> > Since the upgrade of our hosts to bullseye (days before the bullseye
> > release) we have been experiencing random loss of access to our hosts.
> > For the hosts that have some form of out-of-bound access, I tried to use
> > that to see what's going on, but at AWS our account doesn't have the
> > right permissions to use the serial port out-of-bound access and all
> > other forms that I tried on all hosts that I have access to some for of
> > out-of-bound access that didn't work.
> >
> > Since the bullseye release I've rebooted (externally triggered) already
> > dozens of times and for those host that don't allow rebooting (AWS
> > again) I had to reprovision the hosts.
> >
> > All the architectures (amd64, arm64, ppc64el and s390x) that we have
> > experience these hangs. I'm absolutely not claiming that the root cause
> > is the same, but on buster we didn't experience this (our s390x host
> > never workerd on buster so I don't claim regression there), so there is
> > a pattern. However, the symptoms don't look completely the same everywhere.
> >
> > On one of our arm64 hosts (we call ci-worker-armel-01) I found the
> > attached logging as the final logs in the journal.
>
> I suspect it's the same issue as fixed by
> https://git.kernel.org/linus/ad9f151e560b016b6ad3280b48e42fa11e1a5440
> upstream,
> https://lore.kernel.org/lkml/000000000000ef07b205c3cb1...@google.com/
>
> The fix landed in 5.13-rc7 (was backported to 5.12.13 as well, but not
> 5.10.y). It seems it requires more work to address it as well in
> 5.10.y.
>
> Asked upstream in
> https://lore.kernel.org/lkml/ytkj4xh2ol075...@eldamar.lan/

The needed patches are now there:
https://lore.kernel.org/stable/20210909140337.29707-1...@strlen.de/
and queued for the next 5.10.y upload (so I expect us to have those, at
the latest, in our first bullseye point release).

I will try to cherry-pick those; if you can check that they fix the
issue, that would be great.

Regards,
Salvatore