Hi,

On Tuesday, 21 June 2022 22:31:45 CEST Paul Gevers wrote:
> On 21-06-2022 22:07, Diederik de Haas wrote:
> > Do these errors still occur? Still with 5.10.103-1 or a later one?
>
> The last occurrence of a machine hang I had is from 5 May 2022, but I'm
> not sure if I checked if it was this same issue. Normally our kernels
> are up-to-date, but I don't recall what we had at the time. We have
> recommissioned our arm64 hosts, so the install logs are lost by now.

It's good for ci.debian.net that there are such large gaps between failures,
but it makes debugging a bit harder.
I think that the install logs aren't that important (anymore) as the
issue/symptoms appear to be the same:
- some swap action resulting in some failure
- CPU gets stuck
- watchdog triggers a reboot

How is swap configured on these devices?

> > Is it only on arm64 machines? Or is this just an example which also
> > occurs on other arches?
>
> I'm pretty sure I haven't seen this on other arches, otherwise I'm sure
> I would have reported it to this bug.

Yeah, I _assumed_ as much, but assumptions can be dangerous ;-)

Normally I scroll (hard) past the hardware listings as they rarely tell me
anything. And I did that before too, but just now I made an important
discovery. I *assumed* it was running on native arm64 hardware and was about
to ask for specifics about it, and then I noticed this:

Host bridge [0600]: Red Hat, Inc. QEMU PCIe Host bridge [1b36:0008]

Qemu.
Quite likely unrelated, but a while back I had an issue with qemu when
building arm64 images: https://bugs.debian.org/988174

I think it would be useful to know which qemu version(s) were used.

(It's unlikely I'll be able to help find the cause/solution; I'm mostly
gathering hopefully useful bits of information for people who could.)

> > If it still occurs, then the likely only way to get a possible resolve is
> > reporting it to upstream.
>
> 1.5 months is quite long for it to be gone, although, before that it was
> 2.5 months.

If the issue does occur again, I think it would be useful to bring 'upstream'
into the conversation. They can likely bring much more useful input into this
than (e.g.) I could. Also, if upstream is made aware that there is an issue
(even an infrequent one), then they can make the most informed choice about
what to do with it.

Cheers,
  Diederik