Mike Larkin <mlar...@nested.page> writes:

> On Sun, Nov 03, 2024 at 02:03:15PM +0100, Kirill A. Korinsky wrote:
>> On Sun, 03 Nov 2024 13:28:16 +0100,
>> Mike Larkin <mlar...@nested.page> wrote:
>> >
>> > This is exactly what many of us do, every day. So I'm not sure what's
>> > triggering your scenario. Any way to narrow it down more than "just use the
>> > system for a day or two"? Eg, "here's a script you can run inside an
>> > alpine VM that triggers the issue"?
>> >
>>
>> Frankly, I have no idea how to narrow it down further. I use this VM to run
>> docker-compose and it works fine, until the system is degraded.
>>
>> > I'm guessing this isn't a vmd/vmm thing, as those components don't interact
>> > with acpi. We have seen stuck acpi threads on other machines after
>> > un-zzz/un-ZZZ in some cases. Were you doing suspends or hibernates?
>>
>> Well, when the system degraded that time I, in addition to extremely slow
>> IO, had seen srdis consuming a lot of resources and quit toxic leads to stop
>> consuming it. So, I am not completely sure that this is only acpi related
>> things.
>>
>
> That's softraid and vmd is not likely causing this unless you're writing tons
> of data suddenly and even then it shouldn't even register as anything crazy
> high.
>
>> I do suspend often but during my experiment with snapshot's kernel I keep
>> system on AC power to avoid any suspend/hibernate and it had hit it.
>>

The combination of AC power (so your CPUs will be running at max
frequency) and Docker on Linux... how hot does this machine get, out of
curiosity?

>> If you can suggest anything that I can do to collect some useful data when
>> I hit it next time, I'll be appreciated and do it.
>>

Try checking the thermal sensor output. I'm wondering if this is a
hardware issue and you're pushing the temperature of the NVMe controller
past its comfort point and shit is going sideways. I think some
controllers can throttle themselves in an effort to prevent turning into
glass.

>
> dunno, your system seems to be behaving weird. I don't think vmd/vmm is
> causing excessive acpi interrupts or softraid overhead though. Those things
> are pretty much unrelated.
>
> The only thing I can suggest is try and see if some specific action causes the
> problem and try to narrow it down.
>
>> --
>> wbr, Kirill