Dave, Mike,

Thanks for the replies.
Here are my inline replies to both emails.

On Mon, 04 Nov 2024 19:58:10 +0100,
Dave Voutila <d...@sisu.io> wrote:
>
> Mike Larkin <mlar...@nested.page> writes:
>
> > On Sun, Nov 03, 2024 at 02:03:15PM +0100, Kirill A. Korinsky wrote:
> >> On Sun, 03 Nov 2024 13:28:16 +0100,
> >> Mike Larkin <mlar...@nested.page> wrote:
> >> >
> >> > I'm guessing this isn't a vmd/vmm thing, as those components don't
> >> > interact with acpi. We have seen stuck acpi threads on other machines
> >> > after un-zzz/un-ZZZ in some cases. Were you doing suspends or
> >> > hibernates?
> >>
> >> Well, when the system degraded that time I, in addition to extremely slow
> >> IO, had seen srdis consuming a lot of resources and quit toxic leads to
> >> stop consuming it. So, I am not completly sure that this is only acpi
> >> related things.
> >>
> >
> > That's softraid and vmd is not likely causing this unless you're writing
> > tons of data suddenly and even then it shouldn't even register as anything
> > crazy high.
> >

I run one Docker image which "updates" some files in a local FS that is
exported via NFS to the Linux machine and, inside it, into Docker. Anyway,
the volume of data read or written is not that large: something between 10
and 20 MB. But the application is written in Java, and it may consume some
CPU time for GC.

> >> I do suspend often but during my expirement with snapshot's kernel I keep
> >> system on AC power to avoid any suspend/hibernate and it had hit it.
> >>
>
> The combination of AC power (so your CPUs will be running at max
> frequency) and Docker on Linux...how hot does this machine get out of
> curiosity?
>

Usually I run a custom kernel with a few minor patches, and one of them
removes this block from setperf_auto:

	if (hw_power) {
		speedup = 1;
		goto faster;
	}

I ran the tests on the snapshot kernel to eliminate any possible effect of
those changes.

> >> If you can suggest anything that I can do to collect some usefull data when
> >> I hit it next time, I'll be appreciated and do it.
> >>
>
> Try to check the thermal sensors output. I'm wondering if this is a
> hardware issue and you're pushing the temperature of the NVMe controller
> past its comfort point and shit is going sideways. I think some can
> throttle themselves in an effort to prevent turning into glass.
>

Noted, thanks!

-- 
wbr, Kirill