> I neglected to mention that I run XFCE with its System Load Monitor panel
> plugin having a RAM indicator bar that climbs and maxes out right before the
> hang. RAM is something I often need to monitor because I run scientific
> computing programs which can also exhaust my RAM if I'm not careful.

OK, the climb you describe does justify your suspicion that it's related to
RAM usage. It's not a proof, but it's a strong indication.

> The hanging behavior is like a step function: the computer goes from being
> fully responsive to completely unresponsive;

That's very much *unlike* a normal "out of RAM" situation, OTOH. Normally
what happens is that the OS starts to shuffle things around (throwing out
cached data, moving other data to swap, etc.), making the machine slower and
slower. The step function sounds much more like a bug such as a deadlock.

> I don't don't see the swap bar in the System Load Monitor increase,
> which is strange. I have experienced slow swap behavior before, but
> usually there I have intermittent control to Ctrl+Q programs and
> recover out of the slowness at time scales on the order of a few tens
> of seconds.

That's what happens for "normal out of RAM" situations, indeed.

> Your thought about swap inspired me to check whether my swap partition is
> functional. I don't know how to empirically test that swap works and only
> know to reading /etc/fstab where I do see that I have a swap mount point
> present. `apropos swap` led me to check systemd's swap.target which I also
> see is active, and also see the corresponding swap volume in systemctl.
> This resembles my laptop setup where I know the swap partition works there,
> from the rare occasions where I seeing its swap bar move in the XFCE System
> Load Monitor panel plugin.

You could run a `memtester` process and tell it to test, say, 6GB, so that
the kernel only has 2GB left to play with and is forced to push stuff to
swap, which you should then see in the output of `free`.

But I suspect in your case the details of *how* you get into the "out of
RAM" situation are relevant.

> This is a great insight. I've hit my share of graphics card bugs over the
> years. A few years ago I was bitten by an Intel IOMMU related graphical bug
> on my laptop which I worked around with a kernel parameter tweak.
> In earlier years, similar story with nvidia and nouveau drivers.

Indeed, graphics card bugs often display the step function, because even if
the rest of the system keeps working (at least for a while), you can't
really "see" it (unless you manage to connect into the machine via the
network).

> What's also unusual about this desktop is it's my first attempt using ZFS on

I have no experience or even much knowledge about ZFS, sorry. You might
want to temporarily set that same machine up with an ext4 filesystem
instead, to see if you can reproduce the problem even without ZFS.
Depending on how ZFS is used and your disk setup, it might be possible to
do that easily, without having to reinstall (a reinstall could result in a
sufficiently different system that it'd then be hard to convince oneself
that the only difference is ZFS-vs-ext4).


        Stefan
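
P.S. For checking whether swap actually gets used during a `memtester`
run, here is a minimal sketch, assuming a Linux `/proc/meminfo` (the same
file `free` reads). The function names `parse_meminfo` and `swap_used_kib`
are made up for this illustration, and I haven't tried it on your machine.

```python
#!/usr/bin/env python3
"""Report swap usage from /proc/meminfo-style text (illustrative sketch)."""


def parse_meminfo(text):
    """Return {field: value} for each 'Name:   12345 kB' line.

    Values are the integers as printed (KiB for the fields that carry
    a 'kB' unit); lines without a numeric value are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kib(info):
    """Swap in use is SwapTotal minus SwapFree (both in KiB)."""
    return info["SwapTotal"] - info["SwapFree"]


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("swap used: %d MiB of %d MiB"
          % (swap_used_kib(info) // 1024, info["SwapTotal"] // 1024))
```

Run it a few times in one terminal while something like `memtester 6G 1`
runs in another (ideally as root, so that memtester can lock its memory).
If the "swap used" figure grows, the swap partition is doing its job.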