> On 29. Jun 2025, at 02:21, Brian Buhrow <buh...@nfbcal.org> wrote:
>
> hello. I have a number of machines running NetBSD-10.99.12/amd64 running on
> real hardware and running as VM machines, mostly xen, but also as guests on KVM.
> Out of approximately 25 different instances, I have one Xen machine where
> processes "hang".
> The machine may run for a week before this happens, or it may run for months.
> To try and figure out what is going wrong, I installed a kernel with ddb in it
> and when the problem manifested itself, I discovered processes and threads
> that look like:
>
> PID    LID    S CPU FLAGS STRUCT LWP *     NAME    WAIT
> 28144  28144  3   0     0 ffff9dde28395400 sshd    fstchg
> 2416   2416   3   0     0 ffff9dde28395000 sshd    fstchg
> 1481   1481   3   0     0 ffff9ddcf99dfc00 sshd    fstchg
> 6400   6400   3   1     0 ffff9ddcf99df800 cron    fstchg
> 28098  28098  3   0     0 ffff9ddcf99df400 cron    fstchg
> 5484   5484   3   1     0 ffff9ddcf99df000 cron    fstchg
> 1394   1394   3   0     0 ffff9ddd25e8fc00 cron    fstchg
> 26447  26447  3   0     0 ffff9ddd25e8f800 cron    fstchg
>
> . . .
>
> 0      123    3   0   200 ffff9dddfd0bac00 ioflush fstchg
>
> The system is:
>
> NetBSD lothlorien.nfbcal.org 10.99.12 NetBSD 10.99.12 (MIRKWOOD_PVH_DDB)
> #0: Mon Apr 7 05:50:18 PDT 2025
> buh...@loth-9.nfbcal.org:/usr/src/sys/arch/amd64/compile/MIRKWOOD_PVH_DDB
> amd64
>
> In looking at the code, I see these processes are waiting to do something
> with the filesystem or filesystems. There are a number of mounted partitions,
> all ffs, plus a ptyfs filesystem running in compatibility mode, i.e.
> /dev/ttypx, rather than /dev/pts/*, which means it doesn't show up as a
> filesystem at all. My questions are as follows:
>
> 1. How do I find which one of these is the blocking process?
>
> 2. Has anyone else seen this behavior?
>
> As I say, only one of my many machines exhibits this behavior, and it is a
> Xen guest on which other VM's running the exact same code, are working fine
> for months at a time.
>
> Suggestions welcome.
> -thanks
> -Brian

These processes are waiting for a file system suspension.

From ddb you may run "call fstrans_dump(1)" to dump the current state of
the suspension subsystem. You will see which processes / lwps are "inside"
a file system and which file systems are suspending / suspended.

The syncer (ioflush) waiting as well is generally a bad sign. Is there
still free kmem?

--
J. Hannken-Illjes - hann...@mailbox.org
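
As a rough triage aid alongside fstrans_dump, the ddb "ps" listing can be captured from the console and grouped by wait channel, so that any lwp that is NOT sleeping on fstchg stands out as worth a closer look. The sketch below is only an illustration: the helper name group_by_wait_channel is hypothetical, the column layout is assumed to match the listing quoted above (PID, LID, S, CPU, FLAGS, lwp address, NAME, WAIT), and it assumes process names contain no spaces. It does not replace the kernel's own view of who holds the suspension.

```python
from collections import defaultdict


def group_by_wait_channel(ps_text):
    """Map wait channel -> list of (pid, lid, name) from ddb ps-style rows.

    Assumed row layout: PID LID S CPU FLAGS STRUCT_LWP NAME [WAIT]
    """
    groups = defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip the header, ". . ." elisions and blank lines.
        if len(fields) < 7 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        pid, lid, name = int(fields[0]), int(fields[1]), fields[6]
        # An lwp that is not sleeping has no wait channel column.
        wait = fields[7] if len(fields) > 7 else "-"
        groups[wait].append((pid, lid, name))
    return dict(groups)


# A few rows copied from the listing in the original report.
SAMPLE = """\
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
28144 28144 3 0 0 ffff9dde28395400 sshd fstchg
6400 6400 3 1 0 ffff9ddcf99df800 cron fstchg
0 123 3 0 200 ffff9dddfd0bac00 ioflush fstchg
"""

if __name__ == "__main__":
    for wait, lwps in group_by_wait_channel(SAMPLE).items():
        print(wait, len(lwps), lwps)
```

If every captured lwp turns out to be on fstchg, the listing alone will not identify the holder, and the fstrans_dump output is the place to look.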