Hello.  I have a number of machines running NetBSD-10.99.12/amd64, some
on real hardware and some as virtual machines, mostly under Xen but also
as guests on KVM.  Out of approximately 25 different instances, I have
one Xen machine where processes "hang".  The machine may run for a week
before this happens, or it may run for months.  To try to figure out
what is going wrong, I installed a kernel with ddb in it, and when the
problem manifested itself, I found processes and threads that look like
this:


PID     LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
28144 28144 3   0         0   ffff9dde28395400               sshd fstchg
2416   2416 3   0         0   ffff9dde28395000               sshd fstchg
1481   1481 3   0         0   ffff9ddcf99dfc00               sshd fstchg
6400   6400 3   1         0   ffff9ddcf99df800               cron fstchg
28098 28098 3   0         0   ffff9ddcf99df400               cron fstchg
5484   5484 3   1         0   ffff9ddcf99df000               cron fstchg
1394   1394 3   0         0   ffff9ddd25e8fc00               cron fstchg
26447 26447 3   0         0   ffff9ddd25e8f800               cron fstchg

. . . 

0       123 3   0       200   ffff9dddfd0bac00            ioflush fstchg
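
For what it's worth, a listing like the one above can be summarized by
wait channel with a small script, which makes it easier to spot any LWP
that is not sleeping on fstchg.  This is only a sketch: the function names
are my own, it assumes the column layout shown above (with no spaces in
the NAME column), and the idea that the blocker would show a different
wait channel is an assumption, not something I have confirmed.

```python
from collections import defaultdict


def parse_ddb_ps(text):
    """Parse ddb `ps` output lines into dicts.

    Assumes the columns shown above: PID LID S CPU FLAGS LWP NAME [WAIT].
    The header, blank lines and elisions such as ". . ." are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        rows.append({
            "pid": int(fields[0]),
            "lid": int(fields[1]),
            "lwp": fields[5],
            "name": fields[6],
            # A running LWP may have no wait channel at all.
            "wait": fields[7] if len(fields) > 7 else "",
        })
    return rows


def group_by_wait(rows):
    """Group parsed rows by wait channel."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["wait"]].append(row)
    return dict(groups)


# A few lines copied from the listing above.
SAMPLE = """\
PID     LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
28144 28144 3   0         0   ffff9dde28395400               sshd fstchg
6400   6400 3   1         0   ffff9ddcf99df800               cron fstchg
0       123 3   0       200   ffff9dddfd0bac00            ioflush fstchg
"""

if __name__ == "__main__":
    for wait, members in group_by_wait(parse_ddb_ps(SAMPLE)).items():
        names = sorted({m["name"] for m in members})
        print(wait or "(none)", len(members), names)
```

Fed the full listing, any group other than fstchg would be the first
place I would look.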

The system is:

NetBSD lothlorien.nfbcal.org 10.99.12 NetBSD 10.99.12 (MIRKWOOD_PVH_DDB) #0: Mon Apr  7 05:50:18 PDT 2025  buh...@loth-9.nfbcal.org:/usr/src/sys/arch/amd64/compile/MIRKWOOD_PVH_DDB amd64

        Looking at the code, I see that these processes, all sleeping on
fstchg, are waiting to do something with the filesystem or filesystems.
There are a number of mounted partitions, all ffs, plus a ptyfs
filesystem running in compatibility mode, i.e. /dev/ttypx rather than
/dev/pts/*, which means it doesn't show up as a filesystem at all.  My
questions are as follows:

1.  How do I find which one of these is the blocking process?


2.  Has anyone else seen this behavior?

        As I say, only one of my many machines exhibits this behavior, and
it is a Xen guest, while other VMs running the exact same code work fine
for months at a time.

Suggestions welcome.
-thanks
-Brian
