> On 29. Jun 2025, at 02:21, Brian Buhrow <buh...@nfbcal.org> wrote:
> 
> hello.  I have a number of machines running NetBSD-10.99.12/amd64 running on
> real hardware and running as VM machines, mostly xen, but also as guests on
> KVM.  Out of approximately 25 different instances, I have one Xen machine
> where processes "hang".  The machine may run for a week before this happens,
> or it may run for months.  To try and figure out what is going wrong, I
> installed a kernel with ddb in it and when the problem manifested itself, I
> discovered processes and threads that look like:
> 
> 
> PID     LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
> 28144 28144 3   0         0   ffff9dde28395400               sshd fstchg
> 2416   2416 3   0         0   ffff9dde28395000               sshd fstchg
> 1481   1481 3   0         0   ffff9ddcf99dfc00               sshd fstchg
> 6400   6400 3   1         0   ffff9ddcf99df800               cron fstchg
> 28098 28098 3   0         0   ffff9ddcf99df400               cron fstchg
> 5484   5484 3   1         0   ffff9ddcf99df000               cron fstchg
> 1394   1394 3   0         0   ffff9ddd25e8fc00               cron fstchg
> 26447 26447 3   0         0   ffff9ddd25e8f800               cron fstchg
> 
> . . . 
> 
> 0       123 3   0       200   ffff9dddfd0bac00            ioflush fstchg
> 
> The system is:
> 
> NetBSD lothlorien.nfbcal.org 10.99.12 NetBSD 10.99.12 (MIRKWOOD_PVH_DDB) #0: Mon Apr  7 05:50:18 PDT 2025  buh...@loth-9.nfbcal.org:/usr/src/sys/arch/amd64/compile/MIRKWOOD_PVH_DDB amd64
> 
> In looking at the code, I see these processes are waiting to do something
> with the filesystem or filesystems.  There are a number of mounted
> partitions, all ffs, plus a ptyfs filesystem running in compatibility mode,
> i.e. /dev/ttypx, rather than /dev/pts/*, which means it doesn't show up as a
> filesystem at all.  My questions are as follows:
> 
> 1.  How do I find which one of these is the blocking process?
> 
> 
> 2.  Has anyone else seen this behavior?
> 
> As I say, only one of my many machines exhibits this behavior, and it is a
> Xen guest on which other VM's running the exact same code, are working fine
> for months at a time.
> 
> Suggestions welcome.
> -thanks
> -Brian

These processes are waiting for a file system suspension.  From ddb you may run

        call fstrans_dump(1)

to dump the current state of the suspension subsystem.  You will see which
processes / lwps are "inside" a file system and which file systems are
suspending / suspended.

The syncer (ioflush) waiting is generally a bad sign.  Is there still free kmem?
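
As a rough sketch of two further checks from ddb, assuming the stock ddb(4)
commands are available in this kernel (the address is the ioflush LWP from the
listing quoted above and will differ after a reboot):

        bt/a 0xffff9dddfd0bac00
        show uvmexp

The first should print the stack of the waiting syncer thread, the second the
UVM memory counters.  As far as I know, show uvmexp reports page-level counters
rather than kmem itself, so treat it only as a first hint about memory
pressure.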

--
J. Hannken-Illjes - hann...@mailbox.org
