On Tue, Jul 18, 2006 at 07:51:52AM -0300, User Freebsd wrote:
>
> 'k, had a bunch of fun tonight, but one of the results is that I was able
> to achieve file system deadlock, or so it appears ...
>
> Using the following from DDB:
>
> set $lines=0
> show pcpu
> show allpcpu
> ps
> trace
> alltrace
> show locks
> show alllocks
> show uma
> show malloc
> show lockedvnods
> call doadump
>
> I've been able to produce the attached output, as well as have a core dump
> that can hopefully be used to gather any that I may have missed this time
> *cross fingers*
Marc, I seriously doubt that what the machine is experiencing is a deadlock.

At http://people.freebsd.org/~kib/e1.gif is a graph of the locking dependencies for the vnode locks. An edge from process a to process b means that process a holds a lock and process b is waiting for that lock. A black edge means a dependency through a vnode lock, a red edge through a buffer lock. As you can see, the graph is acyclic.

Basically, there are two groups of blocked processes: one hierarchy is rooted in pid 66575 (this one includes the shell, pid 806), and the second is rooted in process 32. What are they doing?

Pid 66575:

Tracing command smtpd pid 66575 tid 101396 td 0xceb0a180
sched_switch(ceb0a180,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(dc5b5b20,c0661b60,0,c05fd078,20c) at sleepq_switch+0xc1
sleepq_wait(dc5b5b20,0,c0601d10,e59,8) at sleepq_wait+0x46
msleep(dc5b5b20,c06afde0,44,c061021d,0) at msleep+0x279
bwait(dc5b5b20,44,c061021d) at bwait+0x47
vnode_pager_generic_getpages(c8e85000,ed347c80,1000,0,c8e22000) at vnode_pager_generic_getpages+0x777
ffs_getpages(ed347bbc,c8e85000,0,ed347be8,c0597c41) at ffs_getpages+0x100
VOP_GETPAGES_APV(c063c100,ed347bbc) at VOP_GETPAGES_APV+0xa9
vnode_pager_getpages(c8e22000,ed347c80,1,0) at vnode_pager_getpages+0xa5
vm_fault(c88da4a0,280bb000,1,0,ceb0a180) at vm_fault+0x980
trap_pfault(ed347d38,1,280bb000,280bb000,0) at trap_pfault+0xce
trap(3b,3b,3b,8078d1c,807952c) at trap+0x1eb
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x280baffd, esp = 0xbfbfe894, ebp = 0xbfbfe8d8 ---

This process waits for data to be paged in.

Pid 32 (syncer):

Tracing command syncer pid 32 tid 100033 td 0xc8544780
sched_switch(c8544780,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(dc79fe68,c0661b60,0,c05fd078,20c) at sleepq_switch+0xc1
sleepq_wait(dc79fe68,0,c0601d10,e59,c06039a0) at sleepq_wait+0x46
msleep(dc79fe68,c06afde0,4c,c06024dc,0) at msleep+0x279
bwait(dc79fe68,4c,c06024dc) at bwait+0x47
bufwait(dc79fe68,1,0,0,0) at bufwait+0x1a
breadn(c8a0b414,6537700,0,4000,0) at breadn+0x266
bread(c8a0b414,6537700,0,4000,0) at bread+0x20
ffs_update(c9992000,0,6,0,0) at ffs_update+0x228
ffs_syncvnode(c9992000,3) at ffs_syncvnode+0x3be
ffs_sync(c8831400,3,c8544780,c8831400,2) at ffs_sync+0x209
sync_fsync(e817fcbc,c8a11ae0,c8a11bec,e817fcd8,c04ed586) at sync_fsync+0x126
VOP_FSYNC_APV(c0634220,e817fcbc) at VOP_FSYNC_APV+0x9b
sync_vnode(c8a11bec,c8544780) at sync_vnode+0x106
sched_sync(0,e817fd38,0,c04ed614,0) at sched_sync+0x1ed
fork_exit(c04ed614,0,e817fd38) at fork_exit+0xa0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe817fd6c, ebp = 0 ---

This one also waits for data.

What happens with the blocks? The syncer (pid 32) locked block 0xc8a0b414 and waits for the data, as shown above. Processes 33 (softdepflush) and umount (pid 73338) wait for this block. You did not provide the output of "show lockedbufs", but even without that data I doubt that the buf subsystem deadlocked by itself.

My conjecture is that the problem is either with your disk hardware (i.e., the actual hard drive or disk controller), or in the controller driver. At least, you could show us the dmesg.
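A note on why an acyclic graph rules out a classical deadlock: a deadlock shows up as a cycle in the wait-for graph, i.e. process A holds something B sleeps on while B, directly or through other processes, holds something A sleeps on. The program below is a minimal sketch, not part of the original report: it builds a tiny wait-for graph using the same convention as e1.gif (edge holder -> waiter) and checks it for a cycle with a depth-first search. The node-to-pid mapping and the edge list are illustrative, loosely based on the processes discussed above.

/*
 * Hypothetical sketch: deadlock detection as cycle detection in a
 * wait-for graph.  An edge holder -> waiter means "holder owns a lock
 * that waiter sleeps on", matching the convention used for e1.gif.
 */
#include <stdio.h>

#define MAXN 8

static int adj[MAXN][MAXN];     /* adjacency matrix of the wait-for graph */
static int state[MAXN];         /* 0 = unvisited, 1 = on DFS stack, 2 = done */

/* Depth-first search; returns 1 if a cycle is reachable from node v. */
static int
has_cycle(int v, int n)
{
	int w;

	state[v] = 1;
	for (w = 0; w < n; w++) {
		if (!adj[v][w])
			continue;
		if (state[w] == 1)      /* back edge: cycle, i.e. deadlock */
			return (1);
		if (state[w] == 0 && has_cycle(w, n))
			return (1);
	}
	state[v] = 2;
	return (0);
}

int
main(void)
{
	/*
	 * Node indices stand for pids (illustrative): 0 = 32 (syncer),
	 * 1 = 33 (softdepflush), 2 = 73338 (umount), 3 = 66575 (smtpd),
	 * 4 = 806 (shell).
	 */
	int n = 5, i, deadlock = 0;

	adj[0][1] = 1;          /* syncer holds a block softdepflush waits on */
	adj[0][2] = 1;          /* ... and umount waits on it too */
	adj[3][4] = 1;          /* smtpd hierarchy includes the shell */

	for (i = 0; i < n; i++)
		if (state[i] == 0 && has_cycle(i, n))
			deadlock = 1;

	printf(deadlock ? "cycle found: deadlock\n" :
	    "graph is acyclic: processes are waiting, not deadlocked\n");
	return (0);
}

Run on these edges it reports the graph as acyclic, which is exactly the situation in e1.gif: everything ultimately waits on the syncer's I/O, but nothing the syncer needs is held by one of the waiters.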