On Tue, Jul 26, 2011 at 07:12:23PM -0400, Rick Macklem wrote:
> Kostik Belousov wrote:
> > On Tue, Jul 26, 2011 at 01:17:52PM +0200, Herve Boulouis wrote:
> > > On 26/07/2011 12:06, Kostik Belousov wrote:
> > > > On Tue, Jul 26, 2011 at 11:49:13AM +0200, Herve Boulouis wrote:
> > > > > On 25/07/2011 11:59, Kostik Belousov wrote:
> > > > >
> > > > > Ok the patched server crashed this morning strangely: all httpd
> > > > > processes were stuck in nfs or vmopar
> > > > > and were unkillable. Below is the full ps.
> > > >
> > > > Please see the
> > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > > > for information required to debug the deadlocks.
> > >
> > > the box was not strictly deadlocked since I was able to interact with
> > > it but I suppose you want me to
> > > break into debugger when the symptoms appear again and report all
> > > the commands listed in the handbook
> > > deadlock section ?
> >
> > Exactly.
> >
> > I think everything was hung that accessed an nfs mount point.
> > From the usermode, procstat -kk could catch some interesting
> > information,
> > but it is redundant if ddb output is captured.
>
> Would it be worth considering reverting r223054?
> (Note that I don't understand the VM side, so this may be completely
> wrong:-)
>
> The sleeps on vmopar could be happening because a dirty page is busy
> and r223054 changes the VM_PAGER_xx value set a couple of ways.
> 1 - When it returns VM_PAGER_ERROR instead of VM_PAGER_AGAIN, the
>     return value of "runlen" from vm_pageout_flush() changes.
> 2 - I'm not sure, but I think the pre-r223054 code marked a partially
>     written page as VM_PAGER_OK instead of VM_PAGER_AGAIN?
>     (I'm wondering about this one, since the problem seems to happen
>     when the file's size has been truncated.)
>
> Herve Boulouis, if you want to see what r223054 changes, just go to
> http://svn.freebsd.org/viewvc/stable/8/sys/nfsclient
> and then click on nfs_bio.c.
> (The changes are small and could easily be reverted with a manual
> edit.)
>
> Since r223054 went into stable/8 on Jun 13, it seems a possible
> explanation?
> rick

I doubt it. The ps output makes it fairly plausible that the reporter
hit a LOR (lock order reversal) between the vnode lock and the page
busy flag. The correct order is vnode lock -> busy bit. vmopar is a
wait for the busy page state. The mentioned revision does not change
the lock order.

Anyway, this is only speculation until the requested data is provided.
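
To make the ordering argument concrete, here is a toy userland model of
a lock-order check. It is only a sketch, not FreeBSD kernel code: the
names LOCK_ORDER, LockOrderChecker and run are hypothetical, and the
page busy flag is modelled as an ordinary lock. The point it
illustrates is that a thread which takes the busy bit first and then
asks for the vnode lock violates the vnode lock -> busy bit order, and
two threads using opposite orders can each end up waiting on the other.

```python
# Toy userland model of a lock-order check; not FreeBSD kernel code.
# LOCK_ORDER, LockOrderChecker and run are hypothetical names.

# Lower rank must be acquired first: vnode lock -> busy bit.
LOCK_ORDER = {"vnode lock": 0, "page busy": 1}


class LockOrderChecker:
    """Tracks what one thread holds and records order reversals."""

    def __init__(self):
        self.held = []       # locks currently held, in acquisition order
        self.reversals = []  # (held_lock, requested_lock) pairs

    def acquire(self, name):
        for other in self.held:
            # Requesting a lower-ranked lock while holding a
            # higher-ranked one is a lock order reversal (LOR).
            if LOCK_ORDER[other] > LOCK_ORDER[name]:
                self.reversals.append((other, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def run(sequence):
    """Acquire the named locks in order, release them, return reversals."""
    checker = LockOrderChecker()
    for name in sequence:
        checker.acquire(name)
    for name in reversed(sequence):
        checker.release(name)
    return checker.reversals


correct = run(["vnode lock", "page busy"])
reversed_order = run(["page busy", "vnode lock"])
print(correct)         # []
print(reversed_order)  # [('page busy', 'vnode lock')]
```

The first sequence follows the stated order and reports nothing; the
second reports one reversal. Whether this is what actually happened on
the reporter's machine can only be confirmed from the ddb output.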