J David wrote:
> On Thu, Aug 15, 2013 at 5:39 PM, Rick Macklem <rmack...@uoguelph.ca>
> wrote:
> > Have you been able to pass the debugging info on to Kostik?
> >
> > It would be really nice to get this fixed for FreeBSD9.2.
> 
> You're probably not talking to me, but headway here is slow.  At our
> location, we have been continuing to test releng/9.2 extensively, but
> with r250907 reverted.  Since reverting it solves the issue, and since
> there haven't been any further changes to releng/9.2 that might also
> resolve this issue, re-applying r250907 is perceived here as un-fixing
> a problem.  Enthusiasm for doing so is correspondingly low, even if
> the purpose is to gather debugging info. :(
> 
> However, we finally got clearance to test releng/9.2 r254540 with
> r250907 included and with DDB on five nodes.  The problem cropped up
> in about an hour.  Two threads in one process deadlocked, which was
> perfect.  I got it into DDB, but the stack trace scrolled off the
> screen, so there was no way to copy it by hand.  Also, the machine's
> disk is smaller than physical RAM, so no dump file. :(
> 
> Here's what is available so far:
> 
> db> show proc 33362
> Process 33362 (httpd) at 0xcd225b50:
>  state: NORMAL
>  uid: 25000 gids: 25000
>  parent: pid 25104 at 0xc95f92d4
>  ABI: FreeBSD ELF32
>  arguments: /usr/local/libexec/httpd
>  threads: 3
> 100405 D newnfs 0xc9b875e4 httpd
> 
Ok, so this one is waiting for an NFS vnode lock.

> 100393 D pgrbwt 0xc43a30c0 httpd
> 
This one is sleeping in vm_page_grab(), which I suspect was called
from kern_sendfile() with a shared vnode lock held, judging from the
previous debug info.

> 100755 S uwait 0xc84b7c80 httpd
> 
> 
> Not much to go on. :(  Maybe these five can be configured with serial
> consoles.
> 
> So, inquiries are continuing, but the answer to "does this still
> happen on 9.2-RC2?" is definitely yes.
> 
Since r250027 moves a vn_lock() to before the vm_page_grab() call in
kern_sendfile(), I suspect that is the cause of the deadlock. (r250027
is one of the 3 commits MFC'd by r250907)

I don't know whether it would be safe to VOP_UNLOCK() the vnode after
VOP_GETATTR() and then restore the vn_lock() call that comes after
vm_page_grab(), or whether r250027 should be reverted (getting rid of the
VOP_GETATTR() and going back to using the size in the vm stuff).
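
To make the suspected cycle concrete, here is a small user-space model in C.
It is only a sketch, not kernel code. It assumes that thread 100405 is
holding the busy page that thread 100393 sleeps on in vm_page_grab(), which
the DDB output above does not confirm, and it treats the vnode lock as having
a single holder. The names model_thread, holder_of(), is_deadlocked() and
sendfile_deadlocks() are made up for the illustration.

```c
#include <stddef.h>

/* Hypothetical resource labels, used by this model only. */
enum resource {
    RES_NONE = -1,
    RES_VNODE_LOCK = 0,
    RES_BUSY_PAGE = 1,
    RES_COUNT = 2
};

struct model_thread {
    int tid;                  /* thread id as shown by DDB */
    int holds[RES_COUNT];     /* nonzero if the thread holds the resource */
    enum resource waits_for;  /* resource slept on, or RES_NONE */
};

/* Return the index of the thread holding res, or -1 if nobody holds it. */
static int
holder_of(const struct model_thread *t, size_t n, enum resource res)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].holds[res])
            return (int)i;
    return -1;
}

/*
 * Follow the wait-for chain starting at thread `start`.
 * Return 1 if the chain leads back to `start` (a deadlock cycle),
 * 0 if it ends at a runnable thread or an unheld resource.
 */
static int
is_deadlocked(const struct model_thread *t, size_t n, size_t start)
{
    size_t cur = start;

    for (size_t step = 0; step < n; step++) {
        if (t[cur].waits_for == RES_NONE)
            return 0;
        int h = holder_of(t, n, t[cur].waits_for);
        if (h < 0)
            return 0;
        if ((size_t)h == start)
            return 1;
        cur = (size_t)h;
    }
    return 0;
}

/*
 * Model the two httpd threads. The argument selects whether the
 * sendfile thread still holds the vnode lock while it sleeps in
 * the page grab (1, the ordering after r250027) or has dropped it
 * first (0, the VOP_UNLOCK() idea).
 */
int
sendfile_deadlocks(int vnode_locked_across_grab)
{
    struct model_thread t[2] = {
        /* sendfile thread, sleeping on "pgrbwt" */
        { 100393, { vnode_locked_across_grab, 0 }, RES_BUSY_PAGE },
        /* thread sleeping on "newnfs"; holding the page is an ASSUMPTION */
        { 100405, { 0, 1 }, RES_VNODE_LOCK },
    };

    return is_deadlocked(t, 2, 0);
}
```

In this model sendfile_deadlocks(1) finds a cycle and sendfile_deadlocks(0)
does not, which is the reasoning behind both options above. It says nothing
about whether dropping the lock there is actually safe in the kernel.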

Hopefully Kostik will know what is best to do with it now, rick

> Thanks!
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to
> "freebsd-stable-unsubscr...@freebsd.org"
> 