On Tue, May 03, 2011 at 11:58:49PM -0700, Garrett Cooper wrote:
> On Tue, May 3, 2011 at 11:42 PM, Garrett Cooper <[email protected]> wrote:
> > On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick <[email protected]>
> > wrote:
> >>> Date: Tue, 3 May 2011 22:40:26 -0700
> >>> Subject: Nasty non-recursive lockmgr panic on softdep only enabled UFS
> >>> partition when filesystem full
> >>> From: Garrett Cooper <[email protected]>
> >>> To: Jeff Roberson <[email protected]>,
> >>> Marshall Kirk McKusick <[email protected]>
> >>> Cc: FreeBSD Current <[email protected]>
> >>>
> >>> Hi Jeff and Dr. McKusick,
> >>> Ran into this panic when /usr ran out of space doing a make
> >>> universe on amd64/r221219 (it took ~15 minutes for the panic to occur
> >>> after the filesystem ran out of space -- wasn't quite sure what it was
> >>> doing at the time):
> >>>
> >>> ...
> >>>
> >>> Let me know what other commands you would like for me to run in kgdb.
> >>> Thanks,
> >>> -Garrett
> >>
> >> You did not indicate whether you are running an 8.X system or a 9-current
> >> system. It would be helpful to know that.
> >
> > I've actually been running CURRENT for a few years now, but you're right --
> > I didn't mention that part.
> >
> >> Jeff thinks that there may be a potential race in the locking code for
> >> softdep_request_cleanup. If so, this patch for 9-current should fix it:
> >>
> >> Index: ffs_softdep.c
> >> ===================================================================
> >> --- ffs_softdep.c (revision 221385)
> >> +++ ffs_softdep.c (working copy)
> >> @@ -11380,7 +11380,8 @@
> >> continue;
> >> }
> >> MNT_IUNLOCK(mp);
> >> - if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK,
> >> curthread)) {
> >> + if (vget(lvp, LK_EXCLUSIVE | LK_NOWAIT |
> >> LK_INTERLOCK,
> >> + curthread)) {
> >> MNT_ILOCK(mp);
> >> continue;
> >> }
> >>
> >> If you are running an 8.X system, hopefully you will be able to apply it.
> >
> > I've applied it, rebuilt and installed the kernel, and trying to
> > repro the case again. Will let you know how things go!
>
> Happened again with the change. It's really easy to repro:
>
> 1. Get a filesystem with UFS+SU
> 2. Execute something that does a large number of small writes to a partition.
> 3. 'dd if=/dev/zero of=FOO bs=10m' on the same partition
>
> The kernel will panic with the issue I discussed above.
> Thanks!

Jeff's change is required to avoid LORs, but it is not sufficient to
prevent recursion. We must skip the vnode supplied as a parameter to
softdep_request_cleanup(). Theoretically, other vnodes might also be
locked by curthread, so I think the change below is needed. Try this.

diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index a6d4441..25fa5d6 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -11380,7 +11380,9 @@ retry:
 			continue;
 		}
 		MNT_IUNLOCK(mp);
-		if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
+		if (VOP_ISLOCKED(lvp) ||
+		    vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOWAIT,
+		    curthread)) {
 			MNT_ILOCK(mp);
 			continue;
 		}
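
To make the intent of the check concrete, here is a rough userland model
of the loop. It is only a sketch: struct toy_vnode and the toy_*()
functions are hypothetical stand-ins that I made up for illustration,
not the kernel's vnode, VOP_ISLOCKED() or vget(). Each toy vnode has a
non-recursive exclusive lock, and the cleanup pass skips any vnode that
is already locked (by the caller or by anybody else) and otherwise takes
the lock without waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode with a non-recursive exclusive lock. */
struct toy_vnode {
	int id;
	int owner;	/* 0 = unlocked, otherwise the id of the owning thread */
};

/* Model of the "is it locked by anybody?" test. */
static bool
toy_islocked(const struct toy_vnode *vp)
{
	return (vp->owner != 0);
}

/* Model of a no-wait exclusive lock: never sleeps, fails if the lock is held. */
static bool
toy_trylock(struct toy_vnode *vp, int thread)
{
	if (vp->owner != 0)
		return (false);
	vp->owner = thread;
	return (true);
}

static void
toy_unlock(struct toy_vnode *vp, int thread)
{
	assert(vp->owner == thread);
	vp->owner = 0;
}

/*
 * Walk all vnodes on behalf of 'thread'.  Skip every vnode that is
 * already locked, so we never try to re-lock one that the caller holds.
 * Returns the number of vnodes that were locked, "cleaned" and unlocked.
 */
static int
toy_cleanup(struct toy_vnode *vnodes, size_t n, int thread)
{
	int cleaned = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_vnode *lvp = &vnodes[i];

		if (toy_islocked(lvp) || !toy_trylock(lvp, thread))
			continue;
		cleaned++;	/* the real code would flush the vnode here */
		toy_unlock(lvp, thread);
	}
	return (cleaned);
}
```

With three toy vnodes, one unlocked, one held by the calling thread and
one held by some other thread, toy_cleanup() returns 1 and leaves both
held locks untouched. Skipping every locked vnode is deliberately
conservative: it also passes over vnodes locked by other threads, which
the no-wait lock attempt would have refused anyway.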
