Robert Watson wrote:
> On Sat, 30 Nov 2002, Michal Mertl wrote:
> > I'm now unable to make it dead-lock again. Yet it happened quite easily.
> > I had more md backing files in the same directory at the beginning (to
> > test Terry's suspicion mentioned in thread 'jail' on hackers@).
> 
> I've noticed that chroot() environments tend to make existing deadlock
> opportunities more likely.  I'm not quite sure why that is.  :-)

Lock to parent.  It's the same reason you can lock up if you
use automount, with all the automount mount points happening
in the same subdirectory.
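
A minimal sketch of that kind of lockup, under assumptions: two plain
mutexes stand in for a directory vnode lock ("parent") and an entry
vnode lock ("child").  This is a user-space model, not the VFS code,
and every name in it is made up for illustration.  The first function
shows two paths taking the locks in crossed order; the rest shows that
imposing one global order (here, by object id) lets the same two paths
run to completion.

```python
import threading


def crossed_order_blocks():
    """Model two code paths that each hold one lock and want the other."""
    parent, child = threading.Lock(), threading.Lock()
    parent.acquire()  # path 1: holds the directory, wants the entry
    child.acquire()   # path 2: holds the entry, wants the directory
    # Non-blocking attempts stand in for the blocking calls that would hang.
    path1_ok = child.acquire(blocking=False)
    path2_ok = parent.acquire(blocking=False)
    child.release()
    parent.release()
    return not path1_ok and not path2_ok


def lock_both(a, b):
    """Acquire both locks in one global order, whatever order the caller names them in."""
    first, second = sorted((a, b), key=id)
    first.acquire()
    second.acquire()
    return first, second


def worker(a, b, counter, rounds):
    for _ in range(rounds):
        first, second = lock_both(a, b)
        try:
            counter[0] += 1  # protected: both locks are held here
        finally:
            second.release()
            first.release()


def run_ordered(rounds=1000):
    """Run two threads that name the locks in opposite order; return the final count."""
    parent, child = threading.Lock(), threading.Lock()
    counter = [0]
    t1 = threading.Thread(target=worker, args=(parent, child, counter, rounds))
    t2 = threading.Thread(target=worker, args=(child, parent, counter, rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return counter[0]
```

crossed_order_blocks() reports that neither path can make progress;
run_ordered() finishes because both threads agree on which lock comes
first.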

> There are a fair number of vnode locking deadlock scenarios that are
> unavoidable where we rely on grabbing vnode locks out of the directory
> structure lock order.  This occurs for vnode-backed md devices, quotas,
> and UFS1 extended attributes, and probably some other situations.  I
> suspect that Terry is correct that operations on the vnode backing file
> storage directory are triggering the problem, since that increases the
> chances that a vnode lock "race to root" will occur from both the file
> system backed into the md device, and for the md backing vnodes during
> blocking I/O.

See other postings.  The "race to root" is the one I was
originally commenting on.  I'm not sure that it applies in
this case; I think this case might be the "out of memory to
create new soft dependencies" case, where you can end up
holding a lock on a buffer that needs to be flushed to recover
memory, until you can satisfy the request to create a dependency
(starvation deadlock).  The "race to root" is a "deadly embrace"
deadlock.
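
To make the starvation case concrete, here is a toy model, under
assumptions: a fixed pool of slots stands in for the memory used for
dependencies, and flushing a dirty, unlocked buffer gives one slot
back.  Pool, Buffer, try_reclaim and alloc_dependency are hypothetical
names for this sketch and do not correspond to the soft updates code.
The point is only that a caller holding the lock on the one buffer that
could be flushed can never get its allocation satisfied.

```python
class Buffer:
    def __init__(self, name):
        self.name = name
        self.locked_by = None  # name of the lock holder, or None
        self.dirty = True


class Pool:
    def __init__(self, slots):
        self.free = slots


def try_reclaim(pool, buffers):
    """Flush every dirty, unlocked buffer; each flush frees one slot."""
    freed = 0
    for buf in buffers:
        if buf.dirty and buf.locked_by is None:
            buf.dirty = False
            pool.free += 1
            freed += 1
    return freed


def alloc_dependency(pool, buffers):
    """Return True if a slot was allocated, False if the caller would wait forever."""
    if pool.free == 0:
        try_reclaim(pool, buffers)
    if pool.free == 0:
        return False  # nothing can be flushed: starvation
    pool.free -= 1
    return True


def scenario(hold_lock):
    """One dirty buffer, no free slots; optionally hold the buffer lock while allocating."""
    pool = Pool(0)
    buf = Buffer("b0")
    if hold_lock:
        buf.locked_by = "writer"
    return alloc_dependency(pool, [buf])
```

scenario(True) models the starvation deadlock (the allocation can never
succeed while the lock is held); scenario(False) succeeds because the
buffer can be flushed first.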


> If you can avoid directory operations on the md backing
> directory, that would probably be one way to avoid triggering the bug.

Yes.  By placing each vnconfiged device in its own subdirectory,
you avoid them.  There's still a window on your host OS doing
its own traversal, but that's (effectively) a "whole FS lock",
so it doesn't trigger a problem.
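
The layout can be sketched as below.  The directory names ("md0",
"md1", ...) and the file name "backing.img" are placeholders chosen for
this example, not anything the tools require; the only point is that no
two backing files share a parent directory.

```python
import tempfile
from pathlib import Path


def backing_path(root, unit):
    """Return <root>/md<unit>/backing.img, creating the per-device directory."""
    device_dir = Path(root) / f"md{unit}"
    device_dir.mkdir(parents=True, exist_ok=True)
    return device_dir / "backing.img"


def demo(units=3):
    """Lay out a few backing files under a temporary root; return their parent directory names."""
    with tempfile.TemporaryDirectory() as root:
        return [backing_path(root, unit).parent.name for unit in range(units)]
```

Each device then has its own parent directory, so directory operations
for one backing file do not touch the directory holding another.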

> Seeing it reproduced would probably confirm that this is the case.

It's a pain.  I wasted a couple of days trying to reproduce it,
without a box I could wipe and make into a scratch box, with
little luck.  I think that it requires reproducing the failing
box in detail, which I wasn't willing to do (hence the workaround).


> On the
> other hand, there may be other deadlocks in the vnode/ufs/md code that can
> be more easily corrected than this general VFS problem, so details there
> would be very useful.

There are a number of them; they are all a pain.  It's really
tempting to just refactor the code so that all locking occurs
at the same logical layer, without being held across function
calls.  That'd be a heck of a lot of work, though... probably
worth it, in the end.
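
One reading of that refactoring, as a rough sketch under assumptions:
the lock is taken and released only in the outermost layer, internal
helpers never lock, and the lock is dropped before calling out to any
other layer.  The Directory class and its notify callback are invented
for this example and are not taken from the kernel.

```python
import threading


class Directory:
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def _insert(self, name, value):
        # Internal helper: never locks; the caller already holds self._lock.
        self._entries[name] = value

    def lock_held(self):
        return self._lock.locked()

    def add(self, name, value, notify=None):
        # All locking happens here, in the one public layer.
        with self._lock:
            self._insert(name, value)
            snapshot = dict(self._entries)
        # The lock is released before calling into another layer, so the
        # callee can take its own locks (or call back in) without nesting.
        if notify is not None:
            notify(snapshot)
```

Because the callback runs with the lock already released, a callee that
calls back into add() does not deadlock on it:

```python
d = Directory()
d.add("a", 1, notify=lambda snap: d.add("b", 2))
```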

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message