Robert Watson wrote:
> On Sat, 30 Nov 2002, Michal Mertl wrote:
> > I'm now unable to make it dead-lock again. Yet it happened quite easily.
> > I had more md backing files in the same directory at the beginning (to
> > test Terry's suspicion mentioned in thread 'jail' on hackers@).
>
> I've noticed that chroot() environments tend to make existing deadlock
> opportunities more likely.  I'm not quite sure why that is. :-)

Lock to parent.  It's the same reason you can lock up if you use
automount, with all the automount mount points happening in the same
subdirectory.

> There are a fair number of vnode locking deadlock scenarios that are
> unavoidable where we rely on grabbing vnode locks out of the directory
> structure lock order.  This occurs for vnode-backed md devices, quotas,
> and UFS1 extended attributes, and probably some other situations.  I
> suspect that Terry is correct that operations on the vnode backing file
> storage directory are triggering the problem, since that increases the
> chances that a vnode lock "race to root" will occur from both the file
> system backed into the md device, and for the md backing vnodes during
> blocking I/O.

See other postings.  The "race to root" is the one I was originally
commenting on.  I'm not sure that it applies in this case; I think
this case might be the "out of memory to create new soft dependencies"
case, where you can end up holding a lock on a buffer that needs to be
flushed to recover memory, until you can satisfy the request to create
a dependency (starvation deadlock).  The "race to root" is a "deadly
embrace" deadlock.

> If you can avoid directory operations on the md backing
> directory, that would probably be one way to avoid triggering the bug.

Yes.  By placing each vnconfiged device in its own subdirectory, you
avoid them.  There's still a window on your host OS doing its own
traversal, but that's (effectively) a "whole FS lock", so it doesn't
trigger a problem.

> Seeing it reproduced would probably confirm that this is the case.

It's a pain.  I wasted a couple of days trying to reproduce it, without
a box I could wipe and make into a scratch box, with little luck.  I
think that it requires reproducing the failing box in detail, which
I wasn't willing to do (hence the workaround).
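
For anyone who wants to try the workaround, here is a rough sketch of
the "one backing file per subdirectory" layout.  It is only an
illustration: the function name, the ".img" suffix and the ".d"
directory naming are made up for the example, and it assumes the
backing files are detached (not configured as md/vn devices) while
they are moved.  Attaching them again afterwards is left out on
purpose.

```python
import os
import shutil
import tempfile


def isolate_backing_files(backing_dir, suffix=".img"):
    """Move each backing file in backing_dir into its own subdirectory.

    Hypothetical helper: the suffix and the "<name>.d" directory naming
    are assumptions made for this example only.
    Returns a dict mapping the original file name to its new path.
    """
    moved = {}
    for name in sorted(os.listdir(backing_dir)):
        src = os.path.join(backing_dir, name)
        # Skip directories and anything that is not a backing file.
        if not os.path.isfile(src) or not name.endswith(suffix):
            continue
        # One private directory per backing file, e.g. jail0.img -> jail0.d/
        subdir = os.path.join(backing_dir, name[: -len(suffix)] + ".d")
        os.makedirs(subdir, exist_ok=True)
        dst = os.path.join(subdir, name)
        shutil.move(src, dst)
        moved[name] = dst
    return moved


def demo():
    """Build a throwaway directory with two fake backing files and split it."""
    root = tempfile.mkdtemp()
    for name in ("jail0.img", "jail1.img"):
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"\0" * 512)
    return root, isolate_backing_files(root)
```

After this, no two backing files share a parent directory, so
directory operations on one device's backing directory do not touch
the directory holding another device's backing file.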
> On the
> other hand, there may be other deadlocks in the vnode/ufs/md code that can
> be more easily corrected than this general VFS problem, so details there
> would be very useful.

There are a number of them; they are all a pain.  It's really tempting
to just refactor the code so that all locking occurs at the same logical
layer, without being held across function calls.  That'd be a heck of
a lot of work, though... probably worth it, in the end.

-- Terry