On Tue, 12 Aug 2008, John Baldwin wrote:
> On Tuesday 12 August 2008 01:23:47 pm Bruce Evans wrote:
>> On Tue, 12 Aug 2008, John Baldwin wrote:
>>> Of course bpf is
>>> broken with revoke, but nobody uses revoke with bpf. What people do do in
>>> the normal course of using bpf is lots of concurrent bpf accesses, and w/o
>>> D_TRACKCLOSE, bpf devices don't get closed.
>> Why not fix the actual bug?
>> Your commit doesn't give enough details on the actual bug, so I will
>> try to guess it: libpcap has to probe for a free bpf unit, so it
>> does lots of failing opens of bpfN (especially when N == 0) when bpfN
>> is already in use. Failing opens break any last closes that they are
>> concurrent with, because the relevant reference count (si_usecount) is
>> increased during the failing open (I think it is the vref() in _fgetvp()
>> that does it)...
> Correct-ish. The actual extra reference is in devfs_lookup() rather than
> _fgetvp(). Specifically, it is an open concurrent with a close. The opening
> thread first does the lookup, which bumps the reference count in
> devfs_allocv() (IIRC). Eventually we get to devfs_open(), which drops the
> vnode lock while it invokes d_open(). If the closing thread is waiting for
> the vnode lock for close, then it can acquire the lock and do devfs_close().
si_usecount is actually locked by dev_lock() (a global mutex), so in general
the vnode lock is neither necessary nor sufficient for implementing or fixing
the bug, but the general case is rarely used. The general case involves
doing failing opens using more than one devfs mount. dev_lock() is only
held briefly, so failing opens can easily get in and change si_usecount.
count_dev() and vcount() of course use dev_lock(), but this only gives the
transient value, so the value means very little unless it is the magic value
1 (this means that our reference is the only one). devfs_close() has to
unlock everything, so si_usecount becomes even more volatile.
> devfs_close() then sees count_dev() > 1 and doesn't call d_close(). Meanwhile,
> the opening thread will fail bpfopen() and return, but the bpf device is
> permanently open. When I talked with phk@ about it originally, his reply to my
> various suggestions was D_TRACKCLOSE. I'm not sure how you'd really fix this
> otherwise:
I would first try to actually count opens correctly. It should be possible
to tell when a last-close is needed if everything is locked. This is
complicated by the possibility of any number of opens and last-closes
already being in progress, with each having progressed to a timing-dependent
and driver-dependent state. This can certainly happen now, but at least the
general case of it shouldn't be allowed to happen, since no drivers can handle
it. Ways that it can happen:
- after devfs_close() sees count_dev() == 1, it calls d_close(). Since it
  unlocks everything, devfs_open() can easily call d_open() any number of
  times before the d_close() completes.
- d_close() may be called any number of times to last-close the new open
  instances. The new d_close()s may complete before the original ones.
  This behaviour is easy to set up for hardware serial devices, and is
  useful. Let the first open instance be blocking and make it block on
  drainwait in close. Do several non-blocking open/closes using stty -f
  to look at the problem. Then do another non-blocking open/close
  using stty -f or comcontrol to clear the problem.
> calling d_close() from devfs_open() if the use_count is 1 after
> re-acquiring the vnode lock (you have to drop the vnode lock while you call
> d_close() though, so you might still race with other concurrent opens for
> example), having a count of pending opens and subtracting that from
> count_dev() when checking for whether or not to call d_close() (but is that
> dubious?
This is basically what I put in FreeBSD-1. It almost worked there.
> Then you might call d_close() while d_open() is running, so the
> driver would need to add locking to synchronize the two and handle the case
> that the d_close() actually runs after a d_open() that was successful, so
> each driver now has to duplicate that work.), etc.
Strictly, all drivers already need to do this.
In FreeBSD-1, simple methods worked provided d_close() and d_open()
don't sleep, since the kernel was not preemptive. After sleeping, it
is obvious that d_close() and d_open() should check for relevant state
changes after waking up and not continue if the changes are too large.
Now, similar behaviour could be obtained using locking. Don't unlock
things in devfs_close() and devfs_open() by default. Only drivers
that unlocked things would face the full complications. In particular,
in the usual case, holding dev_lock() or a corresponding device-dependent
lock would block new opens (and even lookups?) until after d_close()
returns.
Bruce
_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "[EMAIL PROTECTED]"