On Wed, 13 Aug 2008, Bruce Evans wrote:

On Tue, 12 Aug 2008, John Baldwin wrote:

Of course bpf is
broken with revoke, but nobody uses revoke with bpf.  What people do do in
the normal course of using bpf is lots of concurrent bpf accesses, and w/o
D_TRACKCLOSE, bpf devices don't get closed.

Why not fix the actual bug?

Your commit doesn't give enough details on the actual bug, so I will
try to guess it:  libpcap has to probe for for a free bpf unit, so it
does lots of failing opens of bpfN (especially when N == 0) when bpfN
is already in use.  Failing opens break last closes with which they
are concurrent, because the relevant reference count (si_usecount) is
increased during the failing open (I think it is the vref() in _fgetvp()
that does it).  Then when the opens fail, si_usecount is decremented
to 1, but devfs_close() is not called again because only 1 real last
close is possible (I think -- at least without races with revoke()),
so d_close() is never called twice for 1 real least close.  Failing
opens shouldn't take long, so it is surprising that the race is often
lost.  Apparently there is some synchronization.

This bug probably also affects the probe for a free pty.

In fact, it is well known that it does.  There is a PR or 2 about this,
and committed fixes that were backed out because they gave panics instead
of just leaked ptys.  The following seems to my most recent mail about
this.  The test program in it still leaks ptys under -current.

% From [EMAIL PROTECTED] Thu Nov  9 16:39:06 2006 +1100
% Date: Thu, 9 Nov 2006 16:39:03 +1100 (EST)
% From: Bruce Evans <[EMAIL PROTECTED]>
% X-X-Sender: [EMAIL PROTECTED]
% To: Rong-en Fan <[EMAIL PROTECTED]>
% cc: Kostik Belousov <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED], % [EMAIL PROTECTED], Vlad Galu <[EMAIL PROTECTED]>
% Subject: Re: Fwd: panic when portupgrade in jail (devfs related?)
% In-Reply-To: <[EMAIL PROTECTED]>
% Message-ID: <[EMAIL PROTECTED]>
% References: <[EMAIL PROTECTED]> % <[EMAIL PROTECTED]> % <[EMAIL PROTECTED]> % <[EMAIL PROTECTED]> % <[EMAIL PROTECTED]> % <[EMAIL PROTECTED]> % <[EMAIL PROTECTED]> % <[EMAIL PROTECTED]>
%  <[EMAIL PROTECTED]>
% MIME-Version: 1.0
% Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
% Status: O
% X-Status: % X-Keywords: % X-UID: 623 % % On Thu, 9 Nov 2006, Rong-en Fan wrote: % % > OK, after running 4 instances of script in jail for 10+ hours. It seems
% > that ttys are leaked (I haven't got any panic). I get lots of
% >
% > script: openpty: No such file or directory
% % This is another known bug. No fix is in sight, but I think one of the
% other known bugs can be exploited to write a program to unleak the
% ptys.  (Arrange for a nonzero vfs refcount, possibly by attempting to
% open the leaked pty, and race with revoke().  The open() will always fail
% int ptcopen(), but it may be possible to trick revoke() into calling
% ptcclose() for the unopen device; then the ptcclose() will unleak the
% device.)
% % I now have a better SMP machine for testing the races. The original
% version of my program to demonstrate the leak doesn't show the leak
% very fast under -current on this machine, and it takes 2 instances
% (parent doing open()s and child doing stat()s) to leak at all, though
% the leak occurs fast on a machine with similar speed (but configured
% with INVARIANTS etc) running 6.1.  Apparently, locking fixes in
% -current have fixed or reduced the race with stat().
% % The following version (with TIOCEXCL and both the parent and child
% doing open()'s) leaks fast on all machines that I tested on.
% % %%%
% #include <sys/ioctl.h>
% #include <sys/stat.h>
% % #include <err.h>
% #include <fcntl.h>
% #include <unistd.h>
% % int
% main(int argc, char **argv)
% {
%       struct stat sb;
%       const char *pty, *who;
%       pid_t child, parent;
%       int fd, i, lastok;
% % pty = (argc == 1 ? "/dev/ptyp0" : argv[1]);
%       i = 0;
%       lastok = -1;
%       parent = getpid();
%       child = fork();
%       who = (child == 0 ? "child" : "parent");
%       for (;;) {
%               fd = open(pty, O_WRONLY | O_NONBLOCK);
%               if (fd < 0) {
%                       if (i - lastok >= 1000) {
%                               if (child != 0)
%                                       usleep(50000);
%                               err(1, "%s open %d", who, i);
%                       }
%                       i++;
%                       continue;
%               }
%               (void)ioctl(fd, TIOCEXCL);
%               if (close(fd) != 0)
%                       err(1, "%s close %d", who, i);
%               lastok = i;
%               i++;
%       }
% }
% %%%
% % Usage is something like "./foo /dev/ptyp<victim>" as non-root. It
% quits after 1000 failed opens and a leaked pty is shown by failing
% open #'s 0 through 999 and printing i == 999 when quitting.  Some
% open failures are now normal for ptyp* since all opens of ptyp*
% are exclusive and the program now races itself.  The TIOCEXCL is to
% give similar exclusivity for testing other devices.  It only works
% as non-root.  Remove it to simplify testing of ptyp* only.  After
% an open() failure, the usleep() is to give up control so as not
% to see 1000 sequential open() failures on UP systems just because
% 1000 open()s can be done before being rescheduled.  Remove it to
% simplify testing on SMP systems.
% % Bruce
%

Any device tha enforces exclusive access for all opens (instead of
selectively/correctly according to O_EXCL and/or TIOCEXCL) has the
same leak.  Such devices must have some state which indicates that
the device is open.  When a last-close is missed, the device-is-open
state remains set, so the device cannot be opened again.  Then it
cannot be closed again.  For devices that don't do this, the missed
last-close isn't much of a problem.  New opens just work, and it
is possible to unleak the device by repeating open()/close() until
the last-close isn't missed.

Bruce
_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Reply via email to