On Wed, 25 Jun 2008, Ali Niknam wrote:
precisely matches what you'd expect: lots of TCP connections in the
CLOSED state reflecting a series of connections built by an application but
then not properly discarded. Likewise, when the application is killed, all
of the connections go away -- most likely because the file descriptors are
all closed, allowing them to be garbage collected and connection state
freed. If it is this sort of bug, then most likely you're missing a call
to close() in a work loop somewhere, and in some exceptional case, you fall
out of the loop without calling close().
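
As an illustration of that failure mode, here is a minimal sketch in Python
(chosen only for brevity; the language and structure of the real application
are not known, and handle_client() and process() are hypothetical names). The
point is only that the descriptor is closed on every exit path, including the
exceptional one:

```python
import socket


def process(conn):
    """Hypothetical per-connection work; raises on an unexpected request."""
    data = conn.recv(4096)
    if not data.startswith(b"HELLO"):
        raise ValueError("unexpected request")
    conn.sendall(b"OK\n")


def handle_client(conn):
    """Run process() and close the socket on every exit path."""
    try:
        process(conn)
    except (ValueError, OSError):
        # The exceptional case: without the finally clause below, returning
        # here would leave the descriptor open for the life of the process.
        return False
    finally:
        conn.close()
    return True
```
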
I will double check this once more, but honestly, I strongly doubt it...
One other thing I've noticed is that it's always the input buffer that has
bytes left, never the output buffer...
Moreover, I've seen that close() reports EBADF, but due to the insane number
of connections I cannot say for certain that that's when the connection
goes into CLOSED state. The IPs do match, but it's very common for the same
IPs to make numerous connections too.
I think the first logical step is to wait for the application to get into that
state again, and then run procstat or fstat to dump the file descriptor array
for the process. Presumably in the normal steady state, you expect to see a
few IPC sockets (syslog, etc), a TCP listen socket, and some number of
in-progress TCP sessions. The question, of course, is whether you see a lot
more file descriptors than that, and in particular, ones that match the
CLOSED entries in netstat. If you find that there are lots of open file
descriptors and they match up approximately with netstat, then it's an
application bug that just manifests a bit differently in 7.x than in 6.x. On
the other hand, if you see only a small number of open file descriptors, then
we may be looking at something quite a bit more complicated.
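
To make the netstat side of that comparison less error-prone with so many
connections, something like the following Python sketch could tally TCP
sockets by state from saved "netstat -na" output. It assumes the usual
six-column layout for TCP rows (proto, Recv-Q, Send-Q, local address, foreign
address, state); tcp_state_counts() is a hypothetical helper name, and the
layout should be checked against the actual output on the machine:

```python
from collections import Counter


def tcp_state_counts(netstat_text):
    """Count TCP sockets per state in the text output of `netstat -na`."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed TCP row layout: proto, Recv-Q, Send-Q, local, foreign, state.
        # Header lines and UDP rows do not match and are skipped.
        if len(fields) == 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts
```

The value of counts["CLOSED"] can then be set against the number of socket
descriptors that procstat or fstat reports for the process.
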
I would next seek to confirm the analysis that "they go away when the
application is killed" -- do they really disappear at the very moment it
exits, or do they kind of disappear over time and it just happens that by the
time you run netstat after killing the application, they're gone? I.e., I'd
try something like "netstat -na > file1 ; kill pid ; sleep 1 ; netstat -na >
file2 ; diff -u file1 file2". If they really all go away in a large quantity
the moment the process dies, then the reference model is working (i.e., they
are freed), but perhaps references are being held onto in an unexpected way.
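
With a very large number of connections the diff itself may be hard to read,
so a small script can pull out just the CLOSED entries that vanished between
the two snapshots. This is a sketch under the same assumption about the
netstat column layout as above (six whitespace-separated fields per TCP row);
the function names are hypothetical:

```python
def closed_connections(netstat_text):
    """Return the set of (local, foreign) address pairs in CLOSED state."""
    pairs = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed TCP row layout: proto, Recv-Q, Send-Q, local, foreign, state.
        if (len(fields) == 6 and fields[0].startswith("tcp")
                and fields[5] == "CLOSED"):
            pairs.add((fields[3], fields[4]))
    return pairs


def vanished_on_exit(before_text, after_text):
    """CLOSED entries present in the first snapshot and absent in the second."""
    return closed_connections(before_text) - closed_connections(after_text)
```

For example, vanished_on_exit(open("file1").read(), open("file2").read())
gives the set of CLOSED connections that disappeared within the one-second
window around the kill.
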
For example, is the incomplete listen queue somehow getting filled with CLOSED
sockets that are only garbage collected when close() is called on the listen
socket? If we suspect that, we can actually test it by having your
application close the listen socket and re-open it once in a while, and see if
the CLOSED sockets fail to stack up.
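
A minimal sketch of that experiment, again in Python purely for illustration
(the real application would do the equivalent in its own language;
open_listener() and reopen_listener() are hypothetical names, and the backlog
of 128 is an arbitrary placeholder). Note that connection attempts arriving
in the brief gap between close() and listen() will be refused, so this is a
diagnostic, not a fix:

```python
import socket


def open_listener(host, port, backlog=128):
    """Create a TCP listen socket bound to (host, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Allow rebinding the same address right after the old socket closes.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock


def reopen_listener(old):
    """Close the listen socket and open a fresh one on the same address."""
    host, port = old.getsockname()
    old.close()
    return open_listener(host, port)
```
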
Speaking of which, I meant to ask: are you using accept filters, and if so,
which one?
Robert N M Watson
Computer Laboratory
University of Cambridge