On Wed, 25 Jun 2008, Ali Niknam wrote:
Recently I've been upgrading some of my machines from FreeBSD 6.x amd64 to
FreeBSD 7.0 amd64.
After upgrading I noticed a weird error/bug. It seems that after several
thousand TCP connections some seem to hang in 'CLOSED' state.
Sounds like there's a bug somewhere. Before we start trying to track it down,
I'll tell you a little more about how this works so that we can interpret the
output you're seeing.
In FreeBSD, as with all UNIX/Berkeley sockets systems, each socket is actually
represented by a set of data structures representing different layers of
abstraction. At the top level is struct file, representing a file descriptor.
Next down is struct socket, representing a socket. Then the protocol code has
struct inpcb, representing a generic IP connection, and struct tcpcb (or
struct tcptw once we enter TIMEWAIT), representing a TCP connection.
Confusingly, these data structures don't always exist all at once. For
example, if you close the file descriptor, freeing struct file, the socket and
protocol state may persist for some time until the TCP connection closes (all
data has been sent, or various other close modes).
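To make that layering concrete, here is a very rough sketch of the pointer
chain; the field names follow the real ones (f_data, so_pcb, inp_ppcb), but
the structures are drastically simplified for illustration and are not the
actual kernel definitions:

/*
 * Drastically simplified sketch of the layering -- not the real FreeBSD
 * definitions, just the shape of the pointer chain described above.
 */
struct tcpcb;                   /* TCP connection (struct tcptw in TIMEWAIT) */

struct inpcb {
        void *inp_ppcb;         /* -> struct tcpcb / struct tcptw */
};

struct socket {
        void *so_pcb;           /* -> struct inpcb */
};

struct file {
        void *f_data;           /* -> struct socket, for socket descriptors */
};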
One important difference between FreeBSD 6.x and FreeBSD 7.x is that, in
FreeBSD 7.x, we've reduced the degree to which these data structures exist in
isolation. If you look at the mailing list threads discussing the change,
you'll see it described as "strengthening invariants". The most important
part of the change was making it an invariant that so->so_pcb, the pointer
from the socket to the protocol layer state, always remains stable and valid.
This had a number of benefits: because the pointer is always stable, it no
longer requires locks to follow, lowering overhead and improving parallelism.
It also simplifies the code by removing lots of error handling, and improves
code stability by avoiding the inevitable bugs associated with that complex
error handling. If you look at bug reports over the years, we've had quite a
few panics reported (and fixed) caused by the disappearance of protocol-layer
state, such as when a connection is reset while still in use by a process;
these are now all believed to be eliminated.
So the code is faster, cleaner, and more stable. But there are a few
interesting side effects. One is that we retain state at the TCP layer for
longer than we used to. Specifically, if a TCP connection closes, the inpcb
remains allocated until the file descriptor is closed (i.e., the application
notices the connection has closed and invokes close() on the file descriptor).
This has a few impacts: one is that TCP connections now appear in netstat in
the CLOSED state for longer than before, and another is that open sockets that
are associated with CLOSED TCP connections now count against the global
resource limit on the number of simultaneous TCP connections.
I say "longer than before", but I should be clear that, in practice, assuming
all is working properly, there's no measurable behavioral change *except* for
improved performance, cleanliness, and stability. This is because
applications generally open a socket, run a protocol, and when the protocol
wraps up, they then close() the file descriptor in order to close the
connection.
So, with that introduction, we're interested in resolving:
(1) Is this an application bug (leaking file descriptors) that only manifests
in 7.x due to changes in kernel state management, leading to the sockets
being visible in netstat and counting against the resource limit?
(2) Is this a *new* bug in TCP in 7.x, perhaps a result of the state-related
changes I've described?
(3) Is this an *old* bug in TCP that is only now manifesting because of the
changes in kernel state management?
The first is the easiest to resolve, as all we need to do is see whether the
number of file descriptors for the application goes upwards in an improbable
manner. You can use fstat, procstat, sockstat, or various other tools (such
as lsof) to see whether the process is leaking file descriptors. You can also
instrument your application to keep track of the file descriptor numbers being
returned to see whether, perhaps, that number only goes up over time, and gets
really big.
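One cheap way to do that instrumentation, sketched here with a hypothetical
wrapper (the name and logging are illustrative, not part of any real API), is
to log every descriptor that accept() hands back; if the numbers only ever
climb, something is not being closed:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical wrapper: log each file descriptor returned by accept(). */
static int
accept_logged(int s, struct sockaddr *addr, socklen_t *addrlen)
{
        int fd = accept(s, addr, addrlen);

        if (fd >= 0)
                fprintf(stderr, "accept() returned fd %d\n", fd);
        return (fd);
}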
If it turns out that your application *is* properly closing sockets, then we
need to decide if perhaps we're looking at a race in close and state
management. In particular, I'll need the output of "netstat -na", "vmstat
-z", and "vmstat -m" from the machine once it's in its rather wedged-up state.
It would be most helpful if you could actually shut down to single-user mode,
killing all user processes, then wait ten minutes and capture the output of
the above commands to files that you can then e-mail to me.
Without accusing you of having buggy code, I should say that I think there's a
reasonable chance that what you're seeing is an interaction between an
existing leak of resources in the application and the way the kernel state
management has changed. The output from netstat pretty precisely matches what
you'd expect: lots of TCP connections in the CLOSED state, reflecting a series
of connections built by the application but then not properly discarded.
Likewise, when the application is killed, all of the connections go away --
most likely because the file descriptors are all closed, allowing them to be
garbage collected and connection state freed. If it is this sort of bug, then
most likely you're missing a call to close() in a work loop somewhere, and in
some exceptional case, you fall out of the loop without calling close().
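The classic shape of that bug looks something like this (purely hypothetical
code to illustrate the pattern, not your application):

#include <unistd.h>

/*
 * Hypothetical per-connection handler: the early return on the error
 * path skips close(), so the descriptor -- and the CLOSED connection
 * behind it -- lingers until the process exits.
 */
static void
handle_connection(int fd)
{
        char len[2];

        if (read(fd, len, sizeof(len)) <= 0)
                return;         /* BUG: should close(fd) before returning */

        /* ... read the request, write the reply ... */

        close(fd);
}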
If it turns out that you can get to single-user, wait ten minutes to make sure
all the connections wind down, and there are still connections visible in
netstat, then we may indeed be looking at a kernel bug, and the debugging
information using netstat and vmstat will allow us to start to investigate.
Robert N M Watson
Computer Laboratory
University of Cambridge
netstat -n gives:
...
tcp4 0 0 1.2.3.4.* 4.5.6.7.42149 CLOSED
tcp4 39 0 1.2.3.4.* 4.5.6.7.54103 CLOSED
tcp4 35 0 1.2.3.4.* 4.5.6.7.41718 CLOSED
tcp4 38 0 1.2.3.4.* 4.5.6.7.55618 CLOSED
tcp4 41 0 1.2.3.4.* 4.5.6.7.44230 CLOSED
tcp4 39 0 1.2.3.4.* 4.5.6.7.49439 CLOSED
...
These never go away; they gradually increase and increase until the
application starts giving errors (probably because some socket or
file descriptor limit is reached). When the application is killed these
entries disappear.
The application in question is a self-written DNS server, multithreaded, and
it has run fine for years without any trouble on both BSD 5.x and 6.x, and on
both 32-bit and 64-bit 6.x.
Of course that doesn't mean the application is error-free; however, after
doing extensive testing I really cannot find anything wrong with the
application itself, so I'm thinking maybe there's a change somewhere that
causes this? I know that the TCP/network code has been completely redone...
What basically happens in the application is this (sketched in code below):
- one main TCP thread runs an infinite while loop waiting for new
  connections to arrive
- as soon as one arrives, a new thread is spawned that handles the newly
  created stream
- it reads some bytes, writes some bytes, then closes it
- the thread exits
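In rough, simplified form (an illustrative sketch, not the actual source), it
looks like this:

#include <pthread.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/socket.h>

/* Simplified sketch of the accept loop and per-connection thread. */
static void *
tcp_worker(void *arg)
{
        int fd = (int)(intptr_t)arg;
        unsigned char lenbuf[2];

        /* Read the 2-byte DNS/TCP length prefix; 0 bytes means EOF. */
        if (read(fd, lenbuf, sizeof(lenbuf)) == sizeof(lenbuf)) {
                /* ... read the query, write the response ... */
        }
        close(fd);
        return (NULL);
}

static void
tcp_main_loop(int listen_fd)
{
        pthread_t t;

        for (;;) {
                int fd = accept(listen_fd, NULL, NULL);

                if (fd < 0)
                        continue;
                pthread_create(&t, NULL, tcp_worker, (void *)(intptr_t)fd);
                pthread_detach(t);
        }
}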
What appears to happen is this: after the new thread is spawned it tries to
read 2 bytes (the DNS TCP length prefix). It gets back 0 bytes (EOF) and
therefore closes the socket and calls pthread_exit(). However, in netstat that
same stream often appears to have bytes 'stuck' in the receive queue...
I really can't see how this can cause sockets hanging in the 'CLOSED' state.
Even if the incoming queue isn't read entirely, a call to close() should close
it. Also, I really can't find any documentation, in netstat or elsewhere, about
the 'CLOSED' state...
Any help would be greatly appreciated!
Kind Regards,
Ali Niknam
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"