On Thu, 19 Jul 2007, Eygene Ryabinkin wrote:
> Another way to deal with the problem is not to send the FINs after the one
> provoked by the closed descriptor. As I understand it, the SS_NOFDREF check is
> an optimization to avoid processing unneeded data in the TCP stack. So we
> may just silently blackhole the successive packets, at least some of them.

While it may also serve that purpose, SS_NOFDREF is actually part of the socket
state cycle, and is used in part to determine when it is appropriate to free a
socket. As you observe, the key here is that there are actually three
separate and somewhat independent state cycles going on here: the file
descriptor state cycle, the socket state cycle, and the TCP state cycle.
This is further complicated by the fact that we actually have a three-part
state model for TCP, allowing reduced state to be maintained during the
three-way handshake on the server, and during the TIMEWAIT state. The trick
is to properly manage the API/protocol interactions and the data structures.

In FreeBSD 6.x and earlier, we have a moderately large number of bugs relating
to mishandling of freed TCP state, and in FreeBSD 7, in order to reduce
complexity and locking requirements, we moved to a model in which it is an
invariant of the socket<->pcb relationship that a valid PCB is present for all
"live" sockets. As such, the so->so_pcb pointer is always valid, and any
valid socket will always have valid TCP state. However, the inverse is only
sometimes true: we may free socket state when in the final stages of the TCP
connection in order to avoid keeping around the memory overhead of the socket
and socket buffers during, for example, TIMEWAIT. If you look at sofree() in
7.x, you'll see the logic we use to determine whether it's time to free the
socket itself or not:

	if ((so->so_state & SS_NOFDREF) == 0 || so->so_count != 0 ||
	    (so->so_state & SS_PROTOREF) || (so->so_qstate & SQ_COMP)) {
		SOCK_UNLOCK(so);
		ACCEPT_UNLOCK();
		return;
	}

Notice that we have both an explicit reference count and several flags that
are effectively also references. SS_NOFDREF is set when a file descriptor, if
there has ever been one for the socket, has its reference removed.
SS_PROTOREF means that the protocol has asserted a reference on the socket --
for example, if the socket is closed but there is still pending data to be
sent out, so the socket buffers are required. SQ_COMP is set if the socket is
in a listen queue.

Over the last two years, I've been gradually attempting to move to explicit
reference models and strong, well-documented invariants about the stability
of the pointers that span layers (e.g., inp_ppcb, inp_socket, so_pcb), as
well as gradually simplifying the model. It wouldn't surprise me if issues
remain.
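
To make the reference rules in the sofree() check quoted earlier a little more concrete, here is a minimal user-space sketch of the same condition. It is only a model, not kernel code: struct model_socket, model_can_free() and model_selfcheck() are hypothetical names invented for illustration, the numeric flag values are placeholders rather than the ones in the kernel headers, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values for this sketch only; not taken from kernel headers. */
#define	SS_NOFDREF	0x0001	/* file descriptor reference has been removed */
#define	SS_PROTOREF	0x4000	/* protocol has asserted a reference */
#define	SQ_COMP		0x1000	/* socket is in a listen queue */

/* Hypothetical, stripped-down stand-in for struct socket. */
struct model_socket {
	int	so_state;	/* SS_* flags */
	int	so_qstate;	/* SQ_* flags */
	int	so_count;	/* explicit reference count */
};

/*
 * Mirrors the quoted condition: the socket may be freed only when the
 * file descriptor reference is gone, the explicit count is zero, the
 * protocol holds no reference, and the socket is not in a listen queue.
 */
static bool
model_can_free(const struct model_socket *so)
{
	if ((so->so_state & SS_NOFDREF) == 0 || so->so_count != 0 ||
	    (so->so_state & SS_PROTOREF) || (so->so_qstate & SQ_COMP))
		return (false);
	return (true);
}

/* Walks one socket through a few of the cases described in the text. */
static void
model_selfcheck(void)
{
	struct model_socket so = { 0, 0, 0 };

	/* File descriptor still open: SS_NOFDREF is clear. */
	assert(!model_can_free(&so));

	/* Closed, but the protocol still needs the socket buffers. */
	so.so_state |= SS_NOFDREF | SS_PROTOREF;
	assert(!model_can_free(&so));

	/* Protocol drops its reference: nothing refers to the socket. */
	so.so_state &= ~SS_PROTOREF;
	assert(model_can_free(&so));

	/* Being in a listen queue acts as a reference. */
	so.so_qstate |= SQ_COMP;
	assert(!model_can_free(&so));
	so.so_qstate &= ~SQ_COMP;

	/* So does an explicit reference count. */
	so.so_count = 1;
	assert(!model_can_free(&so));
}
```

Calling model_selfcheck() from a small main() exercises each case. The point of the sketch is just that the three flags behave as implicit references alongside so_count, so any one of them is enough to keep the socket alive.
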
Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net