A reminder that the following discussion is taking place on [EMAIL PROTECTED]
Please send replies there.
Robert N M Watson
---------- Forwarded message ----------
Date: Wed, 29 Mar 2006 12:05:51 +0000 (GMT)
From: Robert Watson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: Randall Stewart <[EMAIL PROTECTED]>
Subject: Re: REMINDER: Re: HEADS UP: network stack and socket hackery over the
next few weeks
On Wed, 29 Mar 2006, Robert Watson wrote:
As a reminder, April 1 is now three days away. On April 1, I will be
committing an extensive set of socket and netinet changes which will likely
render the network stack broken. I say this with some confidence because I
have tested the changes fairly extensively, as have a number of other
developers, and they appear to mostly work. Therefore, they will be broken
:-). I will be posting updated versions of these patches shortly, but unless
we run into show-stopping instability with them, rather than nits, I
will commit them (in their updated form) on April 1 shortly after the netatm
build is disabled.
I will post another HEADS UP as the changes go into the tree, and will be
monitoring things closely to try and get any bugs that might turn up fixed as
quickly as possible. As an FYI, I will be travelling the weeks of April 6 -
April 21, but will be online frequently, and working for several days in the
Bay Area during the trip. Please report bugs relating to this work to
[EMAIL PROTECTED]
An updated version of the patch is now available for download at:
http://www.watson.org/~robert/freebsd/netperf/20060329-rwatson_sockref.diff
Earlier versions of the patch may be found in the same directory in similarly
named files. The working branch maintaining these changes may be found in
Perforce at:
//depot/user/rwatson/sockref/...
As a high level recap, the following classes of changes appear in this patch:
- The socket code now no longer relies on reading so_pcb as a hint regarding
protocol behavior and shutdown. This eliminates a number of races, and
means that only the protocol is responsible for reading/maintaining the
field, and can synchronize it as desired.
- All protocols have been converted to maintain the invariant that so_pcb will
be non-NULL and point to a valid PCB at all times while the socket is valid.
Depending on the protocol, this change either removed a number of crashes
and races, or eliminated heavy-weight locking to maintain the validity of
so_pcb during use by the socket layer.
- In some cases, this required significant rewriting of state management --
specifically, for IPX/SPX and TCP/IP. SPX and TCP now maintain DROPPED
flags on their inpcb's to reflect the state previously identified through a
NULL so_pcb pointer.
- Protocols can now explicitly request that a socket not be freed on last
consumer reference, using the SS_PROTOREF flag, in order that they can
continue to access the socket buffer until it is no longer required: for
example, TCP after socket close() but before final ACKs from the remote endpoint for
sent data. sotryfree() is eliminated. TCP has gained an inpcb flag to
reflect this condition.
- Improved documentation of kernel socket API calls, which will be followed
with man pages once things are hammered out a bit more.
- fgetsock() and fputsock() are deprecated, with long-term plans to eliminate
the use of soref() and sorele() for consumer use. Consumers now receive a
reference to a socket using socreate(), and release it using soclose(), in
order to avoid use of sockets after close. Consumer reference counts, such
as file descriptor reference counts, should be used in preference, as this
offers cleaner behavior at the socket layer, and also avoids additional
mutex operations. Some consumers still remain, but have been annotated.
- pru_abort and pru_detach are no longer allowed to fail. Garbage collection
of the socket after these, assuming SS_PROTOREF isn't set, is unconditional,
and does not depend on the error value returned.
- Protocols now only call sofree() if they have claimed SS_PROTOREF. They
don't attempt to spontaneously free sockets in numerous situations in the
hopes of not leaking them, since socket teardown is now well-defined.
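
To make the reference rules in the list above a little more concrete, here is
a small userspace sketch. It is not taken from the patch and is not kernel
code: every toy_* name, both flag values, and the data_in_flight parameter are
assumptions made up for illustration. It models only two ideas from the list:
so_pcb stays non-NULL and a DROPPED-style flag on the PCB replaces the old
NULL test, and a protocol that has claimed an SS_PROTOREF-style flag keeps the
socket alive after the last consumer reference goes away.

```c
/* Userspace toy model only; this is NOT FreeBSD kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SS_PROTOREF 0x1 /* hypothetical stand-in for SS_PROTOREF */
#define TOY_PCB_DROPPED 0x1 /* hypothetical stand-in for the DROPPED flag */

struct toy_pcb {
    int flags;
};

struct toy_socket {
    int so_count;           /* consumer references, e.g. a file descriptor */
    int so_state;           /* TOY_SS_PROTOREF or 0 */
    struct toy_pcb *so_pcb; /* invariant: non-NULL while the socket is valid */
    bool freed;             /* stands in for actually releasing memory */
};

/* The consumer obtains its single reference at creation time. */
void toy_socreate(struct toy_socket *so, struct toy_pcb *pcb)
{
    pcb->flags = 0;
    so->so_count = 1;
    so->so_state = 0;
    so->so_pcb = pcb;
    so->freed = false;
}

/* The protocol drops the connection: mark the PCB, keep so_pcb intact. */
void toy_proto_drop(struct toy_socket *so)
{
    so->so_pcb->flags |= TOY_PCB_DROPPED;
}

/* Callers test the flag rather than comparing so_pcb against NULL. */
int toy_send(const struct toy_socket *so)
{
    assert(so->so_pcb != NULL);
    return (so->so_pcb->flags & TOY_PCB_DROPPED) ? -1 : 0;
}

/* Free only when neither a consumer nor the protocol needs the socket. */
void toy_sofree(struct toy_socket *so)
{
    if (so->so_count > 0 || (so->so_state & TOY_SS_PROTOREF))
        return;
    so->freed = true;
}

/*
 * The consumer releases its reference. If data is still in flight, the
 * protocol claims its own reference so the socket buffer stays usable.
 */
void toy_soclose(struct toy_socket *so, bool data_in_flight)
{
    assert(so->so_count == 1);
    so->so_count = 0;
    if (data_in_flight)
        so->so_state |= TOY_SS_PROTOREF;
    toy_sofree(so);
}

/* The protocol may free the socket only if it claimed the reference. */
void toy_proto_done(struct toy_socket *so)
{
    assert(so->so_state & TOY_SS_PROTOREF);
    so->so_state &= ~TOY_SS_PROTOREF;
    toy_sofree(so);
}
```

In this model, closing a socket with data_in_flight set leaves it allocated
until toy_proto_done() runs, while closing one without it frees the socket
immediately. The real locking, the real flag values, and the exact call
sequence in the socket layer are deliberately left out.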
The following protocols are updated, tested, and believed to work in the new
world order:
uipc_usrreq
net (raw, routing)
netinet
netinet6
netipx
netatalk
The following protocols are updated for the new world order, but not tested:
netnatm
ng_socket
netipsec
netinet6/ipsec
netkey
The following protocols are not updated for the new world order, but the
maintainer is aware of these changes and plans to update the protocol in the
immediate future:
ng_btsocket
The following protocols are not updated for the new world order, and do not
have a maintainer:
netatm
I will commit the changes to make netatm compile, but am pretty sure there will
be socket reference problems. Please see posts on arch@ on this topic for more
information.
As with all significant kernel changes, these changes likely include
significant bugs, which you, the -current user, will have the opportunity to
help me find. I will attempt to respond as quickly as I can, although
debugging complex network stack issues can, of course, be tricky and take a
while. Hopefully these changes will, in the long term, improve both the
stability and performance of the FreeBSD stack, by sanitizing and sanifying
otherwise obscure and often broken behavior, and eliminating several subtle
types of race conditions that may have been responsible for occasional network
instability reported in RELENG_5 and RELENG_6 (and in some cases, RELENG_4). I
do expect the ride to initially be bumpy though.
Robert N M Watson
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"