A reminder that the following discussion is taking place on [EMAIL PROTECTED]. Please send replies there.

Robert N M Watson

---------- Forwarded message ----------
Date: Wed, 29 Mar 2006 12:05:51 +0000 (GMT)
From: Robert Watson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: Randall Stewart <[EMAIL PROTECTED]>
Subject: Re: REMINDER: Re: HEADS UP: network stack and socket hackery over the
    next few weeks


On Wed, 29 Mar 2006, Robert Watson wrote:

As a reminder, April 1 is now three days away. On April 1, I will be committing an extensive set of socket and netinet changes which will likely render the network stack broken. I say this with some confidence because I have tested the changes fairly extensively, as have a number of other developers, and they appear to mostly work. Therefore, they will be broken :-). I will be posting updated versions of these patches shortly, but unless we run into show-stopping instability with them, rather than nits, I will commit them (in their updated form) on April 1 shortly after the netatm build is disabled.

I will post another HEADS UP as the changes go into the tree, and will be monitoring things closely to try to get any bugs that turn up fixed as quickly as possible. As an FYI, I will be travelling the weeks of April 6 - April 21, but will be online frequently, and working for several days in the Bay Area during the trip. Please report bugs relating to this work to [EMAIL PROTECTED].

An updated version of the patch is now available for download at:

  http://www.watson.org/~robert/freebsd/netperf/20060329-rwatson_sockref.diff

Earlier versions of the patch may be found in the same directory in similarly named files. The working branch maintaining these changes may be found in Perforce at:

  //depot/user/rwatson/sockref/...

As a high level recap, the following classes of changes appear in this patch:

- The socket code no longer relies on reading so_pcb as a hint regarding
  protocol behavior and shutdown.  This eliminates a number of races, and
  means that only the protocol is responsible for reading/maintaining the
  field, and can synchronize it as desired.

- All protocols are converted to maintain the invariant that so_pcb will be
  non-NULL and point to a valid PCB at all times while the socket is valid.
  Depending on the protocol, this change either removed a number of crashes
  and races, or eliminated heavy-weight locking to maintain the validity of
  so_pcb during use by the socket layer.

- In some cases, this required significant rewriting of state management --
  specifically, for IPX/SPX and TCP/IP.  SPX and TCP now maintain DROPPED
  flags on their inpcb's to reflect the state previously identified through a
  NULL so_pcb pointer.

- Protocols can now explicitly request that a socket not be freed on last
  consumer reference, using the SS_PROTOREF flag, in order that they can
  continue to access the socket buffer until it is no longer required.  For
  example, TCP needs this after socket close() but before the final ACKs from
  the remote endpoint for
  sent data.  sotryfree() is eliminated.  TCP has gained an inpcb flag to
  reflect this condition.

- Improved documentation of kernel socket API calls, which will be followed
  with man pages once things are hammered out a bit more.

- fgetsock() and fputsock() are deprecated, with long-term plans to eliminate
  the use of soref() and sorele() for consumer use.  Consumers now receive a
  reference to a socket using socreate(), and release it using soclose(), in
  order to avoid use of sockets after close.  Consumer reference counts, such
  as file descriptor reference counts, should be used in preference, as this
  offers cleaner behavior at the socket layer, and also avoids additional
  mutex operations.  Some consumers still remain, but have been annotated.

- pru_abort and pru_detach are no longer allowed to fail.  Garbage collection
  of the socket after these, assuming SS_PROTOREF isn't set, is unconditional,
  and not a property of the error value returned.

- Protocols now only call sofree() if they have claimed SS_PROTOREF.  They
  don't attempt to spontaneously free sockets in numerous situations in the
  hopes of not leaking them, since socket teardown is now well-defined.
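
To make the teardown rules in the list above concrete, here is a small user-space model in C. It is only a sketch and is not taken from the patch: every model_* name, the flag values, and the unacked counter are assumptions made up for illustration, and all locking is omitted. It shows so_pcb staying non-NULL for the life of the socket, a DROPPED flag on the PCB standing in for the old NULL so_pcb test, a detach routine that cannot fail, and the socket outliving consumer close only when the protocol has claimed a protocol reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical values: the real kernel constants are not given in this post. */
#define MODEL_SS_PROTOREF  0x1  /* protocol keeps the socket after consumer close */
#define MODEL_PCB_DROPPED  0x1  /* connection dropped; replaces the NULL so_pcb test */

struct model_pcb {
	int flags;
	int unacked;               /* bytes sent but not yet ACKed by the peer */
};

struct model_socket {
	int so_state;
	struct model_pcb *so_pcb;  /* invariant: non-NULL while the socket is valid */
};

static int live_sockets;       /* demo-only counter of allocated sockets */

/* Consumer obtains its single reference here (stands in for socreate()). */
static struct model_socket *model_socreate(void)
{
	struct model_socket *so = calloc(1, sizeof(*so));
	struct model_pcb *pcb = calloc(1, sizeof(*pcb));

	if (so == NULL || pcb == NULL)
		abort();
	so->so_pcb = pcb;
	live_sockets++;
	return so;
}

/* Final teardown: the PCB and the socket go away together. */
static void model_teardown(struct model_socket *so)
{
	free(so->so_pcb);
	free(so);
	live_sockets--;
}

/* Protocol marks the connection dropped without clearing so_pcb. */
static void model_proto_drop(struct model_socket *so)
{
	so->so_pcb->flags |= MODEL_PCB_DROPPED;
}

static bool model_is_dropped(const struct model_socket *so)
{
	return (so->so_pcb->flags & MODEL_PCB_DROPPED) != 0;
}

/* Detach cannot fail, so it returns void; it may claim a protocol reference. */
static void model_pru_detach(struct model_socket *so)
{
	if (so->so_pcb->unacked > 0)
		so->so_state |= MODEL_SS_PROTOREF;
}

/* Consumer releases its reference; freeing is unconditional unless PROTOREF. */
static void model_soclose(struct model_socket *so)
{
	model_pru_detach(so);
	if ((so->so_state & MODEL_SS_PROTOREF) == 0)
		model_teardown(so);
}

/* Protocol-side free: only legal if the protocol claimed PROTOREF earlier. */
static void model_sofree(struct model_socket *so)
{
	assert((so->so_state & MODEL_SS_PROTOREF) != 0);
	so->so_state &= ~MODEL_SS_PROTOREF;
	model_teardown(so);
}

#ifdef MODEL_DEMO_MAIN
int main(void)
{
	struct model_socket *so = model_socreate();

	so->so_pcb->unacked = 10;      /* data still in flight at close time */
	model_soclose(so);             /* protocol keeps the socket alive */
	printf("live after close: %d\n", live_sockets);
	so->so_pcb->unacked = 0;       /* final ACK arrives */
	model_sofree(so);              /* protocol drops its reference */
	printf("live after final ACK: %d\n", live_sockets);
	return 0;
}
#endif
```

Compiling with -DMODEL_DEMO_MAIN builds the small demo. The point of the model is that there are exactly two paths to teardown: the consumer's close when no protocol reference is held, or the protocol's own free once it has finished with the socket buffer.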

The following protocols are updated, tested, and believed to work in the new world order:

  uipc_usrreq
  net (raw, routing)
  netinet
  netinet6
  netipx
  netatalk

The following protocols are updated for the new world order, but not tested:

  netnatm
  ng_socket
  netipsec
  netinet6/ipsec
  netkey

The following protocols are not updated for the new world order, but the maintainer is aware of these changes and plans to update the protocol in the immediate future:

  ng_btsocket

The following protocols are not updated for the new world order, and do not have a maintainer:

  netatm

I will commit the changes to make netatm compile, but am pretty sure there will be socket reference problems. Please see posts on arch@ on this topic for more information.

As with all significant kernel changes, these changes likely include significant bugs, which you, the -current user, will have the opportunity to help me find. I will attempt to respond as quickly as I can, although debugging complex network stack issues can, of course, be tricky and take a bit. Hopefully these changes will, in the long term, improve both the stability and performance of the FreeBSD stack, by sanitizing and sanifying otherwise obscure and often broken behavior, and eliminating several subtle types of race conditions that may have been responsible for occasional network instability reported in RELENG_5 and RELENG_6 (and in some cases, RELENG_4). I do expect the ride to initially be bumpy though.

Robert N M Watson
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
