Here is an updated version of the SOCK_DESTROY patch incorporating some of the feedback received.

There were two substantial concerns expressed on the approach taken in
this patch.

The first was that it allows applications to cause the Linux TCP stack
to behave improperly. I believe this is addressed as follows:

1. This new patchset sends a RST in addition to clearing state. This is
   compliant behaviour: it is the ABORT operation specified in RFC 793
   [1]. Any app today can do this by enabling SO_LINGER with a timeout
   of 0 and calling close.
2. Multiple other operating systems implement this behaviour:
   - FreeBSD has had this since 5.4 in 2005 [2]. It is available to
     privileged userspace and there is a tool to use it [3].
   - The FreeBSD commit description states that the idea came from
     OpenBSD.
   - iOS has been administratively closing app sockets since iOS 4
     [see 4, which states that a socket "might get reclaimed by the
     kernel" and after that will return EBADF].

The second concern was that userspace should not be in the business of
making reachability determinations for TCP sockets; that job belongs to
the kernel. But userspace makes reachability determinations all the
time. Most relevant to this patchset:
"-j REJECT --reject-with tcp-reset" has exactly the same effect as
SOCK_DESTROY, except it only does so when the app does write or the
kernel sends a keepalive, not when blocked on read.

Also, there are real use cases where the kernel does not have enough
information to know that a connection is now inoperable. The kernel can
know if a packet can't be routed, but in general it won't if a TCP
connection is dead in the water because it is now routed to a network
where its source address is no longer valid [5][6].

Other concerns have been addressed in this version, as follows:

1. tcp_diag_destroy now does a proper RFC 793 ABORT, i.e., sends a RST
   to the peer. This is consistent with BSD's tcpdrop, and is more
   correct in general, even though in most use cases SOCK_DESTROY will
   only be called when sending a RST is no longer possible (e.g., the
   network has disconnected).
2. Blocking socket operations are interrupted with ECONNABORTED instead
   of ETIMEDOUT. This addresses Tom's point that ETIMEDOUT is vague and
   an explicit notification is needed. ECONNABORTED was chosen because
   it is consistent with BSD.
3. SOCK_DESTROY is placed behind an INET_DIAG_DESTROY configuration
   option, which is off by default.

[1] http://tools.ietf.org/html/rfc793#page-50
[2] http://svnweb.freebsd.org/base?view=revision&revision=141381
[3] https://www.freebsd.org/cgi/man.cgi?query=tcpdrop&sektion=8&manpath=FreeBSD+5.4-RELEASE
[4] https://developer.apple.com/library/ios/technotes/tn2277/_index.html#//apple_ref/doc/uid/DTS40010841-CH1-SUBSECTION3
[5] http://www.spinics.net/lists/netdev/msg352775.html
[6] http://www.spinics.net/lists/netdev/msg352952.html
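
For anyone who wants to see the existing application-level ABORT path mentioned above (SO_LINGER with a timeout of 0, then close), here is a minimal userspace sketch. It is not part of the patch and does not use SOCK_DESTROY; it only illustrates that an unprivileged process can already make the stack send a RST. The function name abort_and_observe is made up for illustration, and the "ii" packing of struct linger assumes Linux.

```python
import socket
import struct


def abort_and_observe():
    """Abort a loopback TCP connection and report what the peer sees.

    Hypothetical helper for illustration only; assumes Linux, where
    struct linger is two C ints (l_onoff, l_linger).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    peer, _ = listener.accept()
    peer.settimeout(5.0)  # do not hang forever if no RST arrives

    # l_onoff=1, l_linger=0: close() drops the connection state and sends
    # a RST instead of going through the normal FIN handshake.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()

    try:
        data = peer.recv(1)
        result = "eof" if data == b"" else "data"
    except ConnectionResetError:
        # The peer received the RST (ECONNRESET).
        result = "reset"
    finally:
        peer.close()
        listener.close()
    return result


if __name__ == "__main__":
    print(abort_and_observe())
```

The difference with SOCK_DESTROY is who triggers the abort: here it is the socket's owner, whereas the patch lets a privileged process do it to a socket it does not own.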