Re: svn commit: r230264 - head/sys/sys
On 17.01.2012 13:16, Gleb Smirnoff wrote:
> On Tue, Jan 17, 2012 at 12:13:37PM +, Gleb Smirnoff wrote:
> T> Author: glebius
> T> Date: Tue Jan 17 12:13:36 2012
> T> New Revision: 230264
> T> URL: http://svn.freebsd.org/changeset/base/230264
> T>
> T> Log:
> T>   Provide a function m_get2() that allocates a minimal mbuf that
> T>   would fit specified size. Returned mbuf may be a single mbuf,
> T>   an mbuf with a cluster from packet zone, or an mbuf with jumbo
> T>   cluster of sufficient size.
>
> I am open to discussion on bikeshed color^W^W a better name for
> this function.

We already have m_getm2() which does the same for mbuf chains.

> I utilized it in pfsync, however there are several other places where
> it can be used instead of handrolled "if else if else" constructs.

Handrolled mbuf allocation isn't good. Should be all in one place.

--
Andre

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
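To make the "minimal mbuf that would fit" idea from the log concrete, here is a rough user-space sketch of the size-class selection. It is not the kernel's m_get2(): the `sk_` names and the byte capacities are assumptions chosen for illustration, and nothing is actually allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative capacities in bytes; assumptions, not the kernel's constants. */
enum {
	SK_MBUF_BYTES = 224,	/* data area of a plain mbuf */
	SK_CLUSTER_BYTES = 2048	/* cluster from the packet zone */
};

/* Hypothetical jumbo cluster sizes, smallest first. */
static const size_t sk_jumbo_bytes[] = { 4096, 9216, 16384 };

enum sk_buf_class { SK_BUF_NONE, SK_BUF_MBUF, SK_BUF_CLUSTER, SK_BUF_JUMBO };

struct sk_choice {
	enum sk_buf_class cls;	/* which kind of buffer was picked */
	size_t capacity;	/* bytes that buffer can hold */
};

/* Pick the smallest buffer class whose capacity covers `size`. */
static struct sk_choice
sk_pick_buffer(size_t size)
{
	struct sk_choice c = { SK_BUF_NONE, 0 };
	size_t i;

	if (size <= SK_MBUF_BYTES) {
		c.cls = SK_BUF_MBUF;
		c.capacity = SK_MBUF_BYTES;
	} else if (size <= SK_CLUSTER_BYTES) {
		c.cls = SK_BUF_CLUSTER;
		c.capacity = SK_CLUSTER_BYTES;
	} else {
		for (i = 0; i < sizeof(sk_jumbo_bytes) / sizeof(sk_jumbo_bytes[0]); i++) {
			if (size <= sk_jumbo_bytes[i]) {
				c.cls = SK_BUF_JUMBO;
				c.capacity = sk_jumbo_bytes[i];
				break;
			}
		}
	}
	/* Either nothing fits, or the chosen buffer really is big enough. */
	assert(c.cls == SK_BUF_NONE || c.capacity >= size);
	return (c);
}
```

With these assumed sizes a 1500 byte request lands in the cluster class and a 9000 byte request in the 9216 byte jumbo class. Keeping the ladder in one function is the point made in the thread: callers no longer hand-roll their own "if else if else".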
Re: svn commit: r226113 - head/sys/netinet
On 23.01.2012 15:24, Lawrence Stewart wrote:
> Hi Andre,
>
> On 10/08/11 03:39, Andre Oppermann wrote:
>> Author: andre
>> Date: Fri Oct 7 16:39:03 2011
>> New Revision: 226113
>> URL: http://svn.freebsd.org/changeset/base/226113
>>
>> Log:
>>   Prevent TCP sessions from stalling indefinitely in reassembly
>>   when reaching the zone limit of reassembly queue entries.
>
> [snip]
>
> Any reason this was not MFCed to stable/8 and stable/7 when you MFCed to
> stable/9? As far as I can tell, both r226113 and r228016 need to be
> MFCed to 8 and 7.

Thanks for the reminder. Test build for MFC is under way, including
your later fixup. I'll send it to you for review to make sure
everything's correct.

--
Andre
svn commit: r223839 - in head/sys: conf kern netinet
Author: andre
Date: Thu Jul 7 10:37:14 2011
New Revision: 223839
URL: http://svn.freebsd.org/changeset/base/223839

Log:
  Remove the TCP_SORECEIVE_STREAM compile time option.

  The use of soreceive_stream() for TCP still has to be enabled with
  the loader tuneable net.inet.tcp.soreceive_stream.

  Suggested by: trociny and others

Modified:
  head/sys/conf/options
  head/sys/kern/uipc_socket.c
  head/sys/netinet/tcp_subr.c

Modified: head/sys/conf/options
==
--- head/sys/conf/options	Thu Jul 7 09:51:31 2011	(r223838)
+++ head/sys/conf/options	Thu Jul 7 10:37:14 2011	(r223839)
@@ -427,7 +427,6 @@ SLIP_IFF_OPTS opt_slip.h
 TCPDEBUG
 TCP_OFFLOAD_DISABLE	opt_inet.h #Disable code to dispatch tcp offloading
 TCP_SIGNATURE		opt_inet.h
-TCP_SORECEIVE_STREAM	opt_inet.h
 VLAN_ARRAY		opt_vlan.h
 XBONEHACK
 FLOWTABLE		opt_route.h

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c	Thu Jul 7 09:51:31 2011	(r223838)
+++ head/sys/kern/uipc_socket.c	Thu Jul 7 10:37:14 2011	(r223839)
@@ -1915,7 +1915,6 @@ release:
 /*
  * Optimized version of soreceive() for stream (TCP) sockets.
  */
-#ifdef TCP_SORECEIVE_STREAM
 int
 soreceive_stream(struct socket *so, struct sockaddr **psa, struct uio *uio,
     struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
@@ -2109,7 +2108,6 @@ out:
 	sbunlock(sb);
 	return (error);
 }
-#endif /* TCP_SORECEIVE_STREAM */
 /*
  * Optimized version of soreceive() for simple datagram cases from userspace.

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c	Thu Jul 7 09:51:31 2011	(r223838)
+++ head/sys/netinet/tcp_subr.c	Thu Jul 7 10:37:14 2011	(r223839)
@@ -206,11 +206,9 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
     &VNET_NAME(tcp_isn_reseed_interval), 0,
     "Seconds between reseeding of ISN secret");
-#ifdef TCP_SORECEIVE_STREAM
 static int	tcp_soreceive_stream = 0;
 SYSCTL_INT(_net_inet_tcp, OID_AUTO, soreceive_stream, CTLFLAG_RDTUN,
     &tcp_soreceive_stream, 0, "Using soreceive_stream for TCP sockets");
-#endif
 #ifdef TCP_SIGNATURE
 static int	tcp_sig_checksigs = 1;
@@ -337,13 +335,13 @@ tcp_init(void)
 	tcp_finwait2_timeout = TCPTV_FINWAIT2_TIMEOUT;
 	tcp_tcbhashsize = hashsize;
-#ifdef TCP_SORECEIVE_STREAM
 	TUNABLE_INT_FETCH("net.inet.tcp.soreceive_stream", &tcp_soreceive_stream);
 	if (tcp_soreceive_stream) {
 		tcp_usrreqs.pru_soreceive = soreceive_stream;
+#ifdef INET6
 		tcp6_usrreqs.pru_soreceive = soreceive_stream;
+#endif /* INET6 */
 	}
-#endif
 #ifdef INET6
 #define TCP_MINPROTOHDR (sizeof(struct ip6_hdr) + sizeof(struct tcphdr))
Re: svn commit: r223862 - in head/sys: net netinet netinet6
On 08.07.2011 11:38, Marko Zec wrote:
> Author: zec
> Date: Fri Jul 8 09:38:33 2011
> New Revision: 223862
> URL: http://svn.freebsd.org/changeset/base/223862
>
> Log:
>   Permit ARP to proceed for IPv4 host routes for which the gateway is the
>   same as the host address. This already works fine for INET6 and ND6.

Can you give an example what this does? Is it some sort of proxy ARP?

>   While here, remove two function pointers from struct lltable which are
>   only initialized but never used.

Ideally this would have been a separate commit because it has nothing
to do with primary functional change.

--
Andre

>   MFC after: 3 days
>
> Modified:
>   head/sys/net/if_llatbl.h
>   head/sys/netinet/in.c
>   head/sys/netinet6/in6.c
svn commit: r223863 - head/sys/kern
Author: andre
Date: Fri Jul 8 10:50:13 2011
New Revision: 223863
URL: http://svn.freebsd.org/changeset/base/223863

Log:
  In the experimental soreceive_stream():

   o Move the non-blocking socket test below the SBS_CANTRCVMORE test so
     that EOF is correctly returned on a remote connection close.
   o In the non-blocking socket test compare SS_NBIO against the
     so->so_state field instead of the incorrect sb->sb_flags field.
   o Simplify the ENOTCONN test by removing cases that can't occur.

  Submitted by: trociny (with some further tweaks by committer)
  Tested by: trociny

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c	Fri Jul 8 09:38:33 2011	(r223862)
+++ head/sys/kern/uipc_socket.c	Fri Jul 8 10:50:13 2011	(r223863)
@@ -1954,20 +1954,9 @@ soreceive_stream(struct socket *so, stru
 	}
 	oresid = uio->uio_resid;
-	/* We will never ever get anything unless we are connected. */
+	/* We will never ever get anything unless we are or were connected. */
 	if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
-		/* When disconnecting there may be still some data left. */
-		if (sb->sb_cc > 0)
-			goto deliver;
-		if (!(so->so_state & SS_ISDISCONNECTED))
-			error = ENOTCONN;
-		goto out;
-	}
-
-	/* Socket buffer is empty and we shall not block. */
-	if (sb->sb_cc == 0 &&
-	    ((sb->sb_flags & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
-		error = EAGAIN;
+		error = ENOTCONN;
 		goto out;
 	}
@@ -1994,6 +1983,13 @@ restart:
 		goto out;
 	}
+	/* Socket buffer is empty and we shall not block. */
+	if (sb->sb_cc == 0 &&
+	    ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
+		error = EAGAIN;
+		goto out;
+	}
+
 	/* Socket buffer got some data that we shall deliver now. */
 	if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
 	    ((sb->sb_flags & SS_NBIO) ||
svn commit: r210666 - head/sys/netinet
Author: andre
Date: Fri Jul 30 21:45:53 2010
New Revision: 210666
URL: http://svn.freebsd.org/changeset/base/210666

Log:
  Fix a bug in syncache where the initial CWND for new incoming
  connections was limited to one segment under the faulty assumption
  of a retransmit. Due to this the opportunity to initialize the
  increased congestion window according to RFC3390 was missed.

  Support for RFC3465 introduced in r187289 uncovered the bug as the
  ACK to SYN/ACK no longer caused snd_cwnd increase by MSS (actually,
  this increase shouldn't happen as it's explicitly forbidden by
  RFC3390, but it's another issue). Snd_cwnd remains really small
  (1*MSS + 1) and this causes really bad interaction with delayed
  acks on other side.

  The variable name sc_rxmits is a bit misleading as it counts all
  transmits, not just retransmits.

  Submitted by: Maxim Dounin
  MFC after: 10 days

Modified:
  head/sys/netinet/tcp_syncache.c

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c	Fri Jul 30 21:39:28 2010	(r210665)
+++ head/sys/netinet/tcp_syncache.c	Fri Jul 30 21:45:53 2010	(r210666)
@@ -804,8 +804,9 @@ syncache_socket(struct syncache *sc, str
 	/*
 	 * If the SYN,ACK was retransmitted, reset cwnd to 1 segment.
+	 * NB: sc_rxmits counts all SYN,ACK transmits, not just retransmits.
 	 */
-	if (sc->sc_rxmits)
+	if (sc->sc_rxmits > 1)
 		tp->snd_cwnd = tp->t_maxseg;
 	tcp_timer_activate(tp, TT_KEEP, tcp_keepinit);
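The off-by-one nature of the fix is perhaps easiest to see in isolation. The following is only a sketch with made-up `sk_` helper names; it models the counter described in the log (one count for the first SYN,ACK, one more per retransmit) and is not the syncache code itself.

```c
#include <assert.h>

/* Pre-fix test: any non-zero transmit count was taken as a retransmit. */
static int
sk_retransmitted_old(unsigned int transmits)
{
	return (transmits != 0);
}

/* Post-fix test: the first SYN,ACK is counted too, so require more than one. */
static int
sk_retransmitted_new(unsigned int transmits)
{
	return (transmits > 1);
}

/*
 * Choose the starting congestion window in bytes.  `initial_cwnd` is
 * whatever the connection would normally start with (hypothetical
 * parameter); it is collapsed to one segment only after a real retransmit.
 */
static unsigned int
sk_pick_cwnd(unsigned int transmits, unsigned int mss, unsigned int initial_cwnd)
{
	assert(mss > 0);
	return (sk_retransmitted_new(transmits) ? mss : initial_cwnd);
}
```

For a connection whose SYN,ACK went out exactly once, the old test already reports a retransmit, which is why every new incoming connection started with a one-segment window.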
svn commit: r211315 - head/sys/netinet
Author: andre
Date: Sat Aug 14 20:40:55 2010
New Revision: 211315
URL: http://svn.freebsd.org/changeset/base/211315

Log:
  Disable TCP inflight limiter by default. It was experimental and
  interferes with the normal congestion control algorithms by
  instating a separate, possibly lower, ceiling for the amount of
  data that is in flight to the remote host. With high speed internet
  connections the inflight limit frequently has been estimated too
  low due to the noisy nature of the RTT measurements.

  This code gives way for the upcoming pluggable congestion control
  framework. It is the task of the congestion control algorithm to
  set the congestion window and amount of inflight data without
  external interference.

  Reviewed by: lstewart
  MFC after: 1 week
  Removal after: 1 month

Modified:
  head/sys/netinet/tcp_subr.c

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c	Sat Aug 14 20:12:10 2010	(r211314)
+++ head/sys/netinet/tcp_subr.c	Sat Aug 14 20:40:55 2010	(r211315)
@@ -221,7 +221,7 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 SYSCTL_NODE(_net_inet_tcp, OID_AUTO, inflight, CTLFLAG_RW, 0,
     "TCP inflight data limiting");
-static VNET_DEFINE(int, tcp_inflight_enable) = 1;
+static VNET_DEFINE(int, tcp_inflight_enable) = 0;
 #define	V_tcp_inflight_enable		VNET(tcp_inflight_enable)
 SYSCTL_VNET_INT(_net_inet_tcp_inflight, OID_AUTO, enable, CTLFLAG_RW,
     &VNET_NAME(tcp_inflight_enable), 0,
svn commit: r211316 - head/sys/netinet
Author: andre
Date: Sat Aug 14 21:04:27 2010
New Revision: 211316
URL: http://svn.freebsd.org/changeset/base/211316

Log:
  Change the messages of the ICMP bad port bandwidth limiter from
  a kernel printf to a log output with the priority of LOG_NOTICE.

  This way the messages still show up in /var/log/messages but no
  longer spam the console every other second on busy servers that
  are port scanned:
   "Limiting open port RST response from 114 to 100 packets/sec"

  PR: kern/147352
  Submitted by: Eugene Grosbein
  MFC after: 1 week

Modified:
  head/sys/netinet/ip_icmp.c

Modified: head/sys/netinet/ip_icmp.c
==
--- head/sys/netinet/ip_icmp.c	Sat Aug 14 20:40:55 2010	(r211315)
+++ head/sys/netinet/ip_icmp.c	Sat Aug 14 21:04:27 2010	(r211316)
@@ -42,6 +42,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include
 #include
@@ -975,7 +976,7 @@ badport_bandlim(int which)
 		 * the previous behaviour at the expense of added complexity.
 		 */
 		if (V_icmplim_output && opps > V_icmplim)
-			printf("Limiting %s from %d to %d packets/sec\n",
+			log(LOG_NOTICE, "Limiting %s from %d to %d packets/sec\n",
 				r->type, opps, V_icmplim);
 	}
 	return 0;			/* okay to send packet */
svn commit: r211317 - head/sys/netinet
Author: andre
Date: Sat Aug 14 21:41:33 2010
New Revision: 211317
URL: http://svn.freebsd.org/changeset/base/211317

Log:
  When using TSO and sending more than TCP_MAXWIN sendalot is set
  and we loop back to 'again'. If the remainder is less or equal
  to one full segment, the TSO flag was not cleared even though
  it isn't necessary anymore. Enabling the TSO flag on a segment
  that doesn't require any offloaded segmentation by the NIC may
  cause confusion in the driver or hardware.

  Reset the internal tso flag in tcp_output() on every iteration
  of sendalot.

  PR: kern/132832
  Submitted by: Renaud Lienhart
  MFC after: 1 week

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c	Sat Aug 14 21:04:27 2010	(r211316)
+++ head/sys/netinet/tcp_output.c	Sat Aug 14 21:41:33 2010	(r211317)
@@ -153,7 +153,7 @@ tcp_output(struct tcpcb *tp)
 	int idle, sendalot;
 	int sack_rxmit, sack_bytes_rxmt;
 	struct sackhole *p;
-	int tso = 0;
+	int tso;
 	struct tcpopt to;
 #if 0
 	int maxburst = TCP_MAXBURST;
@@ -211,6 +211,7 @@ again:
 	    SEQ_LT(tp->snd_nxt, tp->snd_max))
 		tcp_sack_adjust(tp);
 	sendalot = 0;
+	tso = 0;
 	off = tp->snd_nxt - tp->snd_una;
 	sendwin = min(tp->snd_wnd, tp->snd_cwnd);
 	sendwin = min(sendwin, tp->snd_bwnd);
@@ -490,9 +491,9 @@ after_sack_rexmit:
 		} else {
 			len = tp->t_maxseg;
 			sendalot = 1;
-			tso = 0;
 		}
 	}
+
 	if (sack_rxmit) {
 		if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc))
 			flags &= ~TH_FIN;
@@ -1051,6 +1052,8 @@ send:
 	 * XXX: Fixme: This is currently not the case for IPv6.
 	 */
 	if (tso) {
+		KASSERT(len > tp->t_maxopd - optlen,
+		    ("%s: len <= tso_segsz", __func__));
 		m->m_pkthdr.csum_flags |= CSUM_TSO;
 		m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
 	}
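The stale-flag problem described in the log can be reproduced with a very small model of the send loop. This is a sketch under heavy simplification, not tcp_output(): `sk_plan`, `struct sk_pass` and the `reset_each_pass` switch are invented for illustration, and the only rule modelled is "offload is wanted when a pass carries more than one segment".

```c
#include <assert.h>
#include <stddef.h>

struct sk_pass {
	unsigned long len;	/* bytes handed down in this pass */
	int tso;		/* offload flag as seen by the driver */
};

/*
 * Split `total` bytes into passes of at most `maxwin` bytes each.
 * With reset_each_pass == 0 the flag is cleared only once, before the
 * loop, which models the pre-fix behaviour; with reset_each_pass != 0
 * it is cleared at the top of every pass, as the commit does.
 * Returns the number of passes written to `out`.
 */
static size_t
sk_plan(unsigned long total, unsigned long maxwin, unsigned long mss,
    int reset_each_pass, struct sk_pass *out, size_t cap)
{
	size_t n = 0;
	int tso = 0;

	assert(maxwin > 0 && mss > 0 && out != NULL);
	while (total > 0 && n < cap) {
		unsigned long len = total < maxwin ? total : maxwin;

		if (reset_each_pass)
			tso = 0;
		if (len > mss)
			tso = 1;	/* more than one segment: offload */
		out[n].len = len;
		out[n].tso = tso;
		n++;
		total -= len;
	}
	return (n);
}
```

Sending 66000 bytes with a 65535 byte cap and a 1460 byte segment size yields a second pass of 465 bytes. Without the per-pass reset that small pass still carries the offload flag from the first pass, which is the situation the new KASSERT in the diff is meant to catch.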
svn commit: r211327 - head/sys/netinet
Author: andre
Date: Sun Aug 15 09:30:13 2010
New Revision: 211327
URL: http://svn.freebsd.org/changeset/base/211327

Log:
  Add more logging points for failures in syncache_socket() to
  report when a new socket couldn't be created because one of
  in_pcbinshash(), in6_pcbconnect() or in_pcbconnect() failed.

  Logging is conditional on net.inet.tcp.log_debug being enabled.

  MFC after: 1 week

Modified:
  head/sys/netinet/tcp_syncache.c

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c	Sun Aug 15 08:49:07 2010	(r211326)
+++ head/sys/netinet/tcp_syncache.c	Sun Aug 15 09:30:13 2010	(r211327)
@@ -627,6 +627,7 @@ syncache_socket(struct syncache *sc, str
 	struct inpcb *inp = NULL;
 	struct socket *so;
 	struct tcpcb *tp;
+	int error = 0;
 	char *s;
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
@@ -675,7 +676,7 @@ syncache_socket(struct syncache *sc, str
 	}
 #endif
 	inp->inp_lport = sc->sc_inc.inc_lport;
-	if (in_pcbinshash(inp) != 0) {
+	if ((error = in_pcbinshash(inp)) != 0) {
 		/*
 		 * Undo the assignments above if we failed to
 		 * put the PCB on the hash lists.
@@ -687,6 +688,12 @@ syncache_socket(struct syncache *sc, str
 #endif
 			inp->inp_laddr.s_addr = INADDR_ANY;
 		inp->inp_lport = 0;
+		if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+			log(LOG_DEBUG, "%s; %s: in_pcbinshash failed "
+			    "with error %i\n",
+			    s, __func__, error);
+			free(s, M_TCPLOG);
+		}
 		goto abort;
 	}
 #ifdef IPSEC
@@ -721,9 +728,15 @@ syncache_socket(struct syncache *sc, str
 		laddr6 = inp->in6p_laddr;
 		if (IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr))
 			inp->in6p_laddr = sc->sc_inc.inc6_laddr;
-		if (in6_pcbconnect(inp, (struct sockaddr *)&sin6,
-		    thread0.td_ucred)) {
+		if ((error = in6_pcbconnect(inp, (struct sockaddr *)&sin6,
+		    thread0.td_ucred)) != 0) {
 			inp->in6p_laddr = laddr6;
+			if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+				log(LOG_DEBUG, "%s; %s: in6_pcbconnect failed "
+				    "with error %i\n",
+				    s, __func__, error);
+				free(s, M_TCPLOG);
+			}
 			goto abort;
 		}
 		/* Override flowlabel from in6_pcbconnect. */
@@ -750,9 +763,15 @@ syncache_socket(struct syncache *sc, str
 		laddr = inp->inp_laddr;
 		if (inp->inp_laddr.s_addr == INADDR_ANY)
 			inp->inp_laddr = sc->sc_inc.inc_laddr;
-		if (in_pcbconnect(inp, (struct sockaddr *)&sin,
-		    thread0.td_ucred)) {
+		if ((error = in_pcbconnect(inp, (struct sockaddr *)&sin,
+		    thread0.td_ucred)) != 0) {
 			inp->inp_laddr = laddr;
+			if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+				log(LOG_DEBUG, "%s; %s: in_pcbconnect failed "
+				    "with error %i\n",
+				    s, __func__, error);
+				free(s, M_TCPLOG);
+			}
 			goto abort;
 		}
 	}
svn commit: r211332 - head/sys/netinet
Author: andre
Date: Sun Aug 15 13:07:08 2010
New Revision: 211332
URL: http://svn.freebsd.org/changeset/base/211332

Log:
  Initializing the new error variable to zero in syncache_socket()
  is not necessary.

  Noticed by: bz

Modified:
  head/sys/netinet/tcp_syncache.c

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c	Sun Aug 15 11:44:08 2010	(r211331)
+++ head/sys/netinet/tcp_syncache.c	Sun Aug 15 13:07:08 2010	(r211332)
@@ -627,7 +627,7 @@ syncache_socket(struct syncache *sc, str
 	struct inpcb *inp = NULL;
 	struct socket *so;
 	struct tcpcb *tp;
-	int error = 0;
+	int error;
 	char *s;
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
svn commit: r211333 - head/sys/netinet
Author: andre
Date: Sun Aug 15 13:25:18 2010
New Revision: 211333
URL: http://svn.freebsd.org/changeset/base/211333

Log:
  Fix the interaction between 'ICMP fragmentation needed' MTU updates,
  path MTU discovery and the tcp_minmss limiter for very small MTU's.

  When the MTU suggested by the gateway via ICMP, or if there isn't
  any the next smaller step from ip_next_mtu(), is lower than the
  floor enforced by net.inet.tcp.minmss (default 216) the value is
  ignored and the default MSS (512) is used instead. However the
  DF flag in the IP header is still set in tcp_output() preventing
  fragmentation by the gateway.

  Fix this by using tcp_minmss as the MSS and clear the DF flag if
  the suggested MTU is too low. This turns off path MTU discovery
  for the remainder of the session and allows fragmentation to be
  done by the gateway.

  Only MTU's smaller than 256 are affected. The smallest official
  MTU specified is for AX.25 packet radio at 256 octets.

  PR: kern/146628
  Tested by: Matthew Luckie
  MFC after: 1 week

Modified:
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_subr.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c	Sun Aug 15 13:07:08 2010	(r211332)
+++ head/sys/netinet/tcp_output.c	Sun Aug 15 13:25:18 2010	(r211333)
@@ -1186,8 +1186,10 @@ timer:
 	 * This might not be the best thing to do according to RFC3390
 	 * Section 2. However the tcp hostcache migitates the problem
 	 * so it affects only the first tcp connection with a host.
+	 *
+	 * NB: Don't set DF on small MTU/MSS to have a safe fallback.
 	 */
-	if (V_path_mtu_discovery)
+	if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
 		ip->ip_off |= IP_DF;
 	error = ip_output(m, tp->t_inpcb->inp_options, NULL,

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c	Sun Aug 15 13:07:08 2010	(r211332)
+++ head/sys/netinet/tcp_subr.c	Sun Aug 15 13:25:18 2010	(r211333)
@@ -1339,11 +1339,9 @@ tcp_ctlinput(int cmd, struct sockaddr *s
 			if (!mtu)
 				mtu = ip_next_mtu(ip->ip_len, 1);
-			if (mtu < max(296, V_tcp_minmss
-			    + sizeof(struct tcpiphdr)))
-				mtu = 0;
-			if (!mtu)
-				mtu = V_tcp_mssdflt
+			if (mtu < V_tcp_minmss
+			    + sizeof(struct tcpiphdr))
+				mtu = V_tcp_minmss
 				    + sizeof(struct tcpiphdr);
 			/*
 			 * Only cache the the MTU if it
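The two halves of this change (clamp the MTU to a floor, then stop setting DF once the floor is reached) can be sketched together as below. The helper names are invented, and the 40 byte combined TCP/IP header size is an assumption that matches the 216 + 40 = 256 figure in the log; the real code uses sizeof(struct tcpiphdr).

```c
#include <assert.h>

/* Assumed combined TCP/IP header size in bytes (illustrative). */
enum { SK_TCPIP_HDR = 40 };

/* Raise a too-small suggested MTU to the floor implied by minmss. */
static int
sk_clamp_mtu(int suggested_mtu, int minmss)
{
	int floor = minmss + SK_TCPIP_HDR;

	assert(minmss > 0);
	return (suggested_mtu < floor ? floor : suggested_mtu);
}

/* DF is only worth setting while the segment size is above the floor. */
static int
sk_want_df(int pmtud_enabled, int maxopd, int minmss)
{
	return (pmtud_enabled && maxopd > minmss);
}
```

With the default minmss of 216, a suggested MTU of 68 is raised to 256, the resulting segment size equals minmss, and `sk_want_df()` returns 0 so a gateway is free to fragment. An ordinary 1500 byte path is untouched and keeps DF set.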
Re: svn commit: r211327 - head/sys/netinet
On 15.08.2010 11:41, Bjoern A. Zeeb wrote:
> On Sun, 15 Aug 2010, Andre Oppermann wrote:
>> Author: andre
>> Date: Sun Aug 15 09:30:13 2010
>> New Revision: 211327
>> URL: http://svn.freebsd.org/changeset/base/211327
>>
>> Log:
>>   Add more logging points for failures in syncache_socket() to
>>   report when a new socket couldn't be created because one of
>>   in_pcbinshash(), in6_pcbconnect() or in_pcbconnect() failed.
>>
>>   Logging is conditional on net.inet.tcp.log_debug being enabled.
>>
>>   MFC after: 1 week
>>
>> Modified:
>>   head/sys/netinet/tcp_syncache.c
>>
>> Modified: head/sys/netinet/tcp_syncache.c
>> ==
>> --- head/sys/netinet/tcp_syncache.c Sun Aug 15 08:49:07 2010 (r211326)
>> +++ head/sys/netinet/tcp_syncache.c Sun Aug 15 09:30:13 2010 (r211327)
>> @@ -627,6 +627,7 @@ syncache_socket(struct syncache *sc, str
>>  	struct inpcb *inp = NULL;
>>  	struct socket *so;
>>  	struct tcpcb *tp;
>> +	int error = 0;
>
> Is there any need to initialize here?

No. Actually not. Was just my style of using safe initial values.
But here the return value is the socket pointer of NULL. The error
is not passed back directly.

Fixed in r211332. Thanks for noticing and reporting.

--
Andre
svn commit: r211396 - head/sys/vm
Author: andre
Date: Mon Aug 16 14:24:00 2010
New Revision: 211396
URL: http://svn.freebsd.org/changeset/base/211396

Log:
  Add uma_zone_get_max() to obtain the effective limit after a call
  to uma_zone_set_max().

  The UMA zone limit is not exactly set to the value supplied but
  rounded up to completely fill the backing store increment (a page
  normally). This can lead to surprising situations where the number
  of elements allocated from UMA is higher than the supplied limit
  value. The new get function reads back the effective value so that
  the supplied limit value can be adjusted to the real limit.

  Reviewed by: jeffr
  MFC after: 1 week

Modified:
  head/sys/vm/uma.h
  head/sys/vm/uma_core.c

Modified: head/sys/vm/uma.h
==
--- head/sys/vm/uma.h	Mon Aug 16 12:37:17 2010	(r211395)
+++ head/sys/vm/uma.h	Mon Aug 16 14:24:00 2010	(r211396)
@@ -459,6 +459,18 @@ int uma_zone_set_obj(uma_zone_t zone, st
 void uma_zone_set_max(uma_zone_t zone, int nitems);
 /*
+ * Obtains the effective limit on the number of items in a zone
+ *
+ * Arguments:
+ *	zone  The zone to obtain the effective limit from
+ *
+ * Return:
+ *	0  No limit
+ *	int  The effective limit of the zone
+ */
+int uma_zone_get_max(uma_zone_t zone);
+
+/*
  * The following two routines (uma_zone_set_init/fini)
  * are used to set the backend init/fini pair which acts on an
  * object as it becomes allocated and is placed in a slab within

Modified: head/sys/vm/uma_core.c
==
--- head/sys/vm/uma_core.c	Mon Aug 16 12:37:17 2010	(r211395)
+++ head/sys/vm/uma_core.c	Mon Aug 16 14:24:00 2010	(r211396)
@@ -2797,6 +2797,24 @@ uma_zone_set_max(uma_zone_t zone, int ni
 }
 /* See uma.h */
+int
+uma_zone_get_max(uma_zone_t zone)
+{
+	int nitems;
+	uma_keg_t keg;
+
+	ZONE_LOCK(zone);
+	keg = zone_first_keg(zone);
+	if (keg->uk_maxpages)
+		nitems = keg->uk_maxpages * keg->uk_ipers;
+	else
+		nitems = 0;
+	ZONE_UNLOCK(zone);
+
+	return (nitems);
+}
+
+/* See uma.h */
 void
 uma_zone_set_init(uma_zone_t zone, uma_init uminit)
 {
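The rounding effect the log describes is plain arithmetic and can be sketched without any UMA internals. The function below is an illustration only: `sk_effective_limit` and its `items_per_page` parameter are assumed names, and it models the case where the backing store grows one page at a time.

```c
#include <assert.h>

/*
 * Round a requested item limit up to a whole number of pages and
 * return the resulting effective item limit.
 */
static unsigned int
sk_effective_limit(unsigned int nitems, unsigned int items_per_page)
{
	unsigned int pages;

	assert(items_per_page > 0);
	pages = nitems / items_per_page;
	if (pages * items_per_page < nitems)
		pages++;	/* a partly used page still counts in full */
	return (pages * items_per_page);
}
```

Asking for 100 items when 30 fit on a page gives an effective limit of 120, so a caller that sizes other structures from the requested value would be off by 20. Reading the limit back, as uma_zone_get_max() allows, avoids that mismatch.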
svn commit: r211462 - head/sys/netinet
Author: andre
Date: Wed Aug 18 17:39:47 2010
New Revision: 211462
URL: http://svn.freebsd.org/changeset/base/211462

Log:
  Untangle the net.inet.tcp.log_in_vain and net.inet.tcp.log_debug
  sysctl's and remove any side effects.

  Both sysctl's share the same backend infrastructure and due to the
  way it was implemented enabling net.inet.tcp.log_in_vain would also
  cause log_debug output to be generated. This was surprising and
  eventually annoying to the user.

  The log output backend is kept the same but a little shim is inserted
  to properly separate log_in_vain and log_debug and to remove any side
  effects.

  PR: kern/137317
  MFC after: 1 week

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_subr.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c	Wed Aug 18 15:58:26 2010	(r211461)
+++ head/sys/netinet/tcp_input.c	Wed Aug 18 17:39:47 2010	(r211462)
@@ -571,7 +571,7 @@ findpcb:
 		 */
 		if ((tcp_log_in_vain == 1 && (thflags & TH_SYN)) ||
 		    tcp_log_in_vain == 2) {
-			if ((s = tcp_log_addrs(NULL, th, (void *)ip, ip6)))
+			if ((s = tcp_log_vain(NULL, th, (void *)ip, ip6)))
 				log(LOG_INFO, "%s; %s: Connection attempt "
 				    "to closed port\n", s, __func__);
 		}

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c	Wed Aug 18 15:58:26 2010	(r211461)
+++ head/sys/netinet/tcp_subr.c	Wed Aug 18 17:39:47 2010	(r211462)
@@ -268,6 +268,8 @@ VNET_DEFINE(uma_zone_t, sack_hole_zone);
 static struct inpcb *tcp_notify(struct inpcb *, int);
 static void	tcp_isn_tick(void *);
+static char *	tcp_log_addr(struct in_conninfo *inc, struct tcphdr *th,
+		    void *ip4hdr, const void *ip6hdr);
 /*
  * Target size of TCP PCB hash tables. Must be a power of two.
@@ -2234,9 +2236,33 @@ SYSCTL_PROC(_net_inet_tcp, TCPCTL_DROP,
  * and ip6_hdr pointers have to be passed as void pointers.
  */
 char *
+tcp_log_vain(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
+    const void *ip6hdr)
+{
+
+	/* Is logging enabled? */
+	if (tcp_log_in_vain == 0)
+		return (NULL);
+
+	return (tcp_log_addr(inc, th, ip4hdr, ip6hdr));
+}
+
+char *
 tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
     const void *ip6hdr)
 {
+
+	/* Is logging enabled? */
+	if (tcp_log_debug == 0)
+		return (NULL);
+
+	return (tcp_log_addr(inc, th, ip4hdr, ip6hdr));
+}
+
+static char *
+tcp_log_addr(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
+    const void *ip6hdr)
+{
 	char *s, *sp;
 	size_t size;
 	struct ip *ip;
@@ -2259,10 +2285,6 @@ tcp_log_addrs(struct in_conninfo *inc, s
 	    2 * INET_ADDRSTRLEN;
 #endif /* INET6 */
-	/* Is logging enabled? */
-	if (tcp_log_debug == 0 && tcp_log_in_vain == 0)
-		return (NULL);
-
 	s = malloc(size, M_TCPLOG, M_ZERO|M_NOWAIT);
 	if (s == NULL)
 		return (NULL);

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h	Wed Aug 18 15:58:26 2010	(r211461)
+++ head/sys/netinet/tcp_var.h	Wed Aug 18 17:39:47 2010	(r211462)
@@ -611,6 +611,8 @@ void	tcp_destroy(void);
 void	tcp_fini(void *);
 char	*tcp_log_addrs(struct in_conninfo *, struct tcphdr *, void *,
 	    const void *);
+char	*tcp_log_vain(struct in_conninfo *, struct tcphdr *, void *,
+	    const void *);
 int	tcp_reass(struct tcpcb *, struct tcphdr *, int *, struct mbuf *);
 void	tcp_reass_init(void);
 #ifdef VIMAGE
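The shape of the shim (one ungated formatter, two thin front ends that each check only their own switch) is sketched below in user space. All `sk_` names are hypothetical, the two globals merely stand in for the sysctl values, and the formatted string is a placeholder rather than the kernel's real output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

static int sk_log_in_vain;	/* stands in for net.inet.tcp.log_in_vain */
static int sk_log_debug;	/* stands in for net.inet.tcp.log_debug */

/* Shared backend: formats unconditionally, no enable check in here. */
static const char *
sk_format(char *buf, size_t len, const char *peer, int port)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "TCP: [%s]:%d", peer, port);
	return (buf);
}

/* Front end for "connection attempt to closed port" messages. */
static const char *
sk_log_vain(char *buf, size_t len, const char *peer, int port)
{
	if (sk_log_in_vain == 0)
		return (NULL);
	return (sk_format(buf, len, peer, port));
}

/* Front end for debug messages; ignores the in_vain switch entirely. */
static const char *
sk_log_addrs(char *buf, size_t len, const char *peer, int port)
{
	if (sk_log_debug == 0)
		return (NULL);
	return (sk_format(buf, len, peer, port));
}
```

With only `sk_log_in_vain` set, `sk_log_vain()` returns a string while `sk_log_addrs()` returns NULL. Under the old combined check ("both switches zero") the debug path would have produced output as well, which is the side effect the commit removes.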
svn commit: r211464 - head/sys/netinet
Author: andre
Date: Wed Aug 18 18:05:54 2010
New Revision: 211464
URL: http://svn.freebsd.org/changeset/base/211464

Log:
  If a TCP connection has been idle for one retransmit timeout or more
  it must reset its congestion window back to the initial window.

  RFC3390 has increased the initial window from 1 segment to up to
  4 segments.

  The initial window increase of RFC3390 wasn't reflected into the
  restart window which remained at its original defaults of 4 segments
  for local and 1 segment for all other connections. Both values are
  controllable through sysctl net.inet.tcp.local_slowstart_flightsize
  and net.inet.tcp.slowstart_flightsize.

  The increase helps TCP's slow start algorithm to open up the congestion
  window much faster.

  Reviewed by: lstewart
  MFC after: 1 week

Modified:
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c	Wed Aug 18 17:40:10 2010	(r211463)
+++ head/sys/netinet/tcp_output.c	Wed Aug 18 18:05:54 2010	(r211464)
@@ -140,7 +140,7 @@ tcp_output(struct tcpcb *tp)
 {
 	struct socket *so = tp->t_inpcb->inp_socket;
 	long len, recwin, sendwin;
-	int off, flags, error;
+	int off, flags, error, rw;
 	struct mbuf *m;
 	struct ip *ip = NULL;
 	struct ipovly *ipov = NULL;
@@ -176,23 +176,34 @@ tcp_output(struct tcpcb *tp)
 	idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
 	if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur) {
 		/*
-		 * We have been idle for "a while" and no acks are
-		 * expected to clock out any data we send --
-		 * slow start to get ack "clock" running again.
+		 * If we've been idle for more than one retransmit
+		 * timeout the old congestion window is no longer
+		 * current and we have to reduce it to the restart
+		 * window before we can transmit again.
 		 *
-		 * Set the slow-start flight size depending on whether
-		 * this is a local network or not.
+		 * The restart window is the initial window or the last
+		 * CWND, whichever is smaller.
+		 *
+		 * This is done to prevent us from flooding the path with
+		 * a full CWND at wirespeed, overloading router and switch
+		 * buffers along the way.
+		 *
+		 * See RFC5681 Section 4.1. "Restarting Idle Connections".
 		 */
-		int ss = V_ss_fltsz;
+		if (V_tcp_do_rfc3390)
+			rw = min(4 * tp->t_maxseg,
+			    max(2 * tp->t_maxseg, 4380));
 #ifdef INET6
-		if (isipv6) {
-			if (in6_localaddr(&tp->t_inpcb->in6p_faddr))
-				ss = V_ss_fltsz_local;
-		} else
-#endif /* INET6 */
-		if (in_localaddr(tp->t_inpcb->inp_faddr))
-			ss = V_ss_fltsz_local;
-		tp->snd_cwnd = tp->t_maxseg * ss;
+		else if ((isipv6 ? in6_localaddr(&tp->t_inpcb->in6p_faddr) :
+		    in_localaddr(tp->t_inpcb->inp_faddr)))
+#else
+		else if (in_localaddr(tp->t_inpcb->inp_faddr))
+#endif
+			rw = V_ss_fltsz_local * tp->t_maxseg;
+		else
+			rw = V_ss_fltsz * tp->t_maxseg;
+
+		tp->snd_cwnd = min(rw, tp->snd_cwnd);
 	}
 	tp->t_flags &= ~TF_LASTIDLE;
 	if (idle) {

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h	Wed Aug 18 17:40:10 2010	(r211463)
+++ head/sys/netinet/tcp_var.h	Wed Aug 18 18:05:54 2010	(r211464)
@@ -565,6 +565,7 @@ extern int tcp_log_in_vain;
 VNET_DECLARE(int, tcp_mssdflt);	/* XXX */
 VNET_DECLARE(int, tcp_minmss);
 VNET_DECLARE(int, tcp_delack_enabled);
+VNET_DECLARE(int, tcp_do_rfc3390);
 VNET_DECLARE(int, tcp_do_newreno);
 VNET_DECLARE(int, path_mtu_discovery);
 VNET_DECLARE(int, ss_fltsz);
@@ -575,6 +576,7 @@ VNET_DECLARE(int, ss_fltsz_local);
 #define	V_tcp_mssdflt		VNET(tcp_mssdflt)
 #define	V_tcp_minmss		VNET(tcp_minmss)
 #define	V_tcp_delack_enabled	VNET(tcp_delack_enabled)
+#define	V_tcp_do_rfc3390	VNET(tcp_do_rfc3390)
 #define	V_tcp_do_newreno	VNET(tcp_do_newreno)
 #define	V_path_mtu_discovery	VNET(path_mtu_discovery)
 #define	V_ss_fltsz		VNET(ss_fltsz)
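The restart window arithmetic from the diff is easy to check by hand with a standalone sketch. It covers only the RFC3390 branch (the `min(4 * MSS, max(2 * MSS, 4380))` expression taken from the diff) and the final `min(rw, cwnd)`; the `sk_` function names are invented and the sysctl-controlled fallback branches are left out.

```c
#include <assert.h>

static unsigned long
sk_min(unsigned long a, unsigned long b)
{
	return (a < b ? a : b);
}

static unsigned long
sk_max(unsigned long a, unsigned long b)
{
	return (a > b ? a : b);
}

/* Initial window in bytes: min(4 * MSS, max(2 * MSS, 4380)). */
static unsigned long
sk_initial_window(unsigned long mss)
{
	assert(mss > 0);
	return (sk_min(4 * mss, sk_max(2 * mss, 4380UL)));
}

/* After an idle period the window never grows, it can only shrink. */
static unsigned long
sk_restart_cwnd(unsigned long cwnd, unsigned long mss)
{
	return (sk_min(sk_initial_window(mss), cwnd));
}
```

For a 1460 byte MSS the initial window works out to 4380 bytes (three segments), so an idle connection that had grown to 65535 bytes restarts at 4380, while one that was still at 2920 bytes stays at 2920.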
Re: svn commit: r211503 - head/sys/mips/atheros
On 19.08.2010 13:53, Adrian Chadd wrote:
> Author: adrian
> Date: Thu Aug 19 11:53:55 2010
> New Revision: 211503
> URL: http://svn.freebsd.org/changeset/base/211503
>
> Log:
>   Add some initial AR724X chipset support.
>
>   This is untested but should at least allow an AR724X to boot.

Isn't this something that should be done on a project branch and
merged back when in a good working state?

>   The current code is lacking the detail needed to expose the PCIe bus.
>   It is also lacking any NIC, PLL or flush/WB code.

--
Andre
Re: svn commit: r211503 - head/sys/mips/atheros
On 19.08.2010 19:20, M. Warner Losh wrote:
> In message: <4c6d2933.9020...@freebsd.org>
>             Andre Oppermann writes:
> : On 19.08.2010 13:53, Adrian Chadd wrote:
> :> Author: adrian
> :> Date: Thu Aug 19 11:53:55 2010
> :> New Revision: 211503
> :> URL: http://svn.freebsd.org/changeset/base/211503
> :>
> :> Log:
> :> Add some initial AR724X chipset support.
> :>
> :> This is untested but should at least allow an AR724X to boot.
> :
> : Isn't this something that should be done on a project branch and
> : merged back when in a good working state?
>
> We don't have a branch for mips stuff these days. This stuff is OK,
> since the AR724X is just being rolled out right now... For non AR724x
> systems, this won't affect anything...

I was more concerned about tree breakage for non-tested code. When
developing something bleeding edge it is often useful to just commit
some stuff and have it sorted out later. In head this is more
dangerous. A small AR724X development branch would be ideal for
this. Branching is cheap with SVN these days.

--
Andre
Re: svn commit: r211503 - head/sys/mips/atheros
On 19.08.2010 20:42, M. Warner Losh wrote: In message:<4c6d6fd7.7060...@freebsd.org> Andre Oppermann writes: : On 19.08.2010 19:20, M. Warner Losh wrote: :> In message:<4c6d2933.9020...@freebsd.org> :> Andre Oppermann writes: :> : On 19.08.2010 13:53, Adrian Chadd wrote: :> :> Author: adrian :> :> Date: Thu Aug 19 11:53:55 2010 :> :> New Revision: 211503 :> :> URL: http://svn.freebsd.org/changeset/base/211503 :> :> :> :> Log: :> :> Add some initial AR724X chipset support. :> :> :> :> This is untested but should at least allow an AR724X to boot. :> : :> : Isn't this something that should be done on a project branch and :> : merged back when in a good working state? :> :> We don't have a branch for mips stuff these days. This stuff is OK, :> since the AR724X is just being rolled out right now... For non AR724x :> systems, this won't affect anything... : : I was more concerned about tree breakage for non-tested code. When : developing something bleeding edge it is often useful to just commit : some stuff and have it sorted out later. In head this is more : dangerous. A small AR724X development branch would be ideal for : this. Branching is cheap with SVN these days. Merging isn't that cheap with svn. The svn:mergeinfo properties make them a pita. Given that this code won't break anything, except possibly the now-unsupported AR724x, I think a branch would be overkill. We'd have to drag that branch along all the time until we can get actual hardware to test it on, which is a high overhead. Didn't know that branching and merging isn't that easy with SVN after all. This was one of the supposed benefits for switching from CVS. If there is no risk of head breakage I don't mind at all. -- Andre
svn commit: r211874 - head/sys/netinet
Author: andre Date: Fri Aug 27 12:34:53 2010 New Revision: 211874 URL: http://svn.freebsd.org/changeset/base/211874 Log: Use timestamp modulo comparison macro for automatic receive buffer scaling to correctly handle wrapping of ticks value. MFC after: 1 week Modified: head/sys/netinet/tcp_input.c Modified: head/sys/netinet/tcp_input.c == --- head/sys/netinet/tcp_input.c Fri Aug 27 11:08:11 2010 (r211873) +++ head/sys/netinet/tcp_input.c Fri Aug 27 12:34:53 2010 (r211874) @@ -1441,7 +1441,7 @@ tcp_do_segment(struct mbuf *m, struct tc if (V_tcp_do_autorcvbuf && to.to_tsecr && (so->so_rcv.sb_flags & SB_AUTOSIZE)) { - if (to.to_tsecr > tp->rfbuf_ts && + if (TSTMP_GT(to.to_tsecr, tp->rfbuf_ts) && to.to_tsecr - tp->rfbuf_ts < hz) { if (tp->rfbuf_cnt > (so->so_rcv.sb_hiwat / 8 * 7) &&
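The reason a plain `>` fails once the tick counter wraps can be shown with a small stand-alone sketch. The function `tstmp_gt()` below is a hypothetical stand-in written for illustration; the definition of the kernel's `TSTMP_GT` macro is not shown in this commit, so the sketch only assumes the usual modulo-2^32 ordering for 32-bit timestamps.

```c
#include <stdint.h>

/*
 * Modulo-2^32 "greater than": a is considered newer than b when the
 * unsigned difference a - b lies in the lower half of the number
 * space.  Hypothetical helper, written without signed casts so the
 * result does not depend on implementation-defined conversions.
 */
static int
tstmp_gt(uint32_t a, uint32_t b)
{
	uint32_t diff = a - b;	/* wraps modulo 2^32 by definition */

	return (diff != 0 && diff < 0x80000000u);
}
```

For a timestamp of 5 taken shortly after the counter wrapped and a reference of 0xFFFFFFF0 taken shortly before, `5 > 0xFFFFFFF0` is false while `tstmp_gt(5, 0xFFFFFFF0)` is true, which is the case the commit is concerned with.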
svn commit: r212653 - head/sys/netinet
Author: andre Date: Wed Sep 15 10:39:30 2010 New Revision: 212653 URL: http://svn.freebsd.org/changeset/base/212653 Log: Change the default MSS for IPv4 and IPv6 TCP connections from an artificial power-of-2 rounded number to their real values specified in RFC879 and RFC2460. From the history and existing comments it appears that the rounded numbers were intended to be advantageous for the kernel and mbuf system. However this hasn't been the case at for at least a long time. The mbuf clusters used in tcp_output() have enough space to hold the larger real value for the default MSS for both IPv4 and IPv6. Note that the default MSS is only used when path MTU discovery is disabled. Update and expand related comments. Reviewed by: lsteward (including some word-smithing) MFC after:2 weeks Modified: head/sys/netinet/tcp.h Modified: head/sys/netinet/tcp.h == --- head/sys/netinet/tcp.h Wed Sep 15 10:39:21 2010(r212652) +++ head/sys/netinet/tcp.h Wed Sep 15 10:39:30 2010(r212653) @@ -103,29 +103,37 @@ struct tcphdr { /* - * Default maximum segment size for TCP. - * With an IP MTU of 576, this is 536, - * but 512 is probably more convenient. - * This should be defined as MIN(512, IP_MSS - sizeof (struct tcpiphdr)). - */ -#defineTCP_MSS 512 -/* - * TCP_MINMSS is defined to be 216 which is fine for the smallest - * link MTU (256 bytes, AX.25 packet radio) in the Internet. - * However it is very unlikely to come across such low MTU interfaces - * these days (anno dato 2003). - * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments. - * Setting this to "0" disables the minmss check. + * The default maximum segment size (MSS) to be used for new TCP connections + * when path MTU discovery is not enabled. + * + * RFC879 derives the default MSS from the largest datagram size hosts are + * minimally required to handle directly or through IP reassembly minus the + * size of the IP and TCP header. With IPv6 the minimum MTU is specified + * in RFC2460. 
+ * + * For IPv4 the MSS is 576 - sizeof(struct tcpiphdr) + * For IPv6 the MSS is IPV6_MMTU - sizeof(struct ip6_hdr) - sizeof(struct tcphdr) + * + * We use explicit numerical definition here to avoid header pollution. */ -#defineTCP_MINMSS 216 +#defineTCP_MSS 536 +#defineTCP6_MSS1220 /* - * Default maximum segment size for TCP6. - * With an IP6 MSS of 1280, this is 1220, - * but 1024 is probably more convenient. (xxx kazu in doubt) - * This should be defined as MIN(1024, IP6_MSS - sizeof (struct tcpip6hdr)) + * Limit the lowest MSS we accept from path MTU discovery and the TCP SYN MSS + * option. Allowing too low values of MSS can consume significant amounts of + * resources and be used as a form of a resource exhaustion attack. + * Connections requesting lower MSS values will be rounded up to this value + * and the IP_DF flag is cleared to allow fragmentation along the path. + * + * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments. Setting + * it to "0" disables the minmss check. + * + * The default value is fine for the smallest official link MTU (256 bytes, + * AX.25 packet radio) in the Internet. However it is very unlikely to come + * across such low MTU interfaces these days (anno domini 2003). */ -#defineTCP6_MSS1024 +#defineTCP_MINMSS 216 #defineTCP_MAXWIN 65535 /* largest value for (unscaled) window */ #defineTTCP_CLIENT_SND_WND 4096/* dflt send window for T/TCP client */ ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
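The two default values in the diff above follow from simple subtraction, which can be checked with a short sketch. All constant and function names below are hypothetical; the header sizes are the optionless IPv4 (20 bytes), IPv6 (40 bytes) and TCP (20 bytes) header lengths, and the datagram sizes are the 576 and 1280 bytes named in the commit.

```c
/* Hypothetical constants for illustration; not the kernel's own names. */
enum {
	SKETCH_IPV4_MIN_DATAGRAM = 576,	/* minimum reassembly size, IPv4 */
	SKETCH_IPV6_MIN_MTU = 1280,	/* minimum link MTU, IPv6 */
	SKETCH_IPV4_HDR = 20,		/* IPv4 header without options */
	SKETCH_IPV6_HDR = 40,		/* fixed IPv6 header */
	SKETCH_TCP_HDR = 20		/* TCP header without options */
};

/* Default MSS for IPv4: 576 - 20 - 20. */
static int
default_mss_v4(void)
{
	return (SKETCH_IPV4_MIN_DATAGRAM - SKETCH_IPV4_HDR - SKETCH_TCP_HDR);
}

/* Default MSS for IPv6: 1280 - 40 - 20. */
static int
default_mss_v6(void)
{
	return (SKETCH_IPV6_MIN_MTU - SKETCH_IPV6_HDR - SKETCH_TCP_HDR);
}
```

The results match the new `TCP_MSS` and `TCP6_MSS` definitions, 536 and 1220, rather than the rounded 512 and 1024 they replace.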
Re: svn commit: r212653 - head/sys/netinet
On 15.09.2010 13:51, Lawrence Stewart wrote: On 09/15/10 20:39, Andre Oppermann wrote: Author: andre Date: Wed Sep 15 10:39:30 2010 New Revision: 212653 URL: http://svn.freebsd.org/changeset/base/212653 Log: Change the default MSS for IPv4 and IPv6 TCP connections from an artificial power-of-2 rounded number to their real values specified in RFC879 and RFC2460. From the history and existing comments it appears that the rounded numbers were intended to be advantageous for the kernel and mbuf system. However this hasn't been the case at for at least a long time. The mbuf clusters used in tcp_output() have enough space to hold the larger real value for the default MSS for both IPv4 and IPv6. Note that the default MSS is only used when path MTU discovery is disabled. Update and expand related comments. Reviewed by: lsteward (including some word-smithing) For the record, I reviewed and fully support the functional changes made by this patch, but explicitly objected to and offered an alternate for the proposed comment wording changes. Andre, given that we had a disagreement about the comment wording, I would have preferred it if you had noted in your commit log that I had raised an objection to or at least not reviewed/endorsed the comment changes. I've adapted many of your suggestions on the wording compared to my first version. For some parts I felt that my wording/description was more appropriate. In the end neither of our wordings is plain wrong or factually incorrect. It's not important enough an issue to spend any more time on, but I'm a bit upset to see this committed with an acknowledgement to my review and word-smithing, much of which ended up being ignored (which is fine, but then don't put my name to it). I apologize for not having made your different opinion to the wording clear enough in the commit message. 
My intent was to communicate that you not only reviewed the functional change but also provided input on the wording (which I in fact did incorporate to some extent but not entirely). Below is the wording proposed by Lawrence: /* * The default Maximum Segment Size (MSS) to use when we do not have specific * knowledge (e.g. via path MTU discovery) that the destination host is prepared * to accept larger datagrams. The smallest allowable IP datagram MTU and * optionless IP/TCP header lengths are used for the calculation as per RFC879. * For IPv4 (RFC791): 576 - 20 - 20 = 536. * For IPv6 (RFC2460): 1280 - 40 - 20 = 1220. */ #define TCP_MSS 536 #define TCP6_MSS 1220 /* * Limit the lowest MSS we accept for path MTU discovery and the TCP SYN MSS * option. Allowing low values of MSS can consume significant resources and be * used to mount a resource exhaustion attack. Connections requesting lower MSS * values will be rounded up to this value and the IP_DF flag will be cleared to * allow fragmentation along the path. * * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments. Setting this * SYSCTL to "0" disables the minmss check. * * The default value is fine for TCP over IPv4 across the Internet's smallest * known link MTU (256 bytes for AX.25 packet radio). However, a connection is * very unlikely to come across such low MTU interfaces (anno domini 2003). */ #define TCP_MINMSS 216 -- Andre
Re: svn commit: r212653 - head/sys/netinet
On 15.09.2010 18:12, John Baldwin wrote: On Wednesday, September 15, 2010 10:04:45 am Andre Oppermann wrote: Below is the wording proposed by Lawrence: /* * The default Maximum Segment Size (MSS) to use when we do not have specific * knowledge (e.g. via path MTU discovery) that the destination host is prepared * to accept larger datagrams. The smallest allowable IP datagram MTU and * optionless IP/TCP header lengths are used for the calculation as per RFC879. * For IPv4 (RFC791): 576 - 20 - 20 = 536. * For IPv6 (RFC2460): 1280 - 40 - 20 = 1220. */ #define TCP_MSS 536 #define TCP6_MSS1220 I think the existing text is fine for this comment, with one nit: * For IPv4 the MSS is 576 - sizeof(struct tcpiphdr) I would find it clearer if it was 'sizeof(struct ip) - sizeof(struct tcphdr)' instead. I chose 'sizeof(struct tcpiphdr)' for consistency with other parts of the TCP code where the MSS is calculated this way. 'struct tcpiphdr' predates IPv6 and is commonly used in the BSD kernel code. * Limit the lowest MSS we accept for path MTU discovery and the TCP SYN MSS * option. Allowing low values of MSS can consume significant resources and be * used to mount a resource exhaustion attack. Connections requesting lower MSS * values will be rounded up to this value and the IP_DF flag will be cleared to * allow fragmentation along the path. * * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments. Setting this * SYSCTL to "0" disables the minmss check. * * The default value is fine for TCP over IPv4 across the Internet's smallest * known link MTU (256 bytes for AX.25 packet radio). However, a connection is * very unlikely to come across such low MTU interfaces (anno domini 2003). */ #define TCP_MINMSS 216 I actually prefer the above text for this block. The 'amounts of resources' phrase is certainly redundant and just 'resources' is clearer. OK. I'll update the comment with a small change to the third paragraph. 
-- Andre
svn commit: r212731 - head/sys/netinet
Author: andre Date: Thu Sep 16 12:13:06 2010 New Revision: 212731 URL: http://svn.freebsd.org/changeset/base/212731 Log: Improve comment to TCP_MINMSS by taking the wording from lstewart (with a small difference in the last paragraph though) as suggested by jhb. Clarify that the 'reviewed by' in r212653 by lstewart was for the functional change, not the comments in the committed version. Modified: head/sys/netinet/tcp.h Modified: head/sys/netinet/tcp.h == --- head/sys/netinet/tcp.h Thu Sep 16 12:05:46 2010(r212730) +++ head/sys/netinet/tcp.h Thu Sep 16 12:13:06 2010(r212731) @@ -120,18 +120,18 @@ struct tcphdr { #defineTCP6_MSS1220 /* - * Limit the lowest MSS we accept from path MTU discovery and the TCP SYN MSS - * option. Allowing too low values of MSS can consume significant amounts of - * resources and be used as a form of a resource exhaustion attack. + * Limit the lowest MSS we accept for path MTU discovery and the TCP SYN MSS + * option. Allowing low values of MSS can consume significant resources and + * be used to mount a resource exhaustion attack. * Connections requesting lower MSS values will be rounded up to this value - * and the IP_DF flag is cleared to allow fragmentation along the path. + * and the IP_DF flag will be cleared to allow fragmentation along the path. * * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments. Setting * it to "0" disables the minmss check. * - * The default value is fine for the smallest official link MTU (256 bytes, - * AX.25 packet radio) in the Internet. However it is very unlikely to come - * across such low MTU interfaces these days (anno domini 2003). + * The default value is fine for TCP across the Internet's smallest official + * link MTU (256 bytes for AX.25 packet radio). However, a connection is very + * unlikely to come across such low MTU interfaces these days (anno domini 2003). 
*/ #define TCP_MINMSS 216
svn commit: r212765 - head/sys/netinet
Author: andre Date: Thu Sep 16 21:06:45 2010 New Revision: 212765 URL: http://svn.freebsd.org/changeset/base/212765 Log: Remove the TCP inflight bandwidth limiter as announced in r211315 to give way for the pluggable congestion control framework. It is the task of the congestion control algorithm to set the congestion window and amount of inflight data without external interference. In 'struct tcpcb' the variables previously used by the inflight limiter are renamed to spares to keep the ABI intact and to have some more space for future extensions. In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to preserve the ABI. It is always set to 0. In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed to preserve the ABI. It is always set to 0. These unused variable in the various structures may be reused in the future or garbage collected before the next release or at some other point when an ABI change happens anyway for other reasons. No MFC is planned. The inflight bandwidth limiter stays disabled by default in the other branches but remains available. Modified: head/sys/netinet/siftr.c head/sys/netinet/tcp.h head/sys/netinet/tcp_input.c head/sys/netinet/tcp_output.c head/sys/netinet/tcp_subr.c head/sys/netinet/tcp_timer.h head/sys/netinet/tcp_usrreq.c head/sys/netinet/tcp_var.h Modified: head/sys/netinet/siftr.c == --- head/sys/netinet/siftr.cThu Sep 16 21:06:23 2010(r212764) +++ head/sys/netinet/siftr.cThu Sep 16 21:06:45 2010(r212765) @@ -193,7 +193,7 @@ struct pkt_node { u_long snd_wnd; /* Receive Window (bytes). */ u_long rcv_wnd; - /* Bandwidth Controlled Window (bytes). */ + /* Unused (was: Bandwidth Controlled Window (bytes)). */ u_long snd_bwnd; /* Slow Start Threshold (bytes). */ u_long snd_ssthresh; @@ -775,7 +775,7 @@ siftr_siftdata(struct pkt_node *pn, stru pn->snd_cwnd = tp->snd_cwnd; pn->snd_wnd = tp->snd_wnd; pn->rcv_wnd = tp->rcv_wnd; - pn->snd_bwnd = tp->snd_bwnd; + pn->snd_bwnd = 0; /* Unused, kept for compat. 
*/ pn->snd_ssthresh = tp->snd_ssthresh; pn->snd_scale = tp->snd_scale; pn->rcv_scale = tp->rcv_scale; Modified: head/sys/netinet/tcp.h == --- head/sys/netinet/tcp.h Thu Sep 16 21:06:23 2010(r212764) +++ head/sys/netinet/tcp.h Thu Sep 16 21:06:45 2010(r212765) @@ -221,7 +221,7 @@ struct tcp_info { /* FreeBSD extensions to tcp_info. */ u_int32_t tcpi_snd_wnd; /* Advertised send window. */ - u_int32_t tcpi_snd_bwnd; /* Bandwidth send window. */ + u_int32_t tcpi_snd_bwnd; /* No longer used. */ u_int32_t tcpi_snd_nxt; /* Next egress seqno */ u_int32_t tcpi_rcv_nxt; /* Next ingress seqno */ u_int32_t tcpi_toe_tid; /* HWTID for TOE endpoints */ Modified: head/sys/netinet/tcp_input.c == --- head/sys/netinet/tcp_input.cThu Sep 16 21:06:23 2010 (r212764) +++ head/sys/netinet/tcp_input.cThu Sep 16 21:06:45 2010 (r212765) @@ -1321,7 +1321,6 @@ tcp_do_segment(struct mbuf *m, struct tc tcp_xmit_timer(tp, ticks - tp->t_rtttime); } - tcp_xmit_bandwidth_limit(tp, th->th_ack); acked = th->th_ack - tp->snd_una; TCPSTAT_INC(tcps_rcvackpack); TCPSTAT_ADD(tcps_rcvackbyte, acked); @@ -2278,7 +2277,6 @@ process_ACK: tp->t_rttlow = ticks - tp->t_rtttime; tcp_xmit_timer(tp, ticks - tp->t_rtttime); } - tcp_xmit_bandwidth_limit(tp, th->th_ack); /* * If all outstanding data is acked, stop retransmit @@ -3328,8 +3326,6 @@ tcp_mss(struct tcpcb *tp, int offer) tp->snd_ssthresh = max(2 * mss, metrics.rmx_ssthresh); TCPSTAT_INC(tcps_usedssthresh); } - if (metrics.rmx_bandwidth) - tp->snd_bandwidth = metrics.rmx_bandwidth; /* * Set the slow-start flight size depending on whether this Modified: head/sys/netinet/tcp_output.c == --- head/sys/netinet/tcp_output.c Thu Sep 16 21:06:23 2010 (r212764) +++ head/sys/netinet/tcp_output.c Thu
svn commit: r212769 - head/share/man/man4
Author: andre Date: Thu Sep 16 22:11:55 2010 New Revision: 212769 URL: http://svn.freebsd.org/changeset/base/212769 Log: The inflight bandwidth limiter was removed in r212765. Modified: head/share/man/man4/tcp.4 Modified: head/share/man/man4/tcp.4 == --- head/share/man/man4/tcp.4 Thu Sep 16 21:18:25 2010(r212768) +++ head/share/man/man4/tcp.4 Thu Sep 16 22:11:55 2010(r212769) @@ -32,7 +32,7 @@ .\" From: @(#)tcp.48.1 (Berkeley) 6/5/93 .\" $FreeBSD$ .\" -.Dd August 16, 2008 +.Dd September 16, 2010 .Dt TCP 4 .Os .Sh NAME @@ -383,72 +383,6 @@ code. For this reason, we use 200ms of slop and a near-0 minimum, which gives us an effective minimum of 200ms (similar to .Tn Linux ) . -.It Va inflight.enable -Enable -.Tn TCP -bandwidth-delay product limiting. -An attempt will be made to calculate -the bandwidth-delay product for each individual -.Tn TCP -connection, and limit -the amount of inflight data being transmitted, to avoid building up -unnecessary packets in the network. -This option is recommended if you -are serving a lot of data over connections with high bandwidth-delay -products, such as modems, GigE links, and fast long-haul WANs, and/or -you have configured your machine to accommodate large -.Tn TCP -windows. -In such -situations, without this option, you may experience high interactive -latencies or packet loss due to the overloading of intermediate routers -and switches. -Note that bandwidth-delay product limiting only effects -the transmit side of a -.Tn TCP -connection. -.It Va inflight.debug -Enable debugging for the bandwidth-delay product algorithm. -.It Va inflight.min -This puts a lower bound on the bandwidth-delay product window, in bytes. -A value of 1024 is typically used for debugging. -6000-16000 is more typical in a production installation. -Setting this value too low may result in -slow ramp-up times for bursty connections. -Setting this value too high effectively disables the algorithm. 
-.It Va inflight.max -This puts an upper bound on the bandwidth-delay product window, in bytes. -This value should not generally be modified, but may be used to set a -global per-connection limit on queued data, potentially allowing you to -intentionally set a less than optimum limit, to smooth data flow over a -network while still being able to specify huge internal -.Tn TCP -buffers. -.It Va inflight.stab -The bandwidth-delay product algorithm requires a slightly larger window -than it otherwise calculates for stability. -This parameter determines the extra window in maximal packets / 10. -The default value of 20 represents 2 maximal packets. -Reducing this value is not recommended, but you may -come across a situation with very slow links where the -.Xr ping 8 -time -reduction of the default inflight code is not sufficient. -If this case occurs, you should first try reducing -.Va inflight.min -and, if that does not -work, reduce both -.Va inflight.min -and -.Va inflight.stab , -trying values of -15, 10, or 5 for the latter. -Never use a value less than 5. -Reducing -.Va inflight.stab -can lead to upwards of a 20% underutilization of the link -as well as reducing the algorithm's ability to adapt to changing -situations and should only be done as a last resort. .It Va rfc3042 Enable the Limited Transmit algorithm as described in RFC 3042. It helps avoid timeouts on lossy links and also when the congestion window ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
svn commit: r212803 - head/sys/netinet
Author: andre Date: Fri Sep 17 22:05:27 2010 New Revision: 212803 URL: http://svn.freebsd.org/changeset/base/212803 Log: Rearrange the TSO code to make it more readable and to clearly separate the decision logic, of whether we can do TSO, and the calculation of the burst length into two distinct parts. Change the way the TSO burst length calculation is done. While TSO could do bursts of 65535 bytes that can't be represented in ip_len together with the IP and TCP header. Account for that and use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both have the same value of 64K). When more data is available prevent less than MSS sized segments from being sent during the current TSO burst. Add two more KASSERTs to ensure the integrity of the packets. Tested by:Ben Wilber MFC after:10 days Modified: head/sys/netinet/tcp_output.c Modified: head/sys/netinet/tcp_output.c == --- head/sys/netinet/tcp_output.c Fri Sep 17 21:53:56 2010 (r212802) +++ head/sys/netinet/tcp_output.c Fri Sep 17 22:05:27 2010 (r212803) @@ -465,9 +465,8 @@ after_sack_rexmit: } /* -* Truncate to the maximum segment length or enable TCP Segmentation -* Offloading (if supported by hardware) and ensure that FIN is removed -* if the length no longer contains the last data byte. +* Decide if we can use TCP Segmentation Offloading (if supported by +* hardware). * * TSO may only be used if we are in a pure bulk sending state. The * presence of TCP-MD5, SACK retransmits, SACK advertizements and @@ -475,10 +474,6 @@ after_sack_rexmit: * (except for the sequence number) for all generated packets. This * makes it impossible to transmit any options which vary per generated * segment or packet. -* -* The length of TSO bursts is limited to TCP_MAXWIN. That limit and -* removal of FIN (if not already catched here) are handled later after -* the exact length of the TCP options are known. 
*/ #ifdef IPSEC /* @@ -487,22 +482,15 @@ after_sack_rexmit: */ ipsec_optlen = ipsec_hdrsiz_tcp(tp); #endif - if (len > tp->t_maxseg) { - if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && - ((tp->t_flags & TF_SIGNATURE) == 0) && - tp->rcv_numsacks == 0 && sack_rxmit == 0 && - tp->t_inpcb->inp_options == NULL && - tp->t_inpcb->in6p_options == NULL + if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && len > tp->t_maxseg && + ((tp->t_flags & TF_SIGNATURE) == 0) && + tp->rcv_numsacks == 0 && sack_rxmit == 0 && #ifdef IPSEC - && ipsec_optlen == 0 + ipsec_optlen == 0 && #endif - ) { - tso = 1; - } else { - len = tp->t_maxseg; - sendalot = 1; - } - } + tp->t_inpcb->inp_options == NULL && + tp->t_inpcb->in6p_options == NULL) + tso = 1; if (sack_rxmit) { if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc)) @@ -732,28 +720,53 @@ send: * bump the packet length beyond the t_maxopd length. * Clear the FIN bit because we cut off the tail of * the segment. -* -* When doing TSO limit a burst to TCP_MAXWIN minus the -* IP, TCP and Options length to keep ip->ip_len from -* overflowing. Prevent the last segment from being -* fractional thus making them all equal sized and set -* the flag to continue sending. TSO is disabled when -* IP options or IPSEC are present. */ if (len + optlen + ipoptlen > tp->t_maxopd) { flags &= ~TH_FIN; + if (tso) { - if (len > TCP_MAXWIN - hdrlen - optlen) { - len = TCP_MAXWIN - hdrlen - optlen; - len = len - (len % (tp->t_maxopd - optlen)); + KASSERT(ipoptlen == 0, + ("%s: TSO can't do IP options", __func__)); + + /* +* Limit a burst to IP_MAXPACKET minus IP, +* TCP and options length to keep ip->ip_len +* from overflowing. +*/ + if (len > IP_MAXPACKET - hdrlen) { + len = IP_MAXPACKET - hdrlen; + sendalot = 1; + } + + /* +* Prevent the last segment from being +* fractional unless the send sockbuf can +* be emptied. +*/ +
Re: svn commit: r212803 - head/sys/netinet
On 18.09.2010 13:34, Bjoern A. Zeeb wrote: On Fri, 17 Sep 2010, Andre Oppermann wrote: @@ -487,22 +482,15 @@ after_sack_rexmit: */ ipsec_optlen = ipsec_hdrsiz_tcp(tp); #endif - if (len > tp->t_maxseg) { - if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && - ((tp->t_flags & TF_SIGNATURE) == 0) && - tp->rcv_numsacks == 0 && sack_rxmit == 0 && - tp->t_inpcb->inp_options == NULL && - tp->t_inpcb->in6p_options == NULL + if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && len > tp->t_maxseg && + ((tp->t_flags & TF_SIGNATURE) == 0) && + tp->rcv_numsacks == 0 && sack_rxmit == 0 && #ifdef IPSEC - && ipsec_optlen == 0 + ipsec_optlen == 0 && #endif - ) { - tso = 1; - } else { - len = tp->t_maxseg; - sendalot = 1; - } - } + tp->t_inpcb->inp_options == NULL && + tp->t_inpcb->in6p_options == NULL) + tso = 1; In the non-TSO case you are no longer reducing len to tp->t_maxseg here, if it's larger, which I think breaks assumptions all the way down. No assumptions are broken for the non-TSO case. The value of len is only tested against t_maxseg for being equal or greater. This always holds true. When the decision to send has been made, len is correctly limited in both the non-TSO and the TSO case. Before, a bit of each was done in both places. That is now merged into one spot. -- Andre
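The burst-length step that r212803 moves into a single spot can be sketched as a pure function. This is only an illustration under stated assumptions: `tso_burst_len()`, `struct tso_len` and `SKETCH_IP_MAXPACKET` are hypothetical names, `hdrlen` is assumed to already include the TCP options, and the FIN handling and the KASSERTs of the real code are left out.

```c
#define SKETCH_IP_MAXPACKET	65535u	/* largest value ip_len can hold */

struct tso_len {
	unsigned len;		/* payload bytes to hand to the NIC */
	int sendalot;		/* nonzero: call the output path again */
};

/*
 * Hypothetical helper mirroring the two steps in the diff: clamp the
 * burst so headers plus payload fit into ip_len, then trim a trailing
 * partial segment unless this burst empties the send buffer.
 */
static struct tso_len
tso_burst_len(unsigned len, int sendalot, unsigned hdrlen, unsigned optlen,
    unsigned maxopd, unsigned off, unsigned sb_cc)
{
	struct tso_len r;

	if (len > SKETCH_IP_MAXPACKET - hdrlen) {
		len = SKETCH_IP_MAXPACKET - hdrlen;
		sendalot = 1;
	}
	/* maxopd - optlen is the payload carried by one full segment. */
	if (sendalot && off + len < sb_cc)
		len -= len % (maxopd - optlen);

	r.len = len;
	r.sendalot = sendalot;
	return (r);
}
```

With 52 bytes of headers, 12 of them options, and a `maxopd` of 1460, a 100000-byte request is first clamped to 65483 bytes and then trimmed to 45 full segments of 1448 bytes, 65160 bytes in total, with `sendalot` set so the remainder goes out in the next pass.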
Re: svn commit: r212803 - head/sys/netinet
On 23.10.2010 15:10, Bjoern A. Zeeb wrote: On Fri, 17 Sep 2010, Andre Oppermann wrote: Author: andre Date: Fri Sep 17 22:05:27 2010 New Revision: 212803 URL: http://svn.freebsd.org/changeset/base/212803 Log: Rearrange the TSO code to make it more readable and to clearly separate the decision logic, of whether we can do TSO, and the calculation of the burst length into two distinct parts. Change the way the TSO burst length calculation is done. While TSO could do bursts of 65535 bytes that can't be represented in ip_len together with the IP and TCP header. Account for that and use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both have the same value of 64K). When more data is available prevent less than MSS sized segments from being sent during the current TSO burst. Add two more KASSERTs to ensure the integrity of the packets. Tested by: Ben Wilber MFC after: 10 days As this hasn't happened yet, please do not do it. It breaks things. I'll follow up later as soon as I have more details. I was busy after the EuroBSDCon DevSummit and didn't have time to MFC. Incidentally I was planning on doing it today, but will hold off based on your request. The version currently in 8 certainly has a bug. For the one in head you are the first report. Others reported all their issues to be fixed with this patch. Can you give a high-level description of the problem you are seeing? A detailed description is not required to take a first look at whatever issue you may have. -- Andre Modified: head/sys/netinet/tcp_output.c Modified: head/sys/netinet/tcp_output.c == --- head/sys/netinet/tcp_output.c Fri Sep 17 21:53:56 2010 (r212802) +++ head/sys/netinet/tcp_output.c Fri Sep 17 22:05:27 2010 (r212803) @@ -465,9 +465,8 @@ after_sack_rexmit: } /* - * Truncate to the maximum segment length or enable TCP Segmentation - * Offloading (if supported by hardware) and ensure that FIN is removed - * if the length no longer contains the last data byte.
+ * Decide if we can use TCP Segmentation Offloading (if supported by + * hardware). * * TSO may only be used if we are in a pure bulk sending state. The * presence of TCP-MD5, SACK retransmits, SACK advertizements and @@ -475,10 +474,6 @@ after_sack_rexmit: * (except for the sequence number) for all generated packets. This * makes it impossible to transmit any options which vary per generated * segment or packet. - * - * The length of TSO bursts is limited to TCP_MAXWIN. That limit and - * removal of FIN (if not already catched here) are handled later after - * the exact length of the TCP options are known. */ #ifdef IPSEC /* @@ -487,22 +482,15 @@ after_sack_rexmit: */ ipsec_optlen = ipsec_hdrsiz_tcp(tp); #endif - if (len > tp->t_maxseg) { - if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && - ((tp->t_flags & TF_SIGNATURE) == 0) && - tp->rcv_numsacks == 0 && sack_rxmit == 0 && - tp->t_inpcb->inp_options == NULL && - tp->t_inpcb->in6p_options == NULL + if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && len > tp->t_maxseg && + ((tp->t_flags & TF_SIGNATURE) == 0) && + tp->rcv_numsacks == 0 && sack_rxmit == 0 && #ifdef IPSEC - && ipsec_optlen == 0 + ipsec_optlen == 0 && #endif - ) { - tso = 1; - } else { - len = tp->t_maxseg; - sendalot = 1; - } - } + tp->t_inpcb->inp_options == NULL && + tp->t_inpcb->in6p_options == NULL) + tso = 1; if (sack_rxmit) { if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc)) @@ -732,28 +720,53 @@ send: * bump the packet length beyond the t_maxopd length. * Clear the FIN bit because we cut off the tail of * the segment. - * - * When doing TSO limit a burst to TCP_MAXWIN minus the - * IP, TCP and Options length to keep ip->ip_len from - * overflowing. Prevent the last segment from being - * fractional thus making them all equal sized and set - * the flag to continue sending. TSO is disabled when - * IP options or IPSEC are present. 
*/ if (len + optlen + ipoptlen > tp->t_maxopd) { flags &= ~TH_FIN; + if (tso) { - if (len > TCP_MAXWIN - hdrlen - optlen) { - len = TCP_MAXWIN - hdrlen - optlen; - len = len - (len % (tp->t_maxopd - optlen)); + KASSERT(ipoptlen == 0, + ("%s: TSO can't do IP options", __func__)); + + /* + * Limit a burst to IP_MAXPACKET minus IP, + * TCP and options length to keep ip->ip_len + * from overflowing. + */ + if (len > IP_MAXPACKET - hdrlen) { + len = IP_MAXPACKET - hdrlen; + sendalot = 1; + } + + /* + * Prevent the last segment from being + * fractional unless the send sockbuf can + * be emptied. + */ + if (sendalot && off + len < so->so_snd.sb_cc) { + len -= len % (tp->t_maxopd - optlen); sendalot = 1; - } else if (tp->t_flags & TF_NEEDFIN) +
svn commit: r226105 - head/sys/netinet
Author: andre Date: Fri Oct 7 13:43:01 2011 New Revision: 226105 URL: http://svn.freebsd.org/changeset/base/226105 Log: Add back the IP header length to the total packet length field on raw IP sockets. It was deducted in ip_input() in preparation for protocols interested only in the payload. On raw sockets the IP header should be delivered as it came in from the network except for the byte order swaps in some fields. This brings us in line with all other OS'es that provide raw IP sockets. Reported by: Matthew Cini Sarreo MFC after: 3 days Modified: head/sys/netinet/raw_ip.c Modified: head/sys/netinet/raw_ip.c == --- head/sys/netinet/raw_ip.c Fri Oct 7 13:16:21 2011 (r226104) +++ head/sys/netinet/raw_ip.c Fri Oct 7 13:43:01 2011 (r226105) @@ -289,6 +289,13 @@ rip_input(struct mbuf *m, int off) last = NULL; ifp = m->m_pkthdr.rcvif; + /* +* Add back the IP header length which was +* removed by ip_input(). Raw sockets do +* not modify the packet except for some +* byte order swaps. +*/ + ip->ip_len += off; hash = INP_PCBHASH_RAW(proto, ip->ip_src.s_addr, ip->ip_dst.s_addr, V_ripcbinfo.ipi_hashmask);
svn commit: r226113 - head/sys/netinet
Author: andre Date: Fri Oct 7 16:39:03 2011 New Revision: 226113 URL: http://svn.freebsd.org/changeset/base/226113 Log: Prevent TCP sessions from stalling indefinitely in reassembly when reaching the zone limit of reassembly queue entries. When the zone limit was reached not even the missing segment that would complete the sequence space could be processed preventing the TCP session forever from making any further progress. Solve this deadlock by using a temporary on-stack queue entry for the missing segment followed by an immediate dequeue again by delivering the contiguous sequence space to the socket. Add logging under net.inet.tcp.log_debug for reassembly queue issues. Reviewed by: lsteward (previous version) Tested by:Steven Hartland MFC after:3 days Modified: head/sys/netinet/tcp_reass.c Modified: head/sys/netinet/tcp_reass.c == --- head/sys/netinet/tcp_reass.cFri Oct 7 16:09:44 2011 (r226112) +++ head/sys/netinet/tcp_reass.cFri Oct 7 16:39:03 2011 (r226113) @@ -177,7 +177,9 @@ tcp_reass(struct tcpcb *tp, struct tcphd struct tseg_qent *nq; struct tseg_qent *te = NULL; struct socket *so = tp->t_inpcb->inp_socket; + char *s = NULL; int flags; + struct tseg_qent tqs; INP_WLOCK_ASSERT(tp->t_inpcb); @@ -215,19 +217,40 @@ tcp_reass(struct tcpcb *tp, struct tcphd TCPSTAT_INC(tcps_rcvmemdrop); m_freem(m); *tlenp = 0; + if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) { + log(LOG_DEBUG, "%s; %s: queue limit reached, " + "segment dropped\n", s, __func__); + free(s, M_TCPLOG); + } return (0); } /* * Allocate a new queue entry. If we can't, or hit the zone limit * just drop the pkt. +* +* Use a temporary structure on the stack for the missing segment +* when the zone is exhausted. Otherwise we may get stuck. 
*/ te = uma_zalloc(V_tcp_reass_zone, M_NOWAIT); - if (te == NULL) { + if (te == NULL && th->th_seq != tp->rcv_nxt) { TCPSTAT_INC(tcps_rcvmemdrop); m_freem(m); *tlenp = 0; + if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) { + log(LOG_DEBUG, "%s; %s: global zone limit reached, " + "segment dropped\n", s, __func__); + free(s, M_TCPLOG); + } return (0); + } else if (th->th_seq == tp->rcv_nxt) { + bzero(&tqs, sizeof(struct tseg_qent)); + te = &tqs; + if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) { + log(LOG_DEBUG, "%s; %s: global zone limit reached, " + "using stack for missing segment\n", s, __func__); + free(s, M_TCPLOG); + } } tp->t_segqlen++; @@ -304,6 +327,8 @@ tcp_reass(struct tcpcb *tp, struct tcphd if (p == NULL) { LIST_INSERT_HEAD(&tp->t_segq, te, tqe_q); } else { + KASSERT(te != &tqs, ("%s: temporary stack based entry not " + "first element in queue", __func__)); LIST_INSERT_AFTER(p, te, tqe_q); } @@ -327,7 +352,8 @@ present: m_freem(q->tqe_m); else sbappendstream_locked(&so->so_rcv, q->tqe_m); - uma_zfree(V_tcp_reass_zone, q); + if (q != &tqs) + uma_zfree(V_tcp_reass_zone, q); tp->t_segqlen--; q = nq; } while (q && q->tqe_th->th_seq == tp->rcv_nxt); ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
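The fallback idea in this commit can be sketched outside the kernel as follows. Everything here is an assumption made for illustration: seg_pool is a toy fixed-size pool standing in for the UMA zone and its limit, and pick_entry is a hypothetical name. Unlike the diff above, the sketch takes the stack entry only when the allocation actually failed, which is how I read the intent of the log message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy queue entry; the real struct tseg_qent carries more fields. */
struct seg_entry {
	unsigned int	seq;
	int		len;
};

/* Hypothetical fixed-size pool standing in for the limited UMA zone. */
struct seg_pool {
	int			in_use;
	int			limit;	/* must not exceed 4 */
	struct seg_entry	slots[4];
};

static struct seg_entry *
pool_alloc(struct seg_pool *p)
{
	if (p->in_use >= p->limit)
		return (NULL);
	return (&p->slots[p->in_use++]);
}

/*
 * Return the entry to use for an arriving segment, or NULL if the
 * segment has to be dropped.
 */
static struct seg_entry *
pick_entry(struct seg_pool *p, struct seg_entry *stack_entry,
    unsigned int th_seq, unsigned int rcv_nxt)
{
	struct seg_entry *te = pool_alloc(p);

	/* Out of entries and not the missing segment: drop it. */
	if (te == NULL && th_seq != rcv_nxt)
		return (NULL);

	/* Out of entries but this is the missing segment: use the stack. */
	if (te == NULL) {
		memset(stack_entry, 0, sizeof(*stack_entry));
		te = stack_entry;
	}
	return (te);
}
```

The caller has to deliver and dequeue a stack-based entry before returning, and must never hand it back to the pool, which is what the `q != &tqs` check in the diff guards against.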
Re: svn commit: r226113 - head/sys/netinet
Hi Lawrence, Sorry for jumping in here. There was some urgency felt at EuroBSDCon to get this issue fixed before the next RC. -- Andre On 08.10.2011 02:56, Lawrence Stewart wrote: Hi Andre and RE team, I've had a patch sitting in re@'s inbox for this problem since 15th Sep and have been waiting for their go-ahead to commit. The patch I submitted is at: http://people.freebsd.org/~lstewart/patches/misctcp/tcpreassstackfix_9.x.r225576.diff The proposed commit message was: ## Use a backup (stack allocated) struct tseg_qent when we are unable to allocate one from the TCP reassembly UMA zone and the incoming segment is the one we've been waiting for (i.e. th_seq == rcv_nxt). This avoids TCP connections stalling when the zone limit is reached. PR: kern/155407 Reported by: Slawa Olhovchenkov and Steven Hartland Tested by: Steven Hartland Submitted by: andre Reviewed by: jhb Approved by: re (?) MFC after: 1 week ## I feel the logging changes should have been committed separately to the fix, but other than that, what you committed achieves the same thing as the patch I proposed. I should have updated the ML thread to say it was submitted and awaiting approval, so you weren't to know. Anyhoo, I guess I'll leave it up to you and re@ to sort out how you want to proceed, but wanted to make sure everyone was on the same page as RE would have gotten confused when you requested your patch be MFCed. Cheers, Lawrence On 10/08/11 03:39, Andre Oppermann wrote: Author: andre Date: Fri Oct 7 16:39:03 2011 New Revision: 226113 URL: http://svn.freebsd.org/changeset/base/226113 Log: Prevent TCP sessions from stalling indefinitely in reassembly when reaching the zone limit of reassembly queue entries. When the zone limit was reached not even the missing segment that would complete the sequence space could be processed preventing the TCP session forever from making any further progress. 
Solve this deadlock by using a temporary on-stack queue entry for the missing segment followed by an immediate dequeue again by delivering the contiguous sequence space to the socket. Add logging under net.inet.tcp.log_debug for reassembly queue issues. Reviewed by: lsteward (previous version) Tested by: Steven Hartland MFC after: 3 days Modified: head/sys/netinet/tcp_reass.c Modified: head/sys/netinet/tcp_reass.c == --- head/sys/netinet/tcp_reass.c Fri Oct 7 16:09:44 2011 (r226112) +++ head/sys/netinet/tcp_reass.c Fri Oct 7 16:39:03 2011 (r226113) @@ -177,7 +177,9 @@ tcp_reass(struct tcpcb *tp, struct tcphd struct tseg_qent *nq; struct tseg_qent *te = NULL; struct socket *so = tp->t_inpcb->inp_socket; + char *s = NULL; int flags; + struct tseg_qent tqs; INP_WLOCK_ASSERT(tp->t_inpcb); @@ -215,19 +217,40 @@ tcp_reass(struct tcpcb *tp, struct tcphd TCPSTAT_INC(tcps_rcvmemdrop); m_freem(m); *tlenp = 0; + if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) { + log(LOG_DEBUG, "%s; %s: queue limit reached, " + "segment dropped\n", s, __func__); + free(s, M_TCPLOG); + } return (0); } /* * Allocate a new queue entry. If we can't, or hit the zone limit * just drop the pkt. + * + * Use a temporary structure on the stack for the missing segment + * when the zone is exhausted. Otherwise we may get stuck. 
*/ te = uma_zalloc(V_tcp_reass_zone, M_NOWAIT); - if (te == NULL) { + if (te == NULL&& th->th_seq != tp->rcv_nxt) { TCPSTAT_INC(tcps_rcvmemdrop); m_freem(m); *tlenp = 0; + if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) { + log(LOG_DEBUG, "%s; %s: global zone limit reached, " + "segment dropped\n", s, __func__); + free(s, M_TCPLOG); + } return (0); + } else if (th->th_seq == tp->rcv_nxt) { + bzero(&tqs, sizeof(struct tseg_qent)); + te =&tqs; + if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) { + log(LOG_DEBUG, "%s; %s: global zone limit reached, " + "using stack for missing segment\n", s, __func__); + free(s, M_TCPLOG); + } } tp->t_segqlen++; @@ -304,6 +327,8 @@ tcp_reass(struct tcpcb *tp, struct tcphd if (p == NULL) { LIST_INSERT_HEAD(&tp->t_segq, te, tqe_q); } else { + KASSERT(te !=&tqs, ("%s: temporary stack based entry not " + "first element in queue", __func__)); LIST_INSERT_AFTER(p, te, tqe_q); } @@ -327,7 +352,8 @@ present: m_freem(q->tqe_m); else sbappendstream_locked(&so->so_rcv, q->tqe_m); - uma_zfree(V_tcp_reass_zone, q); + if (q !=&tqs) + uma_zfree(V_tcp_reass_zone, q); tp->t_segqlen--; q = nq; } while (q&& q->tqe_th->th_seq == tp->rcv_nxt); ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
svn commit: r226433 - head/sys/netinet
Author: andre
Date: Sun Oct 16 13:54:46 2011
New Revision: 226433
URL: http://svn.freebsd.org/changeset/base/226433

Log:
  Update the comment and description of tcp_sendspace and tcp_recvspace
  to better reflect their purpose.

  MFC after: 1 week

Modified:
  head/sys/netinet/tcp_usrreq.c

Modified: head/sys/netinet/tcp_usrreq.c
==
--- head/sys/netinet/tcp_usrreq.c	Sun Oct 16 11:08:51 2011	(r226432)
+++ head/sys/netinet/tcp_usrreq.c	Sun Oct 16 13:54:46 2011	(r226433)
@@ -1498,16 +1498,15 @@ tcp_ctloutput(struct socket *so, struct
 #undef INP_WLOCK_RECHECK
 
 /*
- * tcp_sendspace and tcp_recvspace are the default send and receive window
- * sizes, respectively. These are obsolescent (this information should
- * be set by the route).
+ * Set the initial send and receive socket buffer sizes for
+ * newly created TCP sockets.
  */
 u_long	tcp_sendspace = 1024*32;
 SYSCTL_ULONG(_net_inet_tcp, TCPCTL_SENDSPACE, sendspace, CTLFLAG_RW,
-    &tcp_sendspace , 0, "Maximum outgoing TCP datagram size");
+    &tcp_sendspace , 0, "Initial send socket buffer size");
 u_long	tcp_recvspace = 1024*64;
 SYSCTL_ULONG(_net_inet_tcp, TCPCTL_RECVSPACE, recvspace, CTLFLAG_RW,
-    &tcp_recvspace , 0, "Maximum incoming TCP datagram size");
+    &tcp_recvspace , 0, "Initial receive socket buffer size");
 
 /*
  * Attach TCP protocol to socket, allocating
svn commit: r226437 - head/sys/netinet
Author: andre Date: Sun Oct 16 15:08:43 2011 New Revision: 226437 URL: http://svn.freebsd.org/changeset/base/226437 Log: VNET virtualize tcp_sendspace/tcp_recvspace and change the type to INT. A long is not necessary as the TCP window is limited to 2**30. A larger initial window isn't useful. MFC after:1 week Modified: head/sys/netinet/tcp_input.c head/sys/netinet/tcp_usrreq.c head/sys/netinet/tcp_var.h Modified: head/sys/netinet/tcp_input.c == --- head/sys/netinet/tcp_input.cSun Oct 16 14:30:28 2011 (r226436) +++ head/sys/netinet/tcp_input.cSun Oct 16 15:08:43 2011 (r226437) @@ -3517,7 +3517,7 @@ tcp_mss(struct tcpcb *tp, int offer) */ so = inp->inp_socket; SOCKBUF_LOCK(&so->so_snd); - if ((so->so_snd.sb_hiwat == tcp_sendspace) && metrics.rmx_sendpipe) + if ((so->so_snd.sb_hiwat == V_tcp_sendspace) && metrics.rmx_sendpipe) bufsize = metrics.rmx_sendpipe; else bufsize = so->so_snd.sb_hiwat; @@ -3534,7 +3534,7 @@ tcp_mss(struct tcpcb *tp, int offer) tp->t_maxseg = mss; SOCKBUF_LOCK(&so->so_rcv); - if ((so->so_rcv.sb_hiwat == tcp_recvspace) && metrics.rmx_recvpipe) + if ((so->so_rcv.sb_hiwat == V_tcp_recvspace) && metrics.rmx_recvpipe) bufsize = metrics.rmx_recvpipe; else bufsize = so->so_rcv.sb_hiwat; Modified: head/sys/netinet/tcp_usrreq.c == --- head/sys/netinet/tcp_usrreq.c Sun Oct 16 14:30:28 2011 (r226436) +++ head/sys/netinet/tcp_usrreq.c Sun Oct 16 15:08:43 2011 (r226437) @@ -1501,12 +1501,15 @@ tcp_ctloutput(struct socket *so, struct * Set the initial send and receive socket buffer sizes for * newly created TCP sockets. 
*/ -u_long tcp_sendspace = 1024*32; -SYSCTL_ULONG(_net_inet_tcp, TCPCTL_SENDSPACE, sendspace, CTLFLAG_RW, -&tcp_sendspace , 0, "Initial send socket buffer size"); -u_long tcp_recvspace = 1024*64; -SYSCTL_ULONG(_net_inet_tcp, TCPCTL_RECVSPACE, recvspace, CTLFLAG_RW, -&tcp_recvspace , 0, "Initial receive socket buffer size"); +VNET_DEFINE(int, tcp_sendspace) = 1024*32; +#defineV_tcp_sendspace VNET(tcp_sendspace) +SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_SENDSPACE, tcp_sendspace, CTLFLAG_RW, +&VNET_NAME(tcp_sendspace), 0, "Initial send socket buffer size"); + +VNET_DEFINE(int, tcp_recvspace) = 1024*64 +#defineV_tcp_recvspace VNET(tcp_recvspace) +SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW, +&VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size"); /* * Attach TCP protocol to socket, allocating @@ -1521,7 +1524,7 @@ tcp_attach(struct socket *so) int error; if (so->so_snd.sb_hiwat == 0 || so->so_rcv.sb_hiwat == 0) { - error = soreserve(so, tcp_sendspace, tcp_recvspace); + error = soreserve(so, V_tcp_sendspace, V_tcp_recvspace); if (error) return (error); } Modified: head/sys/netinet/tcp_var.h == --- head/sys/netinet/tcp_var.h Sun Oct 16 14:30:28 2011(r226436) +++ head/sys/netinet/tcp_var.h Sun Oct 16 15:08:43 2011(r226437) @@ -606,6 +606,8 @@ VNET_DECLARE(int, tcp_mssdflt); /* XXX * VNET_DECLARE(int, tcp_minmss); VNET_DECLARE(int, tcp_delack_enabled); VNET_DECLARE(int, tcp_do_rfc3390); +VNET_DECLARE(int, tcp_sendspace); +VNET_DECLARE(int, tcp_recvspace); VNET_DECLARE(int, path_mtu_discovery); VNET_DECLARE(int, ss_fltsz); VNET_DECLARE(int, ss_fltsz_local); @@ -618,6 +620,8 @@ VNET_DECLARE(int, tcp_abc_l_var); #defineV_tcp_minmssVNET(tcp_minmss) #defineV_tcp_delack_enabledVNET(tcp_delack_enabled) #defineV_tcp_do_rfc3390VNET(tcp_do_rfc3390) +#defineV_tcp_sendspace VNET(tcp_sendspace) +#defineV_tcp_recvspace VNET(tcp_recvspace) #defineV_path_mtu_discoveryVNET(path_mtu_discovery) #defineV_ss_fltsz VNET(ss_fltsz) 
#defineV_ss_fltsz_localVNET(ss_fltsz_local) @@ -716,8 +720,6 @@ void tcp_hc_updatemtu(struct in_conninf voidtcp_hc_update(struct in_conninfo *, struct hc_metrics_lite *); extern struct pr_usrreqs tcp_usrreqs; -extern u_long tcp_sendspace; -extern u_long tcp_recvspace; tcp_seq tcp_new_isn(struct tcpcb *); voidtcp_sack_doack(struct tcpcb *, struct tcpopt *, tcp_seq); ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
svn commit: r226447 - head/sys/netinet
Author: andre Date: Sun Oct 16 20:06:44 2011 New Revision: 226447 URL: http://svn.freebsd.org/changeset/base/226447 Log: Remove the ss_fltsz and ss_fltsz_local sysctl's which have long been superseded by the RFC3390 initial CWND sizing. Also remove the remnants of TCP_METRICS_CWND which used the TCP hostcache to set the initial CWND in a non-RFC compliant way. MFC after:1 week Modified: head/sys/netinet/tcp_input.c head/sys/netinet/tcp_output.c head/sys/netinet/tcp_var.h Modified: head/sys/netinet/tcp_input.c == --- head/sys/netinet/tcp_input.cSun Oct 16 19:46:52 2011 (r226446) +++ head/sys/netinet/tcp_input.cSun Oct 16 20:06:44 2011 (r226447) @@ -301,9 +301,6 @@ cc_conn_init(struct tcpcb *tp) struct hc_metrics_lite metrics; struct inpcb *inp = tp->t_inpcb; int rtt; -#ifdef INET6 - int isipv6 = ((inp->inp_vflag & INP_IPV6) != 0) ? 1 : 0; -#endif INP_WLOCK_ASSERT(tp->t_inpcb); @@ -337,49 +334,16 @@ cc_conn_init(struct tcpcb *tp) } /* -* Set the slow-start flight size depending on whether this -* is a local network or not. -* -* Extend this so we cache the cwnd too and retrieve it here. -* Make cwnd even bigger than RFC3390 suggests but only if we -* have previous experience with the remote host. Be careful -* not make cwnd bigger than remote receive window or our own -* send socket buffer. Maybe put some additional upper bound -* on the retrieved cwnd. Should do incremental updates to -* hostcache when cwnd collapses so next connection doesn't -* overloads the path again. -* -* XXXAO: Initializing the CWND from the hostcache is broken -* and in its current form not RFC conformant. It is disabled -* until fixed or removed entirely. +* Set the initial slow-start flight size. * * RFC3390 says only do this if SYN or SYN/ACK didn't got lost. -* We currently check only in syncache_socket for that. +* XXX: We currently check only in syncache_socket for that. 
*/ -/* #define TCP_METRICS_CWND */ -#ifdef TCP_METRICS_CWND - if (metrics.rmx_cwnd) - tp->snd_cwnd = max(tp->t_maxseg, min(metrics.rmx_cwnd / 2, - min(tp->snd_wnd, so->so_snd.sb_hiwat))); - else -#endif if (V_tcp_do_rfc3390) tp->snd_cwnd = min(4 * tp->t_maxseg, max(2 * tp->t_maxseg, 4380)); -#ifdef INET6 - else if (isipv6 && in6_localaddr(&inp->in6p_faddr)) - tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz_local; -#endif -#if defined(INET) && defined(INET6) - else if (!isipv6 && in_localaddr(inp->inp_faddr)) - tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz_local; -#endif -#ifdef INET - else if (in_localaddr(inp->inp_faddr)) - tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz_local; -#endif else - tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz; + tp->snd_cwnd = tp->t_maxseg; if (CC_ALGO(tp)->conn_init != NULL) CC_ALGO(tp)->conn_init(tp->ccv); Modified: head/sys/netinet/tcp_output.c == --- head/sys/netinet/tcp_output.c Sun Oct 16 19:46:52 2011 (r226446) +++ head/sys/netinet/tcp_output.c Sun Oct 16 20:06:44 2011 (r226447) @@ -89,16 +89,6 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, &VNET_NAME(path_mtu_discovery), 1, "Enable Path MTU Discovery"); -VNET_DEFINE(int, ss_fltsz) = 1; -SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, slowstart_flightsize, CTLFLAG_RW, - &VNET_NAME(ss_fltsz), 1, - "Slow start flight size"); - -VNET_DEFINE(int, ss_fltsz_local) = 4; -SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, local_slowstart_flightsize, - CTLFLAG_RW, &VNET_NAME(ss_fltsz_local), 1, - "Slow start flight size for local networks"); - VNET_DEFINE(int, tcp_do_tso) = 1; #defineV_tcp_do_tsoVNET(tcp_do_tso) SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, tso, CTLFLAG_RW, Modified: head/sys/netinet/tcp_var.h == --- head/sys/netinet/tcp_var.h Sun Oct 16 19:46:52 2011(r226446) +++ head/sys/netinet/tcp_var.h Sun Oct 16 20:06:44 2011(r226447) @@ -609,8 +609,6 @@ VNET_DECLARE(int, tcp_do_rfc3390); VNET_DECLARE(int, tcp_sendspace); VNET_DECLARE(int, tcp_recvspace); VNET_DECLARE(int, path_mtu_discovery); -VNET_DECLARE(int, ss_fltsz); 
-VNET_DECLARE(int, ss_fltsz_local); VNET_DECLARE(int, tcp_do_rfc3465); VNET_DECLARE(int, tcp_abc_l_var); #defineV_tcb VNET(tcb) @@ -623,8 +621,6 @@ VNET_DECLARE(int, tcp_abc_l_var); #defineV_tcp_sendspace VNET(tcp_sendspace) #d
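After this change the initial congestion window depends only on the segment size and the tcp_do_rfc3390 switch. A minimal sketch of that calculation, assuming a standalone helper (initial_cwnd, ulmin and ulmax are hypothetical names; the kernel uses its own min() and max()):

```c
#include <assert.h>

static unsigned long
ulmin(unsigned long a, unsigned long b)
{
	return (a < b ? a : b);
}

static unsigned long
ulmax(unsigned long a, unsigned long b)
{
	return (a > b ? a : b);
}

/*
 * Hypothetical helper mirroring the remaining branches in cc_conn_init():
 * the RFC 3390 window when enabled, otherwise a single segment.
 */
static unsigned long
initial_cwnd(unsigned long maxseg, int do_rfc3390)
{
	if (do_rfc3390)
		return (ulmin(4 * maxseg, ulmax(2 * maxseg, 4380)));
	return (maxseg);
}
```

For a 1460 byte segment size this yields 4380 bytes (three segments), for 536 bytes it yields 2144 bytes (four segments), and with RFC 3390 disabled it is always one segment.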
svn commit: r226448 - head/sys/netinet
Author: andre Date: Sun Oct 16 20:18:39 2011 New Revision: 226448 URL: http://svn.freebsd.org/changeset/base/226448 Log: Move the tcp_sendspace and tcp_recvspace sysctl's from the middle of tcp_usrreq.c to the top of tcp_output.c and tcp_input.c respectively next to the socket buffer autosizing controls. MFC after:1 week Modified: head/sys/netinet/tcp_input.c head/sys/netinet/tcp_output.c head/sys/netinet/tcp_usrreq.c Modified: head/sys/netinet/tcp_input.c == --- head/sys/netinet/tcp_input.cSun Oct 16 20:06:44 2011 (r226447) +++ head/sys/netinet/tcp_input.cSun Oct 16 20:18:39 2011 (r226448) @@ -183,6 +183,11 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, &VNET_NAME(tcp_insecure_rst), 0, "Follow the old (insecure) criteria for accepting RST packets"); +VNET_DEFINE(int, tcp_recvspace) = 1024*64 +#defineV_tcp_recvspace VNET(tcp_recvspace) +SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW, +&VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size"); + VNET_DEFINE(int, tcp_do_autorcvbuf) = 1; #defineV_tcp_do_autorcvbuf VNET(tcp_do_autorcvbuf) SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, recvbuf_auto, CTLFLAG_RW, Modified: head/sys/netinet/tcp_output.c == --- head/sys/netinet/tcp_output.c Sun Oct 16 20:06:44 2011 (r226447) +++ head/sys/netinet/tcp_output.c Sun Oct 16 20:18:39 2011 (r226448) @@ -95,6 +95,11 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, &VNET_NAME(tcp_do_tso), 0, "Enable TCP Segmentation Offload"); +VNET_DEFINE(int, tcp_sendspace) = 1024*32; +#defineV_tcp_sendspace VNET(tcp_sendspace) +SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_SENDSPACE, tcp_sendspace, CTLFLAG_RW, + &VNET_NAME(tcp_sendspace), 0, "Initial send socket buffer size"); + VNET_DEFINE(int, tcp_do_autosndbuf) = 1; #defineV_tcp_do_autosndbuf VNET(tcp_do_autosndbuf) SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, sendbuf_auto, CTLFLAG_RW, Modified: head/sys/netinet/tcp_usrreq.c == --- head/sys/netinet/tcp_usrreq.c Sun Oct 16 20:06:44 2011 (r226447) +++ head/sys/netinet/tcp_usrreq.c 
Sun Oct 16 20:18:39 2011 (r226448) @@ -1498,20 +1498,6 @@ tcp_ctloutput(struct socket *so, struct #undef INP_WLOCK_RECHECK /* - * Set the initial send and receive socket buffer sizes for - * newly created TCP sockets. - */ -VNET_DEFINE(int, tcp_sendspace) = 1024*32; -#defineV_tcp_sendspace VNET(tcp_sendspace) -SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_SENDSPACE, tcp_sendspace, CTLFLAG_RW, -&VNET_NAME(tcp_sendspace), 0, "Initial send socket buffer size"); - -VNET_DEFINE(int, tcp_recvspace) = 1024*64 -#defineV_tcp_recvspace VNET(tcp_recvspace) -SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW, -&VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size"); - -/* * Attach TCP protocol to socket, allocating * internet protocol control block, tcp control block, * bufer space, and entering LISTEN state if to accept connections. ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r226454 - head/sys/netinet
On 17.10.2011 02:16, Bjoern A. Zeeb wrote:
On 17. Oct 2011, at 00:05 , Bjoern A. Zeeb wrote:

Author: bz
Date: Mon Oct 17 00:05:31 2011
New Revision: 226454
URL: http://svn.freebsd.org/changeset/base/226454

Log:
  Add syntactic sugar missed in r226437 and then not added either when
  moving things around in r226448 but desperately needed to always make
  things compile successfully.

GENERIC and LINT did not fail on it as it expanded to:

  int tcp_recvspace = 1024*64

followed by:

  #define SYSCTL_VNET_INT(parent, nbr, name, access, ptr, val, descr) \
      SYSCTL_INT(parent, nbr, name, access, ptr, val, descr)
  =>
  #define SYSCTL_INT(parent, nbr, name, access, ptr, val, descr) \
      SYSCTL_ASSERT_TYPE(INT, ptr, parent, name); \
      SYSCTL_OID(parent, nbr, name, \
          CTLTYPE_INT | CTLFLAG_MPSAFE | (access), \
          ptr, val, sysctl_handle_int, "I", descr)

and the SYSCTL_ASSERT_TYPE() expanding to nothing in

  #define SYSCTL_ASSERT_TYPE(type, ptr, parent, name)

leaving just the ';' around; so it ended up as:

  int tcp_recvspace = 1024*64 ;

and an expanded SYSCTL_OID(...);

Oops, sorry for missing that one. And thanks for committing the fix.

-- Andre

  MFC after: 1 week

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c	Sun Oct 16 22:24:04 2011	(r226453)
+++ head/sys/netinet/tcp_input.c	Mon Oct 17 00:05:31 2011	(r226454)
@@ -183,7 +183,7 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
     &VNET_NAME(tcp_insecure_rst), 0,
     "Follow the old (insecure) criteria for accepting RST packets");
 
-VNET_DEFINE(int, tcp_recvspace) = 1024*64
+VNET_DEFINE(int, tcp_recvspace) = 1024*64;
 #define	V_tcp_recvspace	VNET(tcp_recvspace)
 SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW,
     &VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size");
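The effect described above can be reproduced in plain C. This is a sketch under stated assumptions, not the kernel macros: DEMO_ASSERT_TYPE and DEMO_SYSCTL_INT are made-up stand-ins, and demo_oid replaces the real SYSCTL_OID() expansion. The definition deliberately lacks its semicolon, and the stray ';' left behind by the empty assertion macro completes it.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel macros; names are illustrative. */
#define DEMO_ASSERT_TYPE(ptr)		/* expands to nothing */
#define DEMO_SYSCTL_INT(ptr)		\
	DEMO_ASSERT_TYPE(ptr);		\
	int *demo_oid = (ptr)

/* The semicolon is deliberately missing after the initializer. */
int demo_recvspace = 1024*64
DEMO_SYSCTL_INT(&demo_recvspace);
```

After preprocessing, the last two lines read `int demo_recvspace = 1024*64 ; int *demo_oid = (&demo_recvspace);`, so the compiler accepts them. If DEMO_ASSERT_TYPE expanded to anything other than nothing, the missing semicolon would turn into a syntax error.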
svn commit: r227499 - head/share/man/man4
Author: andre
Date: Mon Nov 14 15:10:42 2011
New Revision: 227499
URL: http://svn.freebsd.org/changeset/base/227499

Log:
  Note the ip_len bug fixed in r226105 in the BUGS section.

Modified:
  head/share/man/man4/ip.4

Modified: head/share/man/man4/ip.4
==
--- head/share/man/man4/ip.4	Mon Nov 14 15:10:01 2011	(r227498)
+++ head/share/man/man4/ip.4	Mon Nov 14 15:10:42 2011	(r227499)
@@ -32,7 +32,7 @@
 .\"     @(#)ip.4	8.2 (Berkeley) 11/30/93
 .\" $FreeBSD$
 .\"
-.Dd June 1, 2009
+.Dd November 14, 2011
 .Dt IP 4
 .Os
 .Sh NAME
@@ -847,3 +847,9 @@ The
 .Vt ip_mreqn
 structure appeared in
 .Tn Linux 2.4 .
+.Sh BUGS
+Before
+.Fx 10.0 packets received on raw IP sockets had the
+.Va ip_hl
+subtracted from the
+.Va ip_len field.
svn commit: r227500 - head/share/man/man4
Author: andre
Date: Mon Nov 14 15:14:42 2011
New Revision: 227500
URL: http://svn.freebsd.org/changeset/base/227500

Log:
  Remove mention of ss_fltsz and ss_fltsz_local which were retired
  in r226447.

Modified:
  head/share/man/man4/tcp.4

Modified: head/share/man/man4/tcp.4
==
--- head/share/man/man4/tcp.4	Mon Nov 14 15:10:42 2011	(r227499)
+++ head/share/man/man4/tcp.4	Mon Nov 14 15:14:42 2011	(r227500)
@@ -38,7 +38,7 @@
 .\"     From: @(#)tcp.4	8.1 (Berkeley) 6/5/93
 .\" $FreeBSD$
 .\"
-.Dd September 15, 2011
+.Dd November 14, 2011
 .Dt TCP 4
 .Os
 .Sh NAME
@@ -290,14 +290,6 @@ That of 2 results in any
 packets to closed ports being logged.
 Any value unlisted above disables the logging
 (default is 0, i.e., the logging is disabled).
-.It Va slowstart_flightsize
-The number of packets allowed to be in-flight during the
-.Tn TCP
-slow-start phase on a non-local network.
-.It Va local_slowstart_flightsize
-The number of packets allowed to be in-flight during the
-.Tn TCP
-slow-start phase to local machines in the same subnet.
 .It Va msl
 The Maximum Segment Lifetime, in milliseconds, for a packet.
 .It Va keepinit
@@ -411,15 +403,6 @@ maximum segment size.
 This helps throughput in general, but
 particularly affects short transfers and high-bandwidth large
 propagation-delay connections.
-.Pp
-When this feature is enabled, the
-.Va slowstart_flightsize
-and
-.Va local_slowstart_flightsize
-settings are not observed for new
-connection slow starts, but they are still used for slow starts
-that occur when the connection has been idle and starts sending
-again.
 .It Va sack.enable
 Enable support for RFC 2018, TCP Selective Acknowledgment option,
 which allows the receiver to inform the sender about all successfully
Re: svn commit: r227499 - head/share/man/man4
On 14.11.2011 16:38, Garrett Cooper wrote:
On Mon, Nov 14, 2011 at 7:10 AM, Andre Oppermann wrote:

Author: andre
Date: Mon Nov 14 15:10:42 2011
New Revision: 227499
URL: http://svn.freebsd.org/changeset/base/227499

Log:
  Note the ip_len bug fixed in r226105 in the BUGS section.

Modified:
  head/share/man/man4/ip.4

Modified: head/share/man/man4/ip.4
==
--- head/share/man/man4/ip.4	Mon Nov 14 15:10:01 2011	(r227498)
+++ head/share/man/man4/ip.4	Mon Nov 14 15:10:42 2011	(r227499)
@@ -32,7 +32,7 @@
 .\"     @(#)ip.4	8.2 (Berkeley) 11/30/93
 .\" $FreeBSD$
 .\"
-.Dd June 1, 2009
+.Dd November 14, 2011
 .Dt IP 4
 .Os
 .Sh NAME
@@ -847,3 +847,9 @@ The
 .Vt ip_mreqn
 structure appeared in
 .Tn Linux 2.4 .
+.Sh BUGS
+Before
+.Fx 10.0 packets received on raw IP sockets had the
+.Va ip_hl
+subtracted from the
+.Va ip_len field.

Isn't the fix going to be MFCed?

It was. However, some ports depend on this bug, and because we are at a
late stage in the release cycle we decided to back out the MFC.

-- Andre
svn commit: r227501 - head/share/man/man4
Author: andre
Date: Mon Nov 14 15:57:03 2011
New Revision: 227501
URL: http://svn.freebsd.org/changeset/base/227501

Log:
  mdoc fix for r227499.

  Reported by: brueffer

Modified:
  head/share/man/man4/ip.4

Modified: head/share/man/man4/ip.4
==
--- head/share/man/man4/ip.4	Mon Nov 14 15:14:42 2011	(r227500)
+++ head/share/man/man4/ip.4	Mon Nov 14 15:57:03 2011	(r227501)
@@ -849,7 +849,8 @@ structure appeared in
 .Tn Linux 2.4 .
 .Sh BUGS
 Before
-.Fx 10.0 packets received on raw IP sockets had the
+.Fx 10.0
+packets received on raw IP sockets had the
 .Va ip_hl
 subtracted from the
 .Va ip_len field.
svn commit: r249843 - head/sys/kern
Author: andre
Date: Wed Apr 24 13:54:55 2013
New Revision: 249843
URL: http://svnweb.freebsd.org/changeset/base/249843

Log:
  Base the calculation of maxmbufmem in part on kmem_map size
  instead of kernel_map size to prevent kernel memory exhaustion
  by mbufs and a subsequent panic on physical page allocation
  failure.

  On architectures without a direct map all mbuf memory (except
  for jumbo mbufs larger than PAGE_SIZE) comes from kmem_map.
  Hence it is the limiting factor.

  For architectures with a direct map, the size of kmem_map is
  a good proxy for available kernel memory as well. If it is much
  smaller the mbuf limit may be sub-optimal but remains reasonable,
  while avoiding panics under exhaustion.

  The overall mbuf memory limit calculation may be reconsidered
  again later; however, due to the many different mbuf sizes and
  different backing KVM maps it is a tricky subject.

  Found by: pho's new network stress test
  Pointed out by: alc (kmem_map instead of kernel_map)
  Tested by: pho

Modified:
  head/sys/kern/kern_mbuf.c

Modified: head/sys/kern/kern_mbuf.c
==
--- head/sys/kern/kern_mbuf.c	Wed Apr 24 13:19:48 2013	(r249842)
+++ head/sys/kern/kern_mbuf.c	Wed Apr 24 13:54:55 2013	(r249843)
@@ -118,7 +118,7 @@ tunable_mbinit(void *dummy)
 	 * At most it can be 3/4 of available kernel memory.
 	 */
 	realmem = qmin((quad_t)physmem * PAGE_SIZE,
-	    vm_map_max(kernel_map) - vm_map_min(kernel_map));
+	    vm_map_max(kmem_map) - vm_map_min(kmem_map));
 	maxmbufmem = realmem / 2;
 	TUNABLE_QUAD_FETCH("kern.maxmbufmem", &maxmbufmem);
 	if (maxmbufmem > realmem / 4 * 3)
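The limit calculation can be sketched in userland as follows. The helper name mbuf_limit is hypothetical, a tunable value of 0 is used here to mean "kern.maxmbufmem not set", and the clamp to 3/4 is an assumption taken from the comment in the hunk, since the body of the final if is not shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userland model of the maxmbufmem calculation.
 * phys_bytes: physical memory, kmem_bytes: size of kmem_map,
 * tunable: value of kern.maxmbufmem, or 0 if it was not set.
 */
static int64_t
mbuf_limit(int64_t phys_bytes, int64_t kmem_bytes, int64_t tunable)
{
	/* The smaller of physical memory and the kmem_map size. */
	int64_t realmem = phys_bytes < kmem_bytes ? phys_bytes : kmem_bytes;
	/* Default: half of that memory may be used by mbufs. */
	int64_t maxmbufmem = realmem / 2;

	/* A tunable set by the administrator overrides the default. */
	if (tunable > 0)
		maxmbufmem = tunable;

	/* Assumed clamp: at most 3/4 of available kernel memory. */
	if (maxmbufmem > realmem / 4 * 3)
		maxmbufmem = realmem / 4 * 3;

	return (maxmbufmem);
}
```

With 8 GiB of physical memory and a 2 GiB kmem_map the default limit is 1 GiB, and a tunable of 4 GiB is cut back to 1.5 GiB.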
svn commit: r250300 - in head/sys: kern net netinet sys
Author: andre Date: Mon May 6 16:42:18 2013 New Revision: 250300 URL: http://svnweb.freebsd.org/changeset/base/250300 Log: Back out r249318, r249320 and r249327 due to a heisenbug most likely related to a race condition in the ipi_hash_lock with the exact cause currently unknown but under investigation. Modified: head/sys/kern/uipc_socket.c head/sys/net/if.c head/sys/net/if_llatbl.c head/sys/net/if_llatbl.h head/sys/net/if_var.h head/sys/netinet/in_pcb.h head/sys/netinet/in_var.h head/sys/netinet/ip_id.c head/sys/netinet/ip_input.c head/sys/netinet/tcp_subr.c head/sys/sys/socketvar.h Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.c Mon May 6 16:11:53 2013(r250299) +++ head/sys/kern/uipc_socket.c Mon May 6 16:42:18 2013(r250300) @@ -240,14 +240,14 @@ SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO * accept_mtx locks down per-socket fields relating to accept queues. See * socketvar.h for an annotation of the protected fields of struct socket. */ -struct mtx_padalign accept_mtx; +struct mtx accept_mtx; MTX_SYSINIT(accept_mtx, &accept_mtx, "accept", MTX_DEF); /* * so_global_mtx protects so_gencnt, numopensockets, and the per-socket * so_gencnt field. */ -static struct mtx_padalign so_global_mtx; +static struct mtx so_global_mtx; MTX_SYSINIT(so_global_mtx, &so_global_mtx, "so_glabel", MTX_DEF); /* Modified: head/sys/net/if.c == --- head/sys/net/if.c Mon May 6 16:11:53 2013(r250299) +++ head/sys/net/if.c Mon May 6 16:42:18 2013(r250300) @@ -206,7 +206,7 @@ VNET_DEFINE(struct ifindex_entry *, ifin * also to stablize it over long-running ioctls, without introducing priority * inversions and deadlocks. 
*/ -struct rwlock_padalign ifnet_rwlock; +struct rwlock ifnet_rwlock; struct sx ifnet_sxlock; /* Modified: head/sys/net/if_llatbl.c == --- head/sys/net/if_llatbl.cMon May 6 16:11:53 2013(r250299) +++ head/sys/net/if_llatbl.cMon May 6 16:42:18 2013(r250300) @@ -67,7 +67,7 @@ static VNET_DEFINE(SLIST_HEAD(, lltable) static void vnet_lltable_init(void); -struct rwlock_padalign lltable_rwlock; +struct rwlock lltable_rwlock; RW_SYSINIT(lltable_rwlock, &lltable_rwlock, "lltable_rwlock"); /* Modified: head/sys/net/if_llatbl.h == --- head/sys/net/if_llatbl.hMon May 6 16:11:53 2013(r250299) +++ head/sys/net/if_llatbl.hMon May 6 16:42:18 2013(r250300) @@ -43,7 +43,7 @@ struct rt_addrinfo; struct llentry; LIST_HEAD(llentries, llentry); -extern struct rwlock_padalign lltable_rwlock; +extern struct rwlock lltable_rwlock; #defineLLTABLE_RLOCK() rw_rlock(&lltable_rwlock) #defineLLTABLE_RUNLOCK() rw_runlock(&lltable_rwlock) #defineLLTABLE_WLOCK() rw_wlock(&lltable_rwlock) Modified: head/sys/net/if_var.h == --- head/sys/net/if_var.h Mon May 6 16:11:53 2013(r250299) +++ head/sys/net/if_var.h Mon May 6 16:42:18 2013(r250300) @@ -191,9 +191,9 @@ struct ifnet { void*if_unused[2]; void*if_afdata[AF_MAX]; int if_afdata_initialized; + struct rwlock if_afdata_lock; struct task if_linktask; /* task for link change events */ - struct rwlock_padalign if_afdata_lock; - struct rwlock_padalign if_addr_lock; /* lock to protect address lists */ + struct rwlock if_addr_lock;/* lock to protect address lists */ LIST_ENTRY(ifnet) if_clones;/* interfaces of a cloner */ TAILQ_HEAD(, ifg_list) if_groups; /* linked list of groups per if */ @@ -832,7 +832,7 @@ struct ifmultiaddr { #ifdef _KERNEL -extern struct rwlock_padalign ifnet_rwlock; +extern struct rwlock ifnet_rwlock; extern struct sx ifnet_sxlock; #defineIFNET_LOCK_INIT() do { \ Modified: head/sys/netinet/in_pcb.h == --- head/sys/netinet/in_pcb.h Mon May 6 16:11:53 2013(r250299) +++ head/sys/netinet/in_pcb.h Mon May 6 16:42:18 2013(r250300) @@ 
-330,7 +330,7 @@ struct inpcbinfo { /* * Global lock protecting non-pcbgroup hash lookup tables. */ - struct rwlock_padalign ipi_hash_lock; + struct rwlockipi_hash_lock; /* * Global hash of inpcbs, hashed by local and foreign addresses and Modified: head/sys/netinet/in_var.h == --- head/sys/netinet/in_var.h Mon May 6
svn commit: r250365 - head/sys/kern
Author: andre
Date: Wed May 8 14:13:14 2013
New Revision: 250365
URL: http://svnweb.freebsd.org/changeset/base/250365

Log:
  When the accept queue is full print the number of already pending
  new connections instead of by how many we're over the limit, which
  is always 1.

  Noticed by:	jmallet
  MFC after:	1 week

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==============================================================================
--- head/sys/kern/uipc_socket.c	Wed May 8 13:26:17 2013	(r250364)
+++ head/sys/kern/uipc_socket.c	Wed May 8 14:13:14 2013	(r250365)
@@ -515,7 +515,7 @@ sonewconn(struct socket *head, int conns
 #endif
 		log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow: "
 		    "%i already in queue awaiting acceptance\n",
-		    __func__, head->so_pcb, over);
+		    __func__, head->so_pcb, head->so_qlen);
 		return (NULL);
 	}
 	VNET_ASSERT(head->so_vnet != NULL, ("%s:%d so_vnet is NULL, head=%p",
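To illustrate why the old message always printed 1, here is a minimal userspace sketch. It assumes the overflow test in sonewconn() is a plain comparison of so_qlen against 3 * so_qlimit / 2; that threshold is not visible in the hunk above and is an assumption here. struct listen_queue, queue_is_over() and format_overflow() are hypothetical stand-ins, not kernel names.

```c
#include <stdio.h>

/* Hypothetical stand-in for the two listen-queue fields of struct socket. */
struct listen_queue {
	int qlen;	/* connections already queued awaiting accept() */
	int qlimit;	/* backlog limit set by listen() */
};

/*
 * Assumed overflow test: a comparison, so the result is only ever 0 or 1.
 * Logging this value says nothing about how deep the queue is.
 */
static int
queue_is_over(const struct listen_queue *q)
{
	return (q->qlen > 3 * q->qlimit / 2);
}

/* Format the message in the spirit of r250365: report the queue length. */
static int
format_overflow(char *buf, size_t len, const struct listen_queue *q)
{
	return (snprintf(buf, len, "Listen queue overflow: "
	    "%i already in queue awaiting acceptance", q->qlen));
}
```

With a backlog of 128 the assumed threshold is 192, so a queue length of 193 trips the test and the message reports 193 rather than 1.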
Re: svn commit: r250658 - in head: share/mk sys/conf tools/build/options
On 15.05.2013 15:04, Brooks Davis wrote: Author: brooks Date: Wed May 15 13:04:10 2013 New Revision: 250658 URL: http://svnweb.freebsd.org/changeset/base/250658 Log: Add a new option WITHOUT_FORMAT_EXTENSIONS to disable flags related to checking our kernel printf extensions. This is useful to allow compilers without these extensions to build kernels. Sponsored by:DARPA, AFRL This breaks "make depend" at least on amd64: "../../../conf/kern.mk", line 37: Malformed conditional (${MK_FORMAT_EXTENSIONS} == "no") "../../../conf/kern.mk", line 39: if-less else "../../../conf/kern.mk", line 41: if-less endif make: fatal errors encountered -- cannot continue -- Andre Added: head/tools/build/options/WITHOUT_FORMAT_EXTENSIONS (contents, props changed) Modified: head/share/mk/bsd.own.mk head/sys/conf/kern.mk Modified: head/share/mk/bsd.own.mk == --- head/share/mk/bsd.own.mkWed May 15 08:38:49 2013(r250657) +++ head/share/mk/bsd.own.mkWed May 15 13:04:10 2013(r250658) @@ -268,6 +268,7 @@ __DEFAULT_YES_OPTIONS = \ ED_CRYPTO \ EXAMPLES \ FLOPPY \ +FORMAT_EXTENSIONS \ FORTH \ FP_LIBC \ FREEBSD_UPDATE \ Modified: head/sys/conf/kern.mk == --- head/sys/conf/kern.mk Wed May 15 08:38:49 2013(r250657) +++ head/sys/conf/kern.mk Wed May 15 13:04:10 2013(r250658) @@ -5,7 +5,7 @@ # CWARNFLAGS?= -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes \ -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual \ - -Wundef -Wno-pointer-sign -fformat-extensions \ + -Wundef -Wno-pointer-sign ${FORMAT_EXTENTIONS} \ -Wmissing-include-dirs -fdiagnostics-show-option \ ${CWARNEXTRA} # @@ -29,7 +29,15 @@ NO_WSOMETIMES_UNINITIALIZED= -Wno-error- # enough to error out the whole kernel build. Display them anyway, so there is # some incentive to fix them eventually. CWARNEXTRA?= -Wno-error-tautological-compare -Wno-error-empty-body \ - -Wno-error-parentheses-equality + -Wno-error-parentheses-equality ${NO_WFORMAT} +.endif + +# External compilers may not support our format extensions. 
Allow them +# to be disabled. WARNING: format checking is disabled in this case. +.if ${MK_FORMAT_EXTENSIONS} == "no" +NO_WFORMAT=-Wno-format +.else +FORMAT_EXTENTIONS= -fformat-extensions .endif # Added: head/tools/build/options/WITHOUT_FORMAT_EXTENSIONS == --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/tools/build/options/WITHOUT_FORMAT_EXTENSIONS Wed May 15 13:04:10 2013(r250658) @@ -0,0 +1,5 @@ +.\" $FreeBSD$ +Set to not enable +.Fl fformat-extensions +when compiling the kernel. +Also disables all format checking. ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
svn commit: r251296 - in head/sys: net netinet
Author: andre Date: Mon Jun 3 12:55:13 2013 New Revision: 251296 URL: http://svnweb.freebsd.org/changeset/base/251296 Log: Allow drivers to specify a maximum TSO length in bytes if they are limited in the amount of data they can handle at once. Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to change the limit. The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything less wouldn't be very useful anymore. The upper limit is still at IP_MAXPACKET (65536 bytes). Raising it requires further auditing of the IPv4/v6 code path's as the length field in the IP header would overflow leading to confusion in firewalls and others packet handler on the real size of the packet. The placement into "struct ifnet" is a bit hackish but the best place that was found. When the stack/driver boundary is updated it should be handled in a better way. Submitted by: cperciva (earlier version) Reviewed by: cperciva Tested by:cperciva MFC after:1 week (using spare struct members to preserve ABI) Modified: head/sys/net/if.c head/sys/net/if_var.h head/sys/netinet/tcp_input.c head/sys/netinet/tcp_output.c head/sys/netinet/tcp_subr.c head/sys/netinet/tcp_var.h Modified: head/sys/net/if.c == --- head/sys/net/if.c Mon Jun 3 12:43:09 2013(r251295) +++ head/sys/net/if.c Mon Jun 3 12:55:13 2013(r251296) @@ -74,18 +74,18 @@ #include #if defined(INET) || defined(INET6) -/*XXX*/ #include #include +#include #include +#ifdef INET +#include +#endif /* INET */ #ifdef INET6 #include #include -#endif -#endif -#ifdef INET -#include -#endif +#endif /* INET6 */ +#endif /* INET || INET6 */ #include @@ -653,6 +653,13 @@ if_attach_internal(struct ifnet *ifp, in TAILQ_INSERT_HEAD(&ifp->if_addrhead, ifa, ifa_link); /* Reliably crash if used uninitialized. */ ifp->if_broadcastaddr = NULL; + + /* Initialize to max value. 
*/ + if (ifp->if_hw_tsomax == 0) + ifp->if_hw_tsomax = IP_MAXPACKET; + KASSERT(ifp->if_hw_tsomax <= IP_MAXPACKET && + ifp->if_hw_tsomax >= IP_MAXPACKET / 8, + ("%s: tsomax outside of range", __func__)); } #ifdef VIMAGE else { Modified: head/sys/net/if_var.h == --- head/sys/net/if_var.h Mon Jun 3 12:43:09 2013(r251295) +++ head/sys/net/if_var.h Mon Jun 3 12:55:13 2013(r251296) @@ -204,6 +204,11 @@ struct ifnet { u_int if_fib; /* interface FIB */ u_char if_alloctype; /* if_type at time of allocation */ + u_int if_hw_tsomax; /* tso burst length limit, the minmum +* is (IP_MAXPACKET / 8). +* XXXAO: Have to find a better place +* for it eventually. */ + /* * Spare fields are added so that we can modify sensitive data * structures without changing the kernel binary interface, and must Modified: head/sys/netinet/tcp_input.c == --- head/sys/netinet/tcp_input.cMon Jun 3 12:43:09 2013 (r251295) +++ head/sys/netinet/tcp_input.cMon Jun 3 12:55:13 2013 (r251296) @@ -3434,7 +3434,7 @@ tcp_xmit_timer(struct tcpcb *tp, int rtt */ void tcp_mss_update(struct tcpcb *tp, int offer, int mtuoffer, -struct hc_metrics_lite *metricptr, int *mtuflags) +struct hc_metrics_lite *metricptr, struct tcp_ifcap *cap) { int mss = 0; u_long maxmtu = 0; @@ -3461,7 +3461,7 @@ tcp_mss_update(struct tcpcb *tp, int off /* Initialize. 
*/ #ifdef INET6 if (isipv6) { - maxmtu = tcp_maxmtu6(&inp->inp_inc, mtuflags); + maxmtu = tcp_maxmtu6(&inp->inp_inc, cap); tp->t_maxopd = tp->t_maxseg = V_tcp_v6mssdflt; } #endif @@ -3470,7 +3470,7 @@ tcp_mss_update(struct tcpcb *tp, int off #endif #ifdef INET { - maxmtu = tcp_maxmtu(&inp->inp_inc, mtuflags); + maxmtu = tcp_maxmtu(&inp->inp_inc, cap); tp->t_maxopd = tp->t_maxseg = V_tcp_mssdflt; } #endif @@ -3605,11 +3605,12 @@ tcp_mss(struct tcpcb *tp, int offer) struct inpcb *inp; struct socket *so; struct hc_metrics_lite metrics; - int mtuflags = 0; + struct tcp_ifcap cap; KASSERT(tp != NULL, ("%s: tp == NULL", __func__)); - - tcp_mss_update(tp, offer, -1, &metrics, &mtuflags); + + bzero(&cap, sizeof(cap)); + tcp_mss_update(tp, offer, -1, &metrics, &cap); mss = tp->t_maxseg; inp = tp->t_inpcb; @@ -3
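The defaulting and range check that r251296 adds to if_attach_internal() can be sketched in userspace as follows. This is an illustration only: tsomax_default() and tsomax_in_range() are hypothetical helpers, and IP_MAXPACKET is defined locally with the value from <netinet/ip.h> (65535). With that value the enforced bounds are 65535 and 65535 / 8 = 8191, slightly below the rounded 65536 and 8192 quoted in the log.

```c
#include <stdbool.h>

#define IP_MAXPACKET	65535	/* value from <netinet/ip.h> */

/* A driver that leaves the field at 0 gets the full IP_MAXPACKET limit. */
static unsigned int
tsomax_default(unsigned int hw_tsomax)
{
	return (hw_tsomax == 0 ? IP_MAXPACKET : hw_tsomax);
}

/* The range that the KASSERT in the commit enforces after defaulting. */
static bool
tsomax_in_range(unsigned int hw_tsomax)
{
	return (hw_tsomax <= IP_MAXPACKET && hw_tsomax >= IP_MAXPACKET / 8);
}
```

A driver would set ifp->if_hw_tsomax before ether_ifattach(); a value such as 30720 passes the check, while 8190 or 65536 would trip the assertion.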
svn commit: r251297 - head/sys/dev/xen/netfront
Author: andre Date: Mon Jun 3 13:00:33 2013 New Revision: 251297 URL: http://svnweb.freebsd.org/changeset/base/251297 Log: Specify a maximum TSO length limiting the segment chain to what the Xen host side can handle after defragmentation. This prevents the driver from throwing away too long TSO chains and improves the performance on Amazon AWS instances with 10GigE virtual interfaces to the normally expected throughput. Submitted by: cperciva (earlier version) Reviewed by: cperciva Tested by:cperciva MFC after:1 week Modified: head/sys/dev/xen/netfront/netfront.c Modified: head/sys/dev/xen/netfront/netfront.c == --- head/sys/dev/xen/netfront/netfront.cMon Jun 3 12:55:13 2013 (r251296) +++ head/sys/dev/xen/netfront/netfront.cMon Jun 3 13:00:33 2013 (r251297) @@ -134,6 +134,7 @@ static const int MODPARM_rx_flip = 0; * to mirror the Linux MAX_SKB_FRAGS constant. */ #defineMAX_TX_REQ_FRAGS (65536 / PAGE_SIZE + 2) +#defineNF_TSO_MAXBURST ((IP_MAXPACKET / PAGE_SIZE) * MCLBYTES) #define RX_COPY_THRESHOLD 256 @@ -2122,6 +2123,7 @@ create_netdev(device_t dev) ifp->if_hwassist = XN_CSUM_FEATURES; ifp->if_capabilities = IFCAP_HWCSUM; + ifp->if_hw_tsomax = NF_TSO_MAXBURST; ether_ifattach(ifp, np->mac); callout_init(&np->xn_stat_ch, CALLOUT_MPSAFE); ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r251297 - head/sys/dev/xen/netfront
On 05.06.2013 08:13, Colin Percival wrote:
> On 06/04/13 22:51, Lawrence Stewart wrote:
>> On 06/03/13 23:00, Andre Oppermann wrote:
>>> Modified: head/sys/dev/xen/netfront/netfront.c
>>> ==============================================================================
>>> --- head/sys/dev/xen/netfront/netfront.c	Mon Jun 3 12:55:13 2013	(r251296)
>>> +++ head/sys/dev/xen/netfront/netfront.c	Mon Jun 3 13:00:33 2013	(r251297)
>>> @@ -134,6 +134,7 @@ static const int MODPARM_rx_flip = 0;
>>>   * to mirror the Linux MAX_SKB_FRAGS constant.
>>>   */
>>>  #define MAX_TX_REQ_FRAGS (65536 / PAGE_SIZE + 2)
>>> +#define NF_TSO_MAXBURST ((IP_MAXPACKET / PAGE_SIZE) * MCLBYTES)
>>
>> For posterity's sake, can you and/or Colin please elaborate on how this
>> value was determined and what it is dependent upon? Could a newer version
>> of Xen remove the need for this reduced limit?
>
> The comment above (of which only the last line is quoted in the diff)
> explains it:
>
>  * This limit is imposed by the backend driver. We assume here that
>  * we are dealing with a Linux driver domain and have set our limit
>  * to mirror the Linux MAX_SKB_FRAGS constant.
>
> This isn't a Xen issue really; rather, it's a Linux Dom0 issue. AFAIK
> there are no changes in the pipe to fix this in Linux; but this would not
> be needed with a different Dom0 (e.g., a FreeBSD Dom0, if/when that becomes
> possible) or if FreeBSD switched to using 4kB mbuf clusters (since at that
> point we would be matching Linux and be able to fit a maximum-length IP
> packet into the allowed number of fragments).

We do support 4K mbufs and have done so for a long time. The problem is
that socket buffer mbuf chains can be any combination of mbuf sizes, and
m_defrag() so far only collapses to 2K mbuf clusters. The latter can be
changed, but it is used in a number of places where an explicit 2K
assumption may have been made (even if it shouldn't). When all of them are
checked, m_defrag() can be changed to collapse into 4K mbufs and this
"hack" removed.

-- 
Andre
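The arithmetic behind NF_TSO_MAXBURST can be checked in a few lines. This is a minimal sketch, assuming 4 kB pages and the standard 2 kB MCLBYTES; the constants are redefined locally and clusters_needed() is a hypothetical helper, not a kernel function.

```c
#define PAGE_SIZE	4096	/* assumed: 4 kB pages */
#define MCLBYTES	2048	/* standard 2 kB mbuf cluster */
#define IP_MAXPACKET	65535	/* value from <netinet/ip.h> */

/* Copied from the netfront.c hunk quoted in this thread. */
#define MAX_TX_REQ_FRAGS	(65536 / PAGE_SIZE + 2)
#define NF_TSO_MAXBURST		((IP_MAXPACKET / PAGE_SIZE) * MCLBYTES)

/* Number of clusters of size clsize needed to hold len bytes. */
static int
clusters_needed(int len, int clsize)
{
	return ((len + clsize - 1) / clsize);
}
```

Under these assumptions the backend accepts 18 fragments. A full 65535-byte burst collapsed into 2 kB clusters needs 32 fragments, which is too many, whereas the 30720-byte limit needs 15. With 4 kB clusters a full-size burst would need only 16, which is why collapsing into 4K mbufs would make the limit unnecessary.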
Re: svn commit: r251894 - in head: lib/libmemstat sys/vm
On 18.06.2013 06:50, Jeff Roberson wrote: Author: jeff Date: Tue Jun 18 04:50:20 2013 New Revision: 251894 URL: http://svnweb.freebsd.org/changeset/base/251894 Log: Refine UMA bucket allocation to reduce space consumption and improve performance. - Always free to the alloc bucket if there is space. This gives LIFO allocation order to improve hot-cache performance. This also allows for zones with a single bucket per-cpu rather than a pair if the entire working set fits in one bucket. - Enable per-cpu caches of buckets. To prevent recursive bucket allocation one bucket zone still has per-cpu caches disabled. - Pick the initial bucket size based on a table driven maximum size per-bucket rather than the number of items per-page. This gives more sane initial sizes. - Only grow the bucket size when we face contention on the zone lock, this causes bucket sizes to grow more slowly. - Adjust the number of items per-bucket to account for the header space. This packs the buckets more efficiently per-page while making them not quite powers of two. - Eliminate the per-zone free bucket list. Always return buckets back to the bucket zone. This ensures that as zones grow into larger bucket sizes they eventually discard the smaller sizes. It persists fewer buckets in the system. The locking is slightly trickier. - Only switch buckets in zalloc, not zfree, this eliminates pathological cases where we ping-pong between two buckets. - Ensure that the thread that fills a new bucket gets to allocate from it to give a better upper bound on allocation time. There used to be a problem with per CPU caches accumulating large amounts of items without freeing back to the global (or socket) pool. Do these updates to UMA change this situation and/or do you have further improvements coming up? -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
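The first item in the quoted log, freeing to the alloc bucket so that allocation order becomes LIFO, can be pictured with a toy userspace model. This is only a sketch of the idea and not UMA code: struct bucket, BUCKET_SIZE, bucket_free() and bucket_alloc() are hypothetical, and the real bucket sizes are table driven as the log describes.

```c
#include <stddef.h>

#define BUCKET_SIZE	4	/* hypothetical fixed size for the sketch */

/* Toy stand-in for a per-CPU alloc bucket: a small stack of free items. */
struct bucket {
	void	*items[BUCKET_SIZE];
	int	 cnt;
};

/* Free: push onto the alloc bucket if it has room; return 0 if it is full. */
static int
bucket_free(struct bucket *b, void *item)
{
	if (b->cnt == BUCKET_SIZE)
		return (0);	/* caller would have to use another bucket */
	b->items[b->cnt++] = item;
	return (1);
}

/* Alloc: pop the most recently freed item, or NULL if the bucket is empty. */
static void *
bucket_alloc(struct bucket *b)
{
	if (b->cnt == 0)
		return (NULL);
	return (b->items[--b->cnt]);
}
```

The item freed last is handed out first, so it is the one most likely to still be in the CPU cache.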
Re: svn commit: r251886 - in head: contrib/apr contrib/apr-util contrib/serf contrib/sqlite3 contrib/subversion share/mk usr.bin usr.bin/svn usr.bin/svn/lib usr.bin/svn/lib/libapr usr.bin/svn/lib/liba
On 18.06.2013 18:40, Tijl Coosemans wrote: On 2013-06-18 04:53, Peter Wemm wrote: Author: peter Date: Tue Jun 18 02:53:45 2013 New Revision: 251886 URL: http://svnweb.freebsd.org/changeset/base/251886 Log: Introduce svnlite so that we can check out our source code again. This is actually a fully functional build except: * All internal shared libraries are static linked to make sure there is no interference with ports (and to reduce build time). * It does not have the python/perl/etc plugin or API support. * By default, it installs as "svnlite" rather than "svn". * If WITH_SVN added in make.conf, you get "svn". * If WITHOUT_SVNLITE is in make.conf, this is completely disabled. To be absolutely clear, this is not intended for any use other than checking out freebsd source and committing, like we once did with cvs. It should be usable for small scale local repositories that don't need the python/perl plugin architecture. This ties the repo to the oldest supported release, meaning that years from now we won't be able to use some new subversion feature because an old FreeBSD release doesn't support it. AFAIK there is a checkout-only SVN client available, as in cvsup, but I don't remember the name. I don't find it unreasonable to ask developers to install the port. And for users it seems all they need is something like portsnap for base. Portsnap already distributes ports svn so it shouldn't be too hard to adapt it for base. And the extra layer it adds is very convenient. Apart from a bigger than usual update maybe, portsnap users never even noticed it was switched from cvs to svn at some point. Installing SVN from ports is very painful because of the huge dependency chain it carries, with the largest being Python and Perl IIRC. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r251886 - in head: contrib/apr contrib/apr-util contrib/serf contrib/sqlite3 contrib/subversion share/mk usr.bin usr.bin/svn usr.bin/svn/lib usr.bin/svn/lib/libapr usr.bin/svn/lib/liba
On 18.06.2013 19:04, Alexey Dokuchaev wrote:
> Being able to checkout the sources is very desirable, but not at the cost
> of importing another heavy 3rd-party tool, which Subversion is.

Just wanted to note that I applaud Peter for actually doing something (tm),
even though it came as a surprise to many, it seems. Now that we're having
the discussion we can converge towards the best or least controversial
option.

-- 
Andre
Re: svn commit: r252209 - in head: share/man/man9 sys/kern sys/sys
On 25.06.2013 20:44, John Baldwin wrote: Author: jhb Date: Tue Jun 25 18:44:15 2013 New Revision: 252209 URL: http://svnweb.freebsd.org/changeset/base/252209 Log: Several improvements to rmlock(9). Many of these are based on patches provided by Isilon. - Add an rm_assert() supporting various lock assertions similar to other locking primitives. Because rmlocks track readers the assertions are always fully accurate unlike rw_assert() and sx_assert(). - Flesh out the lock class methods for rmlocks to support sleeping via condvars and rm_sleep() (but only while holding write locks), rmlock details in 'show lock' in DDB, and the lc_owner method used by dtrace. - Add an internal destroyed cookie so that API functions can assert that an rmlock is not destroyed. - Make use of rm_assert() to add various assertions to the API (e.g. to assert locks are held when an unlock routine is called). - Give RM_SLEEPABLE locks their own lock class and always use the rmlock's own lock_object with WITNESS. - Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping while holding a read lock on an rmlock. Thanks! Would it make sense to move struct rm_queue from struct pcpu itself to using DPCPU as a next step? Submitted by:andre Actually these were only relayed by me and came from Max Laier / Stephan Uphoff. So all fame to them. Obtained from: EMC/Isilon -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r236959 - in head: share/man/man4 sys/netinet
On 12.06.2012 16:02, Michael Tuexen wrote:
> Author: tuexen
> Date: Tue Jun 12 14:02:38 2012
> New Revision: 236959
> URL: http://svn.freebsd.org/changeset/base/236959
>
> Log:
>   Add a IP_RECVTOS socket option to receive for received UDP/IPv4
>   packets a cmsg of type IP_RECVTOS which contains the TOS byte.
>   Much like IP_RECVTTL does for TTL. This allows to implement a
>   protocol on top of UDP and implementing ECN.

You may want to consider aliasing IP_RECVTOS with IP_TOS, as is done with
IP_SENDSRCADDR+IP_RECVDSTADDR, to allow for simpler replying to received
UDP packets. That way IP_RECVTOS has the same IP socket option number and
the received cmsg can be used for direct TOS reflection.

-- 
Andre
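From the application side, using the option could look roughly like the sketch below. It assumes the TOS value arrives as a one-byte cmsg at level IPPROTO_IP and accepts either IP_RECVTOS or IP_TOS as the cmsg type, since the type used differs between systems (the log above says FreeBSD uses IP_RECVTOS; Linux documents IP_TOS). enable_recvtos() and tos_from_cmsg() are hypothetical helper names.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stddef.h>

/* Ask the stack to attach the TOS byte to each received datagram. */
static int
enable_recvtos(int s)
{
	int on = 1;

	return (setsockopt(s, IPPROTO_IP, IP_RECVTOS, &on, sizeof(on)));
}

/*
 * Walk the control messages filled in by recvmsg() and return the TOS
 * byte, or -1 if none is present.  Both cmsg types are accepted because
 * the type is system dependent (an assumption of this sketch).
 */
static int
tos_from_cmsg(struct msghdr *msg)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level != IPPROTO_IP)
			continue;
		if (cm->cmsg_type == IP_RECVTOS || cm->cmsg_type == IP_TOS)
			return (*(unsigned char *)CMSG_DATA(cm));
	}
	return (-1);
}
```

A receiver would call enable_recvtos() once on the UDP socket, pass a control buffer of at least CMSG_SPACE(1) bytes to recvmsg(), and then call tos_from_cmsg() on the returned msghdr. If the option number were aliased as suggested, the same cmsg could be handed back to sendmsg() unchanged to reflect the TOS value.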
svn commit: r241686 - in head/sys: net netgraph netgraph/atm/ccatm netgraph/atm/sscfu netgraph/atm/sscop netgraph/atm/uni netinet netinet6 netipsec
Author: andre Date: Thu Oct 18 13:57:24 2012 New Revision: 241686 URL: http://svn.freebsd.org/changeset/base/241686 Log: Mechanically remove the last stray remains of spl* calls from net*/*. They have been Noop's for a long time now. Modified: head/sys/net/if.c head/sys/net/if_ef.c head/sys/net/if_gre.c head/sys/net/if_spppsubr.c head/sys/net/if_var.h head/sys/net/rtsock.c head/sys/netgraph/atm/ccatm/ng_ccatm.c head/sys/netgraph/atm/sscfu/ng_sscfu.c head/sys/netgraph/atm/sscop/ng_sscop.c head/sys/netgraph/atm/uni/ng_uni.c head/sys/netgraph/ng_eiface.c head/sys/netgraph/ng_ether.c head/sys/netgraph/ng_fec.c head/sys/netgraph/ng_gif.c head/sys/netgraph/ng_ksocket.c head/sys/netgraph/ng_source.c head/sys/netinet/ip_ipsec.c head/sys/netinet6/in6.c head/sys/netinet6/ip6_ipsec.c head/sys/netinet6/nd6.c head/sys/netinet6/nd6_nbr.c head/sys/netinet6/nd6_rtr.c head/sys/netinet6/udp6_usrreq.c head/sys/netipsec/key.c Modified: head/sys/net/if.c == --- head/sys/net/if.c Thu Oct 18 13:46:26 2012(r241685) +++ head/sys/net/if.c Thu Oct 18 13:57:24 2012(r241686) @@ -691,12 +691,9 @@ static void if_attachdomain(void *dummy) { struct ifnet *ifp; - int s; - s = splnet(); TAILQ_FOREACH(ifp, &V_ifnet, if_link) if_attachdomain1(ifp); - splx(s); } SYSINIT(domainifattach, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_SECOND, if_attachdomain, NULL); @@ -705,21 +702,15 @@ static void if_attachdomain1(struct ifnet *ifp) { struct domain *dp; - int s; - - s = splnet(); /* * Since dp->dom_ifattach calls malloc() with M_WAITOK, we * cannot lock ifp->if_afdata initialization, entirely. 
*/ - if (IF_AFDATA_TRYLOCK(ifp) == 0) { - splx(s); + if (IF_AFDATA_TRYLOCK(ifp) == 0) return; - } if (ifp->if_afdata_initialized >= domain_init_status) { IF_AFDATA_UNLOCK(ifp); - splx(s); printf("if_attachdomain called more than once on %s\n", ifp->if_xname); return; @@ -734,8 +725,6 @@ if_attachdomain1(struct ifnet *ifp) ifp->if_afdata[dp->dom_family] = (*dp->dom_ifattach)(ifp); } - - splx(s); } /* @@ -1825,7 +1814,6 @@ link_rtrequest(int cmd, struct rtentry * /* * Mark an interface down and notify protocols of * the transition. - * NOTE: must be called at splnet or eqivalent. */ static void if_unroute(struct ifnet *ifp, int flag, int fam) @@ -1849,7 +1837,6 @@ if_unroute(struct ifnet *ifp, int flag, /* * Mark an interface up and notify protocols of * the transition. - * NOTE: must be called at splnet or eqivalent. */ static void if_route(struct ifnet *ifp, int flag, int fam) @@ -1935,7 +1922,6 @@ do_link_state_change(void *arg, int pend /* * Mark an interface down and notify protocols of * the transition. - * NOTE: must be called at splnet or eqivalent. */ void if_down(struct ifnet *ifp) @@ -1947,7 +1933,6 @@ if_down(struct ifnet *ifp) /* * Mark an interface up and notify protocols of * the transition. - * NOTE: must be called at splnet or eqivalent. 
*/ void if_up(struct ifnet *ifp) @@ -2150,14 +2135,10 @@ ifhwioctl(u_long cmd, struct ifnet *ifp, /* Smart drivers twiddle their own routes */ } else if (ifp->if_flags & IFF_UP && (new_flags & IFF_UP) == 0) { - int s = splimp(); if_down(ifp); - splx(s); } else if (new_flags & IFF_UP && (ifp->if_flags & IFF_UP) == 0) { - int s = splimp(); if_up(ifp); - splx(s); } /* See if permanently promiscuous mode bit is about to flip */ if ((ifp->if_flags ^ new_flags) & IFF_PPROMISC) { @@ -2605,11 +2586,8 @@ ifioctl(struct socket *so, u_long cmd, c if ((oif_flags ^ ifp->if_flags) & IFF_UP) { #ifdef INET6 - if (ifp->if_flags & IFF_UP) { - int s = splimp(); + if (ifp->if_flags & IFF_UP) in6_if_up(ifp); - splx(s); - } #endif } if_rele(ifp); Modified: head/sys/net/if_ef.c == --- head/sys/net/if_ef.cThu Oct 18 13:46:26 2012(r241685) +++ head/sys/net/if_ef.cThu Oct 18 13:57:24 2012(r241686) @@ -151,14 +151,10 @@ static int ef_detach(struct efnet *sc) { struct ifnet *ifp = sc->ef_ifp; - int s; - - s = splimp(); ether_ifdetach(ifp); if_free(ifp); - splx(s); r
svn commit: r241688 - head/sys/net
Author: andre
Date: Thu Oct 18 14:08:26 2012
New Revision: 241688
URL: http://svn.freebsd.org/changeset/base/241688

Log:
  Use LOG_WARNING level in in_attachdomain1() instead of printf().

  Submitted by:	vijju.singh-at-gmail.com

Modified:
  head/sys/net/if.c

Modified: head/sys/net/if.c
==============================================================================
--- head/sys/net/if.c	Thu Oct 18 13:57:28 2012	(r241687)
+++ head/sys/net/if.c	Thu Oct 18 14:08:26 2012	(r241688)
@@ -711,8 +711,8 @@ if_attachdomain1(struct ifnet *ifp)
 		return;
 	if (ifp->if_afdata_initialized >= domain_init_status) {
 		IF_AFDATA_UNLOCK(ifp);
-		printf("if_attachdomain called more than once on %s\n",
-		    ifp->if_xname);
+		log(LOG_WARNING, "if_attachdomain called more than once "
+		    "on %s\n", ifp->if_xname);
 		return;
 	}
 	ifp->if_afdata_initialized = domain_init_status;
svn commit: r241703 - head/sys/kern
Author: andre Date: Thu Oct 18 20:22:17 2012 New Revision: 241703 URL: http://svn.freebsd.org/changeset/base/241703 Log: Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within zero copy specialized sosend_copyin() helper function. Modified: head/sys/kern/uipc_socket.c Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.c Thu Oct 18 19:28:31 2012(r241702) +++ head/sys/kern/uipc_socket.c Thu Oct 18 20:22:17 2012(r241703) @@ -890,9 +890,7 @@ sosend_copyin(struct uio *uio, struct mb long len; ssize_t resid; int error; -#ifdef ZERO_COPY_SOCKETS int cow_send; -#endif *retmp = top = NULL; mp = ⊤ @@ -900,11 +898,8 @@ sosend_copyin(struct uio *uio, struct mb resid = uio->uio_resid; error = 0; do { -#ifdef ZERO_COPY_SOCKETS cow_send = 0; -#endif /* ZERO_COPY_SOCKETS */ if (resid >= MINCLSIZE) { -#ifdef ZERO_COPY_SOCKETS if (top == NULL) { m = m_gethdr(M_WAITOK, MT_DATA); m->m_pkthdr.len = 0; @@ -924,15 +919,6 @@ sosend_copyin(struct uio *uio, struct mb m_clget(m, M_WAITOK); len = min(min(MCLBYTES, resid), *space); } -#else /* ZERO_COPY_SOCKETS */ - if (top == NULL) { - m = m_getcl(M_WAIT, MT_DATA, M_PKTHDR); - m->m_pkthdr.len = 0; - m->m_pkthdr.rcvif = NULL; - } else - m = m_getcl(M_WAIT, MT_DATA, 0); - len = min(min(MCLBYTES, resid), *space); -#endif /* ZERO_COPY_SOCKETS */ } else { if (top == NULL) { m = m_gethdr(M_WAIT, MT_DATA); @@ -957,11 +943,9 @@ sosend_copyin(struct uio *uio, struct mb } *space -= len; -#ifdef ZERO_COPY_SOCKETS if (cow_send) error = 0; else -#endif /* ZERO_COPY_SOCKETS */ error = uiomove(mtod(m, void *), (int)len, uio); resid = uio->uio_resid; m->m_len = len; @@ -980,7 +964,7 @@ out: *retmp = top; return (error); } -#endif /*ZERO_COPY_SOCKETS*/ +#endif /* ZERO_COPY_SOCKETS */ #defineSBLOCKWAIT(f) (((f) & MSG_DONTWAIT) ? 0 : SBL_WAIT) ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r241703 - head/sys/kern
On 18.10.2012 22:22, Andre Oppermann wrote:
> Author: andre
> Date: Thu Oct 18 20:22:17 2012
> New Revision: 241703
> URL: http://svn.freebsd.org/changeset/base/241703
>
> Log:
>   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
>   zero copy specialized sosend_copyin() helper function.

Note that I'm not saying zero copy should be used or is even more
performant than the optimized m_uiotombuf() function.

Actually there may be some real bit-rot to zero copy sockets. I've just
started looking into it.

Note that zero copy isn't entirely true either as it marks the page as COW.
So when the userspace application reuses the memory it is copied anyway.
Also the overhead of doing the VM magic and mbuf attachment of a VM page
isn't free either. To really benefit from it an application has to be
written with COW in mind and not reuse the memory that was just written to
the socket. For non-aware applications it may be a net performance loss
overall.

Also I don't like the name zero-copy-socket as it promises too much for
those not into socket, mbuf and VM magic. I'd rather call it cow-socket or
something like that as it describes much better what is actually happening
behind the scenes.

-- 
Andre
svn commit: r241704 - head/sys/kern
Author: andre Date: Thu Oct 18 21:04:30 2012 New Revision: 241704 URL: http://svn.freebsd.org/changeset/base/241704 Log: Remove unnecessary includes from sosend_copyin() and fix a couple of style issues. Modified: head/sys/kern/uipc_socket.c Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.c Thu Oct 18 20:22:17 2012(r241703) +++ head/sys/kern/uipc_socket.c Thu Oct 18 21:04:30 2012(r241704) @@ -860,12 +860,6 @@ struct so_zerocopy_stats{ int found_ifp; }; struct so_zerocopy_stats so_zerocp_stats = {0,0,0}; -#include -#include -#include -#include -#include -#include /* * sosend_copyin() is only used if zero copy sockets are enabled. Otherwise @@ -907,9 +901,9 @@ sosend_copyin(struct uio *uio, struct mb } else m = m_get(M_WAITOK, MT_DATA); if (so_zero_copy_send && - resid>=PAGE_SIZE && - *space>=PAGE_SIZE && - uio->uio_iov->iov_len>=PAGE_SIZE) { + resid >= PAGE_SIZE && + *space >= PAGE_SIZE && + uio->uio_iov->iov_len >= PAGE_SIZE) { so_zerocp_stats.size_ok++; so_zerocp_stats.align_ok++; cow_send = socow_setup(m, uio); @@ -946,7 +940,7 @@ sosend_copyin(struct uio *uio, struct mb if (cow_send) error = 0; else - error = uiomove(mtod(m, void *), (int)len, uio); + error = uiomove(mtod(m, void *), (int)len, uio); resid = uio->uio_resid; m->m_len = len; *mp = m; ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
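The size test that the style fix above reformats can be read as a simple predicate. The sketch below restates it in userspace under stated assumptions: a 4 kB PAGE_SIZE defined locally, plain long arguments instead of the kernel's uio fields, and a hypothetical function name cow_candidate(). Passing this test only makes a write a candidate; in the diff the final decision is still whatever socow_setup() returns.

```c
#include <stdbool.h>

#define PAGE_SIZE	4096	/* assumed page size for this sketch */

/*
 * Mirror of the condition in sosend_copyin(): the sysctl-controlled
 * switch must be on, and the remaining data, the socket buffer space
 * and the current iovec must each cover at least one full page.
 */
static bool
cow_candidate(bool zero_copy_send, long resid, long space, long iov_len)
{
	return (zero_copy_send &&
	    resid >= PAGE_SIZE &&
	    space >= PAGE_SIZE &&
	    iov_len >= PAGE_SIZE);
}
```

A write of 100 bytes, or one split into iovecs shorter than a page, never reaches the COW path and is copied with uiomove() as usual.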
Re: svn commit: r241703 - head/sys/kern
On 18.10.2012 23:06, Navdeep Parhar wrote:
> Hello Andre,
>
> A couple of things if you're poking around in this area...

I didn't really mean to dive too deep into COW socket writes.

> On 10/18/12 13:44, Andre Oppermann wrote:
>> On 18.10.2012 22:22, Andre Oppermann wrote:
>>> Author: andre
>>> Date: Thu Oct 18 20:22:17 2012
>>> New Revision: 241703
>>> URL: http://svn.freebsd.org/changeset/base/241703
>>>
>>> Log:
>>>   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
>>>   zero copy specialized sosend_copyin() helper function.
>>
>> Note that I'm not saying zero copy should be used or is even more
>> performant than the optimized m_uiotombuf() function.
>
> Some time back I played around with a modified m_uiotombuf() that was
> aware of the mbuf_jumbo_16K zone (instead of limiting itself to 4K mbufs).
> In some cases it performed better than the stock m_uiotombuf. I suspect
> this change would also help drivers that are unable to deal with long
> gather lists when doing TSO. But my testing wasn't rigorous enough (I was
> merely playing around), and the drivers I work with can mostly cope with
> whatever the kernel throws at them. So nothing came out of it.

The jumbo 16K zone is special in that the memory is actually allocated by
contigmalloc to get physically contiguous RAM. After some uptime and heavy
use this may become difficult to obtain. Also contigmalloc has to hunt for
it, which may cause quite a bit of overhead. 4K mbufs, actually PAGE_SIZE
mbufs, are very easily obtainable and fast.

To be honest I'm not really happy about > PAGE_SIZE mbufs. They were
introduced at a time when DMA engines were more limited and couldn't do S/G
DMA on receive. So performance with > PAGE_SIZE mbufs may be a little bit
better, but when you approach memory fragmentation after some heavy system
usage it sucks up to the point where it fails most of the time. PAGE_SIZE
mbufs always perform the same with very little deviation.

In an ideal scenario I'd like to see 9K and 16K mbufs go away and have the
RX DMA ring stitch a packet up out of PAGE_SIZE mbufs.

>> Actually there may be some real bit-rot to zero copy sockets. I've just
>> started looking into it.
>
> I have a cxgbe(4)-specific true zero-copy implementation. The rx side is
> in head, the tx side works only for blocking sockets (the "easy" case) and
> I haven't checked it in anywhere. Take a look at t4_soreceive_ddp() and
> m_mbuftouio_ddp() in sys/dev/cxgbe/t4_ddp.c. They're mostly identical to
> the kernel routines they're based on (read: copy-pasted from). You may
> find them of some interest if you're working in this area and are thinking
> of adding zero-copy hooks to the socket implementation.

I'm going to have a look at it and think about how to generically support
DDP either way with our socket buffer layout. Actually that may end up as
the golden path. Do away with > PAGE_SIZE mbufs, sink page flipping COW
(incorrectly named ZERO_COPY) and use DDP for those who need utmost
performance (as I said, only COW-aware applications gain a bit of speed,
unaware ones may end up much worse).

-- 
Andre
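As a rough picture of what "stitching a packet up out of PAGE_SIZE mbufs" means for a receive ring, the sketch below splits one frame into page-sized segments. It is a userspace illustration under assumptions: a 4 kB PAGE_SIZE defined locally and a hypothetical helper split_frame(); no real driver or mbuf API is involved.

```c
#define PAGE_SIZE	4096	/* assumed page size for this sketch */

/*
 * Fill seg_len[] with the per-segment byte counts needed to hold a frame
 * of frame_len bytes in PAGE_SIZE buffers.  Returns the number of
 * segments used, or -1 if the frame does not fit in max_segs segments.
 */
static int
split_frame(int frame_len, int seg_len[], int max_segs)
{
	int n = 0;

	while (frame_len > 0) {
		if (n == max_segs)
			return (-1);
		seg_len[n] = frame_len < PAGE_SIZE ? frame_len : PAGE_SIZE;
		frame_len -= seg_len[n];
		n++;
	}
	return (n);
}
```

A 9216-byte jumbo frame becomes three segments of 4096, 4096 and 1024 bytes, each backed by an ordinary page instead of one physically contiguous 9K or 16K cluster.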
svn commit: r241724 - head/sys/sys
Author: andre
Date: Fri Oct 19 10:04:43 2012
New Revision: 241724
URL: http://svn.freebsd.org/changeset/base/241724

Log:
  Remove splimp() comment from sysinit table and attribute
  SI_SUB_PROTO_BEGIN and SI_SUB_PROTO_END to VNET related
  initializations.

  MFC after:	3 days

Modified:
  head/sys/sys/kernel.h

Modified: head/sys/sys/kernel.h
==============================================================================
--- head/sys/sys/kernel.h	Fri Oct 19 09:41:45 2012	(r241723)
+++ head/sys/sys/kernel.h	Fri Oct 19 10:04:43 2012	(r241724)
@@ -84,12 +84,6 @@ extern int ticks;
  * The SI_SUB_SWAP values represent a value used by
  * the BSD 4.4Lite but not by FreeBSD; it is maintained in dependent
  * order to support porting.
- *
- * The SI_SUB_PROTO_BEGIN and SI_SUB_PROTO_END bracket a range of
- * initializations to take place at splimp().  This is a historical
- * wart that should be removed -- probably running everything at
- * splimp() until the first init that doesn't want it is the correct
- * fix.  They are currently present to ensure historical behavior.
  */
 enum sysinit_sub_id {
 	SI_SUB_DUMMY		= 0x0000000,	/* not executed; for linker*/
@@ -147,12 +141,12 @@ enum sysinit_sub_id {
 	SI_SUB_P1003_1B		= 0x6E00000,	/* P1003.1B realtime */
 	SI_SUB_PSEUDO		= 0x7000000,	/* pseudo devices*/
 	SI_SUB_EXEC		= 0x7400000,	/* execve() handlers */
-	SI_SUB_PROTO_BEGIN	= 0x8000000,	/* XXX: set splimp (kludge)*/
+	SI_SUB_PROTO_BEGIN	= 0x8000000,	/* VNET initialization */
 	SI_SUB_PROTO_IF		= 0x8400000,	/* interfaces*/
 	SI_SUB_PROTO_DOMAININIT	= 0x8600000,	/* domain registration system */
 	SI_SUB_PROTO_DOMAIN	= 0x8800000,	/* domains (address families?)*/
 	SI_SUB_PROTO_IFATTACHDOMAIN	= 0x8800001,	/* domain dependent data init*/
-	SI_SUB_PROTO_END	= 0x8ffffff,	/* XXX: set splx (kludge)*/
+	SI_SUB_PROTO_END	= 0x8ffffff,	/* VNET helper functions */
 	SI_SUB_KPROF		= 0x9000000,	/* kernel profiling*/
 	SI_SUB_KICK_SCHEDULER	= 0xa000000,	/* start the timeout events*/
 	SI_SUB_INT_CONFIG_HOOKS	= 0xa800000,	/* Interrupts enabled config */
svn commit: r241725 - head/sys/net
Author: andre
Date: Fri Oct 19 10:07:55 2012
New Revision: 241725
URL: http://svn.freebsd.org/changeset/base/241725

Log:
  Update to previous r241688 to use __func__ instead of spelled out
  function name in log(9) message.

  Suggested by:	glebius

Modified:
  head/sys/net/if.c

Modified: head/sys/net/if.c
==============================================================================
--- head/sys/net/if.c	Fri Oct 19 10:04:43 2012	(r241724)
+++ head/sys/net/if.c	Fri Oct 19 10:07:55 2012	(r241725)
@@ -711,8 +711,8 @@ if_attachdomain1(struct ifnet *ifp)
 		return;
 	if (ifp->if_afdata_initialized >= domain_init_status) {
 		IF_AFDATA_UNLOCK(ifp);
-		log(LOG_WARNING, "if_attachdomain called more than once "
-		    "on %s\n", ifp->if_xname);
+		log(LOG_WARNING, "%s called more than once on %s\n",
+		    __func__, ifp->if_xname);
 		return;
 	}
 	ifp->if_afdata_initialized = domain_init_status;
Re: svn commit: r241688 - head/sys/net
On 18.10.2012 16:11, Gleb Smirnoff wrote:
> On Thu, Oct 18, 2012 at 02:08:26PM +0000, Andre Oppermann wrote:
> A> Author: andre
> A> Date: Thu Oct 18 14:08:26 2012
> A> New Revision: 241688
> A> URL: http://svn.freebsd.org/changeset/base/241688
> A>
> A> Log:
> A>   Use LOG_WARNING level in in_attachdomain1() instead of printf().
> A>
> A>   Submitted by:	vijju.singh-at-gmail.com
> A>
> A> Modified:
> A>   head/sys/net/if.c
> A>
> A> Modified: head/sys/net/if.c
> A> ==============================================================================
> A> --- head/sys/net/if.c	Thu Oct 18 13:57:28 2012	(r241687)
> A> +++ head/sys/net/if.c	Thu Oct 18 14:08:26 2012	(r241688)
> A> @@ -711,8 +711,8 @@ if_attachdomain1(struct ifnet *ifp)
> A>  		return;
> A>  	if (ifp->if_afdata_initialized >= domain_init_status) {
> A>  		IF_AFDATA_UNLOCK(ifp);
> A> -		printf("if_attachdomain called more than once on %s\n",
> A> -		    ifp->if_xname);
> A> +		log(LOG_WARNING, "if_attachdomain called more than once "
> A> +		    "on %s\n", ifp->if_xname);
> A>  		return;
> A>  	}
> A>  	ifp->if_afdata_initialized = domain_init_status;
>
> It'll be even more perfect if done as
>
>   "%s called more than once on %s\n", __func__, ifp->if_xname

Thanks, done in r241725.

> And do we need "\n" for log(9)?

Yes.

--
Andre
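A minimal userspace sketch of the __func__ suggestion discussed above: the predefined identifier expands to the name of the enclosing function, so the message stays correct if the function is renamed. snprintf() stands in for the kernel's log(9) here, and the function name attachdomain_sketch() and the interface name in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel function in the diff.  Builds
 * the warning text into a static buffer instead of calling log(9).
 */
const char *
attachdomain_sketch(const char *ifname)
{
	static char msg[128];

	assert(ifname != NULL);
	/* __func__ is "attachdomain_sketch" inside this function. */
	snprintf(msg, sizeof(msg), "%s called more than once on %s\n",
	    __func__, ifname);
	return (msg);
}
```

Calling attachdomain_sketch("em0") produces the text "attachdomain_sketch called more than once on em0" followed by a newline; the trailing "\n" is kept because, as confirmed in the reply, log(9) needs it.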
svn commit: r241726 - head/sys/kern
Author: andre Date: Fri Oct 19 10:15:32 2012 New Revision: 241726 URL: http://svn.freebsd.org/changeset/base/241726 Log: Move UMA socket zone initialization from uipc_domain.c to uipc_socket.c into one place next to its other related functions to avoid confusion. Modified: head/sys/kern/uipc_domain.c head/sys/kern/uipc_socket.c Modified: head/sys/kern/uipc_domain.c == --- head/sys/kern/uipc_domain.c Fri Oct 19 10:07:55 2012(r241725) +++ head/sys/kern/uipc_domain.c Fri Oct 19 10:15:32 2012(r241726) @@ -239,28 +239,11 @@ domain_add(void *data) mtx_unlock(&dom_mtx); } -static void -socket_zone_change(void *tag) -{ - - uma_zone_set_max(socket_zone, maxsockets); -} - /* ARGSUSED*/ static void domaininit(void *dummy) { - /* -* Before we do any setup, make sure to initialize the -* zone allocator we get struct sockets from. -*/ - socket_zone = uma_zcreate("socket", sizeof(struct socket), NULL, NULL, - NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); - uma_zone_set_max(socket_zone, maxsockets); - EVENTHANDLER_REGISTER(maxsockets_change, socket_zone_change, NULL, - EVENTHANDLER_PRI_FIRST); - if (max_linkhdr < 16) /* XXX */ max_linkhdr = 16; Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.c Fri Oct 19 10:07:55 2012(r241725) +++ head/sys/kern/uipc_socket.c Fri Oct 19 10:15:32 2012(r241726) @@ -227,6 +227,29 @@ MTX_SYSINIT(so_global_mtx, &so_global_mt SYSCTL_NODE(_kern, KERN_IPC, ipc, CTLFLAG_RW, 0, "IPC"); /* + * Initialize the socket subsystem and set up the socket + * memory allocator. 
+ */ +static void +socket_zone_change(void *tag) +{ + + uma_zone_set_max(socket_zone, maxsockets); +} + +static void +socket_init(void *tag) +{ + +socket_zone = uma_zcreate("socket", sizeof(struct socket), NULL, NULL, +NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); +uma_zone_set_max(socket_zone, maxsockets); +EVENTHANDLER_REGISTER(maxsockets_change, socket_zone_change, NULL, +EVENTHANDLER_PRI_FIRST); +} +SYSINIT(socket, SI_SUB_PROTO_DOMAININIT, SI_ORDER_ANY, socket_init, NULL); + +/* * Sysctl to get and set the maximum global sockets limit. Notify protocols * of the change so that they can update their dependent limits as required. */ ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
svn commit: r241729 - head/sys/kern
Author: andre Date: Fri Oct 19 12:16:29 2012 New Revision: 241729 URL: http://svn.freebsd.org/changeset/base/241729 Log: Move socket UMA zone initialization functionality together into one place. Modified: head/sys/kern/uipc_socket.c Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.c Fri Oct 19 11:01:39 2012(r241728) +++ head/sys/kern/uipc_socket.c Fri Oct 19 12:16:29 2012(r241729) @@ -173,11 +173,8 @@ static struct filterops sowrite_filtops .f_event = filt_sowrite, }; -uma_zone_t socket_zone; so_gen_t so_gencnt; /* generation count for sockets */ -intmaxsockets; - MALLOC_DEFINE(M_SONAME, "soname", "socket name"); MALLOC_DEFINE(M_PCB, "pcb", "protocol control block"); @@ -230,6 +227,9 @@ SYSCTL_NODE(_kern, KERN_IPC, ipc, CTLFLA * Initialize the socket subsystem and set up the socket * memory allocator. */ +uma_zone_t socket_zone; +intmaxsockets; + static void socket_zone_change(void *tag) { @@ -250,6 +250,19 @@ socket_init(void *tag) SYSINIT(socket, SI_SUB_PROTO_DOMAININIT, SI_ORDER_ANY, socket_init, NULL); /* + * Initialise maxsockets. This SYSINIT must be run after + * tunable_mbinit(). + */ +static void +init_maxsockets(void *ignored) +{ + + TUNABLE_INT_FETCH("kern.ipc.maxsockets", &maxsockets); + maxsockets = imax(maxsockets, imax(maxfiles, nmbclusters)); +} +SYSINIT(param, SI_SUB_TUNABLES, SI_ORDER_ANY, init_maxsockets, NULL); + +/* * Sysctl to get and set the maximum global sockets limit. Notify protocols * of the change so that they can update their dependent limits as required. */ @@ -273,25 +286,11 @@ sysctl_maxsockets(SYSCTL_HANDLER_ARGS) } return (error); } - SYSCTL_PROC(_kern_ipc, OID_AUTO, maxsockets, CTLTYPE_INT|CTLFLAG_RW, &maxsockets, 0, sysctl_maxsockets, "IU", "Maximum number of sockets avaliable"); /* - * Initialise maxsockets. This SYSINIT must be run after - * tunable_mbinit(). 
- */ -static void -init_maxsockets(void *ignored) -{ - - TUNABLE_INT_FETCH("kern.ipc.maxsockets", &maxsockets); - maxsockets = imax(maxsockets, imax(maxfiles, nmbclusters)); -} -SYSINIT(param, SI_SUB_TUNABLES, SI_ORDER_ANY, init_maxsockets, NULL); - -/* * Socket operation routines. These routines are called by the routines in * sys_socket.c or from a system process, and implement the semantics of * socket operations by switching out to the protocol specific routines. ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
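The sizing rule in init_maxsockets() can be read as: the kern.ipc.maxsockets tunable may raise the limit, but never lower it below the larger of maxfiles and nmbclusters. The following userspace sketch reproduces only that arithmetic; imax_sketch() and initial_maxsockets() are hypothetical names, and the sample numbers in the usage note are made up.

```c
#include <assert.h>

/* Local equivalent of the kernel's imax(): larger of two ints. */
static int
imax_sketch(int a, int b)
{
	return (a > b ? a : b);
}

/*
 * Hypothetical helper mirroring the expression in the diff:
 * maxsockets = imax(maxsockets, imax(maxfiles, nmbclusters));
 * 'tunable' is the value fetched from the loader, 0 if unset.
 */
int
initial_maxsockets(int tunable, int maxfiles, int nmbclusters)
{
	assert(maxfiles >= 0 && nmbclusters >= 0);
	return (imax_sketch(tunable, imax_sketch(maxfiles, nmbclusters)));
}
```

For example, initial_maxsockets(0, 1000, 25600) gives 25600, while initial_maxsockets(100000, 1000, 25600) gives 100000.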
svn commit: r241779 - head/sys/kern
Author: andre Date: Sat Oct 20 10:51:32 2012 New Revision: 241779 URL: http://svn.freebsd.org/changeset/base/241779 Log: Tidy up somaxconn (accept queue limit) and related functions and move it together into one place. Modified: head/sys/kern/uipc_socket.c Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.c Sat Oct 20 10:34:55 2012(r241778) +++ head/sys/kern/uipc_socket.c Sat Oct 20 10:51:32 2012(r241779) @@ -182,15 +182,37 @@ MALLOC_DEFINE(M_PCB, "pcb", "protocol co VNET_ASSERT(curvnet != NULL,\ ("%s:%d curvnet is NULL, so=%p", __func__, __LINE__, (so))); +/* + * Limit on the number of connections in the listen queue waiting + * for accept(2). + */ static int somaxconn = SOMAXCONN; -static int sysctl_somaxconn(SYSCTL_HANDLER_ARGS); -/* XXX: we dont have SYSCTL_USHORT */ + +static int +sysctl_somaxconn(SYSCTL_HANDLER_ARGS) +{ + int error; + int val; + + val = somaxconn; + error = sysctl_handle_int(oidp, &val, 0, req); + if (error || !req->newptr ) + return (error); + + if (val < 1 || val > USHRT_MAX) + return (EINVAL); + + somaxconn = val; + return (0); +} SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW, -0, sizeof(int), sysctl_somaxconn, "I", "Maximum pending socket connection " -"queue size"); +0, sizeof(int), sysctl_somaxconn, "I", +"Maximum listen socket pending connection accept queue size"); + static int numopensockets; SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD, &numopensockets, 0, "Number of open sockets"); + #ifdef ZERO_COPY_SOCKETS /* These aren't static because they're used in other files. 
*/ int so_zero_copy_send = 1; @@ -3269,24 +3291,6 @@ socheckuid(struct socket *so, uid_t uid) return (0); } -static int -sysctl_somaxconn(SYSCTL_HANDLER_ARGS) -{ - int error; - int val; - - val = somaxconn; - error = sysctl_handle_int(oidp, &val, 0, req); - if (error || !req->newptr ) - return (error); - - if (val < 1 || val > USHRT_MAX) - return (EINVAL); - - somaxconn = val; - return (0); -} - /* * These functions are used by protocols to notify the socket layer (and its * consumers) of state changes in the sockets driven by protocol-side events. ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
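The range check in sysctl_somaxconn() is easy to try outside the kernel. The sketch below keeps only the validation and assignment from the handler shown in the diff; the sysctl plumbing (sysctl_handle_int(), req->newptr) is left out, the names set_acceptq_limit() and get_acceptq_limit() are hypothetical, and the initial value of 128 is an assumption standing in for SOMAXCONN.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Assumed default; the kernel initialises this from SOMAXCONN. */
static int acceptq_limit = 128;

/*
 * Hypothetical setter: accept only values that fit the unsigned
 * short queue length, as the handler in the diff does.
 */
int
set_acceptq_limit(int val)
{
	if (val < 1 || val > USHRT_MAX)
		return (EINVAL);
	acceptq_limit = val;
	return (0);
}

int
get_acceptq_limit(void)
{
	return (acceptq_limit);
}
```

Rejected values leave the previous limit untouched, which matches the handler: it copies somaxconn into a local, validates the new value, and only then stores it.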
svn commit: r241781 - in head: lib/libc/sys sys/kern
Author: andre Date: Sat Oct 20 12:53:14 2012 New Revision: 241781 URL: http://svn.freebsd.org/changeset/base/241781 Log: Hide the unfortunate named sysctl kern.ipc.somaxconn from sysctl -a output and replace it with a new visible sysctl kern.ipc.acceptqueue of the same functionality. It specifies the maximum length of the accept queue on a listen socket. The old kern.ipc.somaxconn remains available for reading and writing for compatibility reasons so that existing programs, scripts and configurations continue to work. There no plans to ever remove the orginal and now hidden kern.ipc.somaxconn. Modified: head/lib/libc/sys/listen.2 head/sys/kern/uipc_socket.c Modified: head/lib/libc/sys/listen.2 == --- head/lib/libc/sys/listen.2 Sat Oct 20 12:07:48 2012(r241780) +++ head/lib/libc/sys/listen.2 Sat Oct 20 12:53:14 2012(r241781) @@ -28,7 +28,7 @@ .\"From: @(#)listen.2 8.2 (Berkeley) 12/11/93 .\" $FreeBSD$ .\" -.Dd August 29, 2005 +.Dd October 20, 2012 .Dt LISTEN 2 .Os .Sh NAME @@ -102,15 +102,15 @@ of service attacks are no longer necessa The .Xr sysctl 3 MIB variable -.Va kern.ipc.somaxconn +.Va kern.ipc.soacceptqueue specifies a hard limit on .Fa backlog ; if a value greater than -.Va kern.ipc.somaxconn +.Va kern.ipc.soacceptqueue or less than zero is specified, .Fa backlog is silently forced to -.Va kern.ipc.somaxconn . +.Va kern.ipc.soacceptqueue . .Sh INTERACTION WITH ACCEPT FILTERS When accept filtering is used on a socket, a second queue will be used to hold sockets that have connected, but have not yet @@ -168,3 +168,17 @@ at run-time, and to use a negative .Fa backlog to request the maximum allowable value, was introduced in .Fx 2.2 . +The +.Va kern.ipc.somaxconn +.Xr sysctl 3 +has been replaced with +.Va kern.ipc.soacceptqueue +in +.Fx 10.0 +to prevent confusion its actual functionality. +The original +.Xr sysctl 3 +.Va kern.ipc.somaxconn +is still available but hidden from a +.Xr sysctl 3 +-a output so that existing applications and scripts continue to work. 
Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.c Sat Oct 20 12:07:48 2012(r241780) +++ head/sys/kern/uipc_socket.c Sat Oct 20 12:53:14 2012(r241781) @@ -185,6 +185,8 @@ MALLOC_DEFINE(M_PCB, "pcb", "protocol co /* * Limit on the number of connections in the listen queue waiting * for accept(2). + * NB: The orginal sysctl somaxconn is still available but hidden + * to prevent confusion about the actually purpose of this number. */ static int somaxconn = SOMAXCONN; @@ -205,9 +207,13 @@ sysctl_somaxconn(SYSCTL_HANDLER_ARGS) somaxconn = val; return (0); } -SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW, +SYSCTL_PROC(_kern_ipc, OID_AUTO, soacceptqueue, CTLTYPE_UINT | CTLFLAG_RW, 0, sizeof(int), sysctl_somaxconn, "I", "Maximum listen socket pending connection accept queue size"); +SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, +CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_SKIP, +0, sizeof(int), sysctl_somaxconn, "I", +"Maximum listen socket pending connection accept queue size (compat)"); static int numopensockets; SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD, ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
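The listen(2) text in the diff above says that a backlog greater than kern.ipc.soacceptqueue or less than zero is silently forced to the limit. A minimal sketch of that rule, with the hypothetical name clamp_backlog() and the limit passed in as a parameter rather than read from the sysctl:

```c
#include <assert.h>

/*
 * Hypothetical helper: apply the clamping rule described in
 * listen(2) to a requested backlog.
 */
int
clamp_backlog(int backlog, int limit)
{
	assert(limit >= 1);
	if (backlog < 0 || backlog > limit)
		return (limit);
	return (backlog);
}
```

With a limit of 128, a request of -1 or 1000 becomes 128, while a request of 16 is kept as is. This is why passing a negative backlog is a portable way to ask for the maximum allowed queue length.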
svn commit: r241789 - in head: lib/libc/sys sys/kern
Author: andre Date: Sat Oct 20 19:38:22 2012 New Revision: 241789 URL: http://svn.freebsd.org/changeset/base/241789 Log: Grammar fixes to r241781. Submitted by: alc Modified: head/lib/libc/sys/listen.2 head/sys/kern/uipc_socket.c Modified: head/lib/libc/sys/listen.2 == --- head/lib/libc/sys/listen.2 Sat Oct 20 18:13:20 2012(r241788) +++ head/lib/libc/sys/listen.2 Sat Oct 20 19:38:22 2012(r241789) @@ -175,7 +175,7 @@ has been replaced with .Va kern.ipc.soacceptqueue in .Fx 10.0 -to prevent confusion its actual functionality. +to prevent confusion about its actual functionality. The original .Xr sysctl 3 .Va kern.ipc.somaxconn Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.c Sat Oct 20 18:13:20 2012(r241788) +++ head/sys/kern/uipc_socket.c Sat Oct 20 19:38:22 2012(r241789) @@ -186,7 +186,7 @@ MALLOC_DEFINE(M_PCB, "pcb", "protocol co * Limit on the number of connections in the listen queue waiting * for accept(2). * NB: The orginal sysctl somaxconn is still available but hidden - * to prevent confusion about the actually purpose of this number. + * to prevent confusion about the actual purpose of this number. */ static int somaxconn = SOMAXCONN; ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r241781 - in head: lib/libc/sys sys/kern
On 20.10.2012 19:23, Alan Cox wrote: There are couple minor grammar issues in the text. See below. Thank you. Fixed in r241789. -- Andre Alan On 10/20/2012 07:53, Andre Oppermann wrote: Author: andre Date: Sat Oct 20 12:53:14 2012 New Revision: 241781 URL: http://svn.freebsd.org/changeset/base/241781 Log: Hide the unfortunate named sysctl kern.ipc.somaxconn from sysctl -a output and replace it with a new visible sysctl kern.ipc.acceptqueue of the same functionality. It specifies the maximum length of the accept queue on a listen socket. The old kern.ipc.somaxconn remains available for reading and writing for compatibility reasons so that existing programs, scripts and configurations continue to work. There no plans to ever remove the orginal and now hidden kern.ipc.somaxconn. Modified: head/lib/libc/sys/listen.2 head/sys/kern/uipc_socket.c Modified: head/lib/libc/sys/listen.2 == --- head/lib/libc/sys/listen.2Sat Oct 20 12:07:48 2012(r241780) +++ head/lib/libc/sys/listen.2Sat Oct 20 12:53:14 2012(r241781) @@ -28,7 +28,7 @@ .\"From: @(#)listen.28.2 (Berkeley) 12/11/93 .\" $FreeBSD$ .\" -.Dd August 29, 2005 +.Dd October 20, 2012 .Dt LISTEN 2 .Os .Sh NAME @@ -102,15 +102,15 @@ of service attacks are no longer necessa The .Xr sysctl 3 MIB variable -.Va kern.ipc.somaxconn +.Va kern.ipc.soacceptqueue specifies a hard limit on .Fa backlog ; if a value greater than -.Va kern.ipc.somaxconn +.Va kern.ipc.soacceptqueue or less than zero is specified, .Fa backlog is silently forced to -.Va kern.ipc.somaxconn . +.Va kern.ipc.soacceptqueue . .Sh INTERACTION WITH ACCEPT FILTERS When accept filtering is used on a socket, a second queue will be used to hold sockets that have connected, but have not yet @@ -168,3 +168,17 @@ at run-time, and to use a negative .Fa backlog to request the maximum allowable value, was introduced in .Fx 2.2 . 
+The +.Va kern.ipc.somaxconn +.Xr sysctl 3 +has been replaced with +.Va kern.ipc.soacceptqueue +in +.Fx 10.0 +to prevent confusion its actual functionality. There is a missing word here: "... confusion about its ..." +The original +.Xr sysctl 3 +.Va kern.ipc.somaxconn +is still available but hidden from a +.Xr sysctl 3 +-a output so that existing applications and scripts continue to work. Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.cSat Oct 20 12:07:48 2012(r241780) +++ head/sys/kern/uipc_socket.cSat Oct 20 12:53:14 2012(r241781) @@ -185,6 +185,8 @@ MALLOC_DEFINE(M_PCB, "pcb", "protocol co /* * Limit on the number of connections in the listen queue waiting * for accept(2). + * NB: The orginal sysctl somaxconn is still available but hidden + * to prevent confusion about the actually purpose of this number. "actually" should be "actual". */ static int somaxconn = SOMAXCONN; @@ -205,9 +207,13 @@ sysctl_somaxconn(SYSCTL_HANDLER_ARGS) somaxconn = val; return (0); } -SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW, +SYSCTL_PROC(_kern_ipc, OID_AUTO, soacceptqueue, CTLTYPE_UINT | CTLFLAG_RW, 0, sizeof(int), sysctl_somaxconn, "I", "Maximum listen socket pending connection accept queue size"); +SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, +CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_SKIP, +0, sizeof(int), sysctl_somaxconn, "I", +"Maximum listen socket pending connection accept queue size (compat)"); static int numopensockets; SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD, ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
svn commit: r241892 - head/sys/mips/conf
Author: andre Date: Mon Oct 22 15:04:23 2012 New Revision: 241892 URL: http://svn.freebsd.org/changeset/base/241892 Log: Remove ZERO_COPY_SOCKETS from kernel configuration as the current COW based approach is not safe and should not be used in production. Modified: head/sys/mips/conf/RT305X Modified: head/sys/mips/conf/RT305X == --- head/sys/mips/conf/RT305X Mon Oct 22 14:48:14 2012(r241891) +++ head/sys/mips/conf/RT305X Mon Oct 22 15:04:23 2012(r241892) @@ -86,7 +86,6 @@ options SCSI_NO_OP_STRINGS optionsRWLOCK_NOINLINE optionsSX_NOINLINE optionsNO_SWAPPING -optionsZERO_COPY_SOCKETS options MROUTING# Multicast routing optionsIPFIREWALL_DEFAULT_TO_ACCEPT ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r241923 - in head/sys: netinet netipsec
On 23.10.2012 10:33, Gleb Smirnoff wrote:
> Author: glebius
> Date: Tue Oct 23 08:33:13 2012
> New Revision: 241923
> URL: http://svn.freebsd.org/changeset/base/241923
>
> Log:
>   Do not reduce ip_len by size of IP header in the ip_input()
>   before passing a packet to protocol input routines.
>
>   For several protocols this mean that now protocol needs to
>   do subtraction itself, and for another half this means that
>   we do not need to add header length back to the packet.

Yay! More Mammoth shit getting washed away! ;)

Please add an entry to UPDATING, as the convention of ip_len
subtraction has been there since forever. That makes the change
easier to discover for third parties writing code.

--
Andre
svn commit: r241931 - in head/sys: conf kern
Author: andre Date: Tue Oct 23 14:19:44 2012 New Revision: 241931 URL: http://svn.freebsd.org/changeset/base/241931 Log: Replace the ill-named ZERO_COPY_SOCKET kernel option with two more appropriate named kernel options for the very distinct send and receive path. "options SOCKET_SEND_COW" enables VM page copy-on-write based sending of data on an outbound socket. NB: The COW based send mechanism is not safe and may result in kernel crashes. "options SOCKET_RECV_PFLIP" enables VM kernel/userspace page flipping for special disposable pages attached as external storage to mbufs. Only the naming of the kernel options is changed and their corresponding #ifdef sections are adjusted. No functionality is added or removed. Discussed with: alc (mechanism and limitations of send side COW) Modified: head/sys/conf/NOTES head/sys/conf/options head/sys/kern/subr_uio.c head/sys/kern/uipc_socket.c Modified: head/sys/conf/NOTES == --- head/sys/conf/NOTES Tue Oct 23 12:39:17 2012(r241930) +++ head/sys/conf/NOTES Tue Oct 23 14:19:44 2012(r241931) @@ -964,12 +964,20 @@ options TCP_SIGNATURE #include support # a smooth scheduling of the traffic. optionsDUMMYNET -# Zero copy sockets support. This enables "zero copy" for sending and -# receiving data via a socket. The send side works for any type of NIC, -# the receive side only works for NICs that support MTUs greater than the -# page size of your architecture and that support header splitting. See -# zero_copy(9) for more details. -optionsZERO_COPY_SOCKETS +# "Zero copy" sockets support is split into the send and receive path +# which operate very differently. +# For the send path the VM page with the data is wired into the kernel +# and marked as COW (copy-on-write). If the application touches the +# data while it is still in the send socket buffer the page is copied +# and divorced from its kernel wiring (no longer zero copy). 
+# The receive side requires explicit NIC driver support to create +# disposable pages which are flipped from kernel to user-space VM. +# See zero_copy(9) for more details. +# XXX: The COW based send mechanism is not safe and may result in +# kernel crashes. +# XXX: None of the current NIC drivers support disposeable pages. +optionsSOCKET_SEND_COW +optionsSOCKET_RECV_PFLIP # # FILESYSTEM OPTIONS Modified: head/sys/conf/options == --- head/sys/conf/options Tue Oct 23 12:39:17 2012(r241930) +++ head/sys/conf/options Tue Oct 23 14:19:44 2012(r241931) @@ -520,7 +520,8 @@ NGATM_CCATM opt_netgraph.h # DRM options DRM_DEBUG opt_drm.h -ZERO_COPY_SOCKETS opt_zero.h +SOCKET_SEND_COWopt_zero.h +SOCKET_RECV_PFLIP opt_zero.h TI_SF_BUF_JUMBOopt_ti.h TI_JUMBO_HDRSPLIT opt_ti.h BCE_JUMBO_HDRSPLIT opt_bce.h Modified: head/sys/kern/subr_uio.c == --- head/sys/kern/subr_uio.cTue Oct 23 12:39:17 2012(r241930) +++ head/sys/kern/subr_uio.cTue Oct 23 14:19:44 2012(r241931) @@ -57,7 +57,7 @@ __FBSDID("$FreeBSD$"); #include #include #include -#ifdef ZERO_COPY_SOCKETS +#ifdef SOCKET_SEND_COW #include #endif @@ -66,7 +66,7 @@ SYSCTL_INT(_kern, KERN_IOV_MAX, iov_max, static int uiomove_faultflag(void *cp, int n, struct uio *uio, int nofault); -#ifdef ZERO_COPY_SOCKETS +#ifdef SOCKET_SEND_COW /* Declared in uipc_socket.c */ extern int so_zero_copy_receive; @@ -128,7 +128,7 @@ retry: vm_map_lookup_done(map, entry); return(KERN_SUCCESS); } -#endif /* ZERO_COPY_SOCKETS */ +#endif /* SOCKET_SEND_COW */ int copyin_nofault(const void *udaddr, void *kaddr, size_t len) @@ -261,7 +261,7 @@ uiomove_frombuf(void *buf, int buflen, s return (uiomove((char *)buf + offset, n, uio)); } -#ifdef ZERO_COPY_SOCKETS +#ifdef SOCKET_RECV_PFLIP /* * Experimental support for zero-copy I/O */ @@ -356,7 +356,7 @@ uiomoveco(void *cp, int n, struct uio *u } return (0); } -#endif /* ZERO_COPY_SOCKETS */ +#endif /* SOCKET_RECV_PFLIP */ /* * Give next character to user as result of read. 
Modified: head/sys/kern/uipc_socket.c == --- head/sys/kern/uipc_socket.c Tue Oct 23 12:39:17 2012(r241930) +++ head/sys/kern/uipc_socket.c Tue Oct 23 14:19:44 2012(r241931) @@ -219,17 +219,20 @@ static int numopensockets; SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD, &numopensockets, 0, "Number of open sockets"); -#ifdef ZERO_COPY_SOCKETS -/* These aren't static because th
svn commit: r241932 - head/share/man/man9
Author: andre Date: Tue Oct 23 14:25:37 2012 New Revision: 241932 URL: http://svn.freebsd.org/changeset/base/241932 Log: Update zero_copy(9) man page to note the renamed kernel options and to warn about unsafeness of COW based sends. Modified: head/share/man/man9/zero_copy.9 Modified: head/share/man/man9/zero_copy.9 == --- head/share/man/man9/zero_copy.9 Tue Oct 23 14:19:44 2012 (r241931) +++ head/share/man/man9/zero_copy.9 Tue Oct 23 14:25:37 2012 (r241932) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd December 5, 2004 +.Dd October 23, 2012 .Dt ZERO_COPY 9 .Os .Sh NAME @@ -33,7 +33,8 @@ .Nm zero_copy_sockets .Nd "zero copy sockets code" .Sh SYNOPSIS -.Cd "options ZERO_COPY_SOCKETS" +.Cd "options SOCKET_SEND_COW" +.Cd "options SOCKET_RECV_PFLIP" .Sh DESCRIPTION The .Fx @@ -155,6 +156,8 @@ variables respectively. .Xr sendfile 2 , .Xr socket 2 , .Xr ti 4 +.Sh BUGS +The COW based send mechanism is not safe and may result in kernel crashes. .Sh HISTORY The zero copy sockets code first appeared in .Fx 5.0 , ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r241931 - in head/sys: conf kern
On 23.10.2012 16:42, Gleb Smirnoff wrote:
> On Tue, Oct 23, 2012 at 02:19:45PM +0000, Andre Oppermann wrote:
> A> Author: andre
> A> Date: Tue Oct 23 14:19:44 2012
> A> New Revision: 241931
> A> URL: http://svn.freebsd.org/changeset/base/241931
> A>
> A> Log:
> A>   Replace the ill-named ZERO_COPY_SOCKET kernel option with two
> A>   more appropriate named kernel options for the very distinct
> A>   send and receive path.
> A>
> A>   "options SOCKET_SEND_COW" enables VM page copy-on-write based
> A>   sending of data on an outbound socket.
> A>
> A>   NB: The COW based send mechanism is not safe and may result
> A>   in kernel crashes.
> A>
> A>   "options SOCKET_RECV_PFLIP" enables VM kernel/userspace page
> A>   flipping for special disposable pages attached as external
> A>   storage to mbufs.
> A>
> A>   Only the naming of the kernel options is changed and their
> A>   corresponding #ifdef sections are adjusted. No functionality
> A>   is added or removed.
> A>
> A>   Discussed with:	alc (mechanism and limitations of send side COW)
>
> Users may call this a pointless POLA violation.
>
> IMO, the old kernel option that we had for years, more than a decade,
> should remain and just imply two new kernel options.

There shouldn't be any users. Zero copy send is broken and
responsible for random kernel crashes. Zero copy receive isn't
supported by any modern driver. Both are useless to dangerous.

The main problem with ZERO_COPY_SOCKETS was that it sounded great
and who wouldn't want to have zero copy sockets? Unfortunately
it doesn't work that way.

According to alc@, even if zero copy send worked it wouldn't be
faster, because setting up page based COW is a very expensive
operation. Eventually he wants page-based COW to go away.

For zero copy send we're trying to come up with a sendfile-like
approach where the page is simply wired into kernel space. The
application then is not allowed to touch it until the socket buffer
has released it again. The main issue here is how to provide
feedback to the application when it is safe for reuse.

For zero copy receive I've been contacted by np@ to find a way to
combine DDP into the socket buffer layer. Trying to work something
out that isn't too horrible. A generic approach would hinge on
page sized mbufs though.

--
Andre
Re: svn commit: r241931 - in head/sys: conf kern
On 23.10.2012 17:11, David Chisnall wrote:
> On 23 Oct 2012, at 16:05, Andre Oppermann wrote:
>> For zero copy send we're trying to come up with a sendfile-like
>> approach where the page is simply wired into kernel space. The
>> application then is not allowed to touch it until the socket buffer
>> has released it again. The main issue here is how to provide
>> feedback to the application when it is safe for reuse.
>
> It's been a few years since I used it, but I thought that aio_write()
> already provided this. The application may not modify the contents of
> the memory pointed to by aio_buf until after it has received
> notification that the write has finished. This happens either via a
> signal directly, a signal polled by kqueue, or a call to aio_return().

Indeed, that's one of the ways being explored. It requires the
explicit cooperation of the application. I don't think there is
any way around that.

--
Andre
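The aio_write() contract mentioned above can be sketched with the POSIX AIO interface. This is only an illustration of the "do not touch the buffer until completion is reported" rule on a regular file, not the zero copy socket mechanism under discussion; the function name aio_write_file() is hypothetical, and polling with aio_error()/aio_suspend() is one assumed choice among the notification methods listed in the message.

```c
#define _POSIX_C_SOURCE 200809L	/* ask for the POSIX AIO declarations */

#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from data to path with
 * aio_write() and wait for completion.  Returns the number of bytes
 * written, or -1 on error.  The buffer is only read, but it is not
 * const because aio_buf is not const-qualified.
 */
ssize_t
aio_write_file(const char *path, char *data, size_t len)
{
	struct aiocb cb;
	const struct aiocb *list[1];
	ssize_t done;
	int fd;

	assert(path != NULL && data != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = data;
	cb.aio_nbytes = len;
	cb.aio_offset = 0;
	cb.aio_sigevent.sigev_notify = SIGEV_NONE;	/* we poll instead */

	if (aio_write(&cb) != 0) {
		close(fd);
		return (-1);
	}

	/*
	 * From here until aio_error() stops returning EINPROGRESS the
	 * caller must not modify data: the request still owns it.
	 */
	list[0] = &cb;
	while (aio_error(&cb) == EINPROGRESS)
		(void)aio_suspend(list, 1, NULL);

	/* Collect the result exactly once; the buffer is ours again. */
	done = aio_return(&cb);
	close(fd);
	return (done);
}
```

The point of the sketch is the ownership window between aio_write() and aio_return(). Any zero copy send scheme along the lines described would need an equivalent signal from the kernel telling the application when the wired page may be reused. On some older systems the program may need to be linked with -lrt.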
svn commit: r241955 - head
Author: andre Date: Tue Oct 23 16:33:43 2012 New Revision: 241955 URL: http://svn.freebsd.org/changeset/base/241955 Log: Note the removal of the ZERO_COPY_SOCKETS kernel option in r241931 and provide a proper explanation. Modified: head/UPDATING Modified: head/UPDATING == --- head/UPDATING Tue Oct 23 16:12:17 2012(r241954) +++ head/UPDATING Tue Oct 23 16:33:43 2012(r241955) @@ -25,6 +25,17 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 10 "ln -s 'abort:false,junk:false' /etc/malloc.conf".) 20121023: + The ZERO_COPY_SOCKET kernel option has been removed and + split into SOCKET_SEND_COW and SOCKET_RECV_PFLIP. + NB: SOCKET_SEND_COW uses the VM page based copy-on-write + mechanism which is not safe and may result in kernel crashes. + NB: The SOCKET_RECV_PFLIP mechanism is useless as no current + driver supports disposeable external page sized mbuf storage. + Proper replacements for both zero-copy mechanisms are under + consideration and will eventually lead to complete removal + of the two kernel options. + +20121023: The IPv4 network stack has been converted to network byte order. The following modules need to be recompiled together with kernel: carp(4), divert(4), gif(4), siftr(4), gre(4), ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r241931 - in head/sys: conf kern
On 23.10.2012 17:21, Bryan Drewery wrote: On 10/23/2012 10:05 AM, Andre Oppermann wrote: There shouldn't be any users. Zero copy send is broken and responsible for random kernel crashes. Zero copy receive isn't supported by any modern driver. Both are useless to dangerous. I enabled this a few weeks ago, not knowing it was useless/dangerous. Perhaps an entry in UPDATING to note that this has been renamed and that it may not actually be useful? Good idea. Will do. Also, zero_copy(9) needs updating, as it references ZERO_COPY_SOCKETS. Already done in r241932. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r241931 - in head/sys: conf kern
On 23.10.2012 18:05, Gleb Smirnoff wrote: On Tue, Oct 23, 2012 at 05:05:48PM +0200, Andre Oppermann wrote: A> There shouldn't be any users. Zero copy send is broken and A> responsible for random kernel crashes. Zero copy receive isn't A> supported by any modern driver. Both are useless to dangerous. A> A> The main problem with ZERO_COPY_SOCKETS was that it sounded great A> and who wouldn't want to have zero copy sockets? Unfortunately A> it doesn't work that way. Okay, it appeared that there are users, even on current@ mailing list during couple of hours of exposition. Can we keep the old option as compatibility? No. They are not users. They simply fell for the promise of "zero copy" which it isn't. It doesn't do what the "users" believe it does. It's useless for receive and dangerous for send. I have updated NOTES and forwarded it to -current. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
svn commit: r241971 - head/sys/conf
Author: andre Date: Tue Oct 23 23:13:44 2012 New Revision: 241971 URL: http://svn.freebsd.org/changeset/base/241971 Log: Change the dependency of kern/uipc_cow.c from zero_copy_sockets to socket_send_cow. Missed in r241931. Submitted by: pluknet Modified: head/sys/conf/files Modified: head/sys/conf/files == --- head/sys/conf/files Tue Oct 23 22:58:25 2012(r241970) +++ head/sys/conf/files Tue Oct 23 23:13:44 2012(r241971) @@ -2691,7 +2691,7 @@ kern/tty_pts.cstandard kern/tty_tty.c standard kern/tty_ttydisc.c standard kern/uipc_accf.c optional inet -kern/uipc_cow.coptional zero_copy_sockets +kern/uipc_cow.coptional socket_send_cow kern/uipc_debug.c optional ddb kern/uipc_domain.c standard kern/uipc_mbuf.c standard ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 20:56, Jim Harris wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd wrote: On 24 October 2012 11:36, Jim Harris wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. Personally I'm undecided between #2 and #3. #1 is ugly. In favor of #3 is that there possibly isn't any case where you'd actually want the mutex to share a cache line with anything else, even a data structure. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 21:49, Jim Harris wrote: On Wed, Oct 24, 2012 at 12:16 PM, Andre Oppermann wrote: See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. Personally I'm undecided between #2 and #3. #1 is ugly. In favor of #3 is that there possibly isn't any case where you'd actually want the mutex to share a cache line with anything else, even a data structure. I've run my same tests with #3 as you describe, and I did see further noticeable improvement. I had a difficult time though quantifying the effect it would have on all of the different architectures. Putting it in ULE's tdq gained 60-70% of the overall benefit, and was well contained. I just experimented with different specifications of alignment and couldn't get the globals aligned at all. This seems to be because of the linker not understanding or not getting passed the alignment information when linking the kernel. I agree that sprinkling all over the place isn't pretty. But focused investigations into specific locks (spin mutexes, default mutexes, whatever) may find a few key additional ones that would benefit. I started down this path with the sleepq and turnstile locks, but none of those specifically showed noticeable improvement (at least in the tests I was running). There's still some additional ones I want to look at, but haven't had the time yet. This runs the very great risk of optimizing for today's available architectures and then needs rejiggling every five years. 
Just as you've noticed the issue with 128B alignment from the Netburst days. We never know how the next micro-architecture will behave. Micro optimizing each individual invocation of common building blocks is the wrong path to go. I'd very much prefer the alignment *and* padding control to be done in one place for all of them, either through a magic macro or compiler __attribute__(whatever).

-- Andre
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 21:06, Attilio Rao wrote: On Wed, Oct 24, 2012 at 8:00 PM, Jim Harris wrote: On Wed, Oct 24, 2012 at 11:43 AM, John Baldwin wrote: On Wednesday, October 24, 2012 2:36:41 pm Jim Harris wrote: Author: jimharris Date: Wed Oct 24 18:36:41 2012 New Revision: 242014 URL: http://svn.freebsd.org/changeset/base/242014 Log: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. This enables CPU searches (which read tdq_load) to operate independently of any contention on the spinlock. Some scheduler-intensive workloads running on an 8C single-socket SNB Xeon show considerable improvement with this change (2-3% perf improvement, 5-6% decrease in CPU util). Sponsored by: Intel Reviewed by:jeff Modified: head/sys/kern/sched_ule.c Modified: head/sys/kern/sched_ule.c == --- head/sys/kern/sched_ule.c Wed Oct 24 18:33:44 2012(r242013) +++ head/sys/kern/sched_ule.c Wed Oct 24 18:36:41 2012(r242014) @@ -223,8 +223,13 @@ static int sched_idlespinthresh = -1; * locking in sched_pickcpu(); */ struct tdq { - /* Ordered to improve efficiency of cpu_search() and switch(). */ + /* + * Ordered to improve efficiency of cpu_search() and switch(). + * tdq_lock is padded to avoid false sharing with tdq_load and + * tdq_cpu_idle. + */ struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; Can this use 'tdq_lock __aligned(CACHE_LINE_SIZE)' instead? No - that doesn't pad it. I believe that only works if it's global, i.e. not part of a data structure. As I've already said in another thread __align() doesn't work on object declaration, so what that won't pad it either if it is global or part of a struct. It is just implemented as __attribute__((aligned(X))): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html Actually it seems gcc itself doesn't really care and it up to the linker to honor that. 
-- Andre
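The manual padding from r242014 can be tried out in userland. The following is a minimal sketch, assuming a 64-byte cache line; `struct fake_mtx`, `struct padded_queue` and the field names are hypothetical stand-ins and not the real `struct mtx` or `struct tdq` layout.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_SIZE 64	/* assumed cache line size, hardcoded as in r242014 */

/* Hypothetical stand-in for struct mtx; the real layout differs. */
struct fake_mtx {
	void *lock_object;
	volatile unsigned long mtx_lock;
};

/*
 * The lock is padded out to a full cache line so that the fields read
 * by other CPUs without the lock start on the next line.
 */
struct padded_queue {
	struct fake_mtx q_lock;		/* contended spin lock */
	char pad[LINE_SIZE - sizeof(struct fake_mtx)];
	volatile int q_load;		/* read by remote CPU searches */
	volatile int q_cpu_idle;
} __attribute__((aligned(LINE_SIZE)));

/* Compile-time checks that lock and load cannot share a cache line. */
static_assert(offsetof(struct padded_queue, q_load) == LINE_SIZE,
    "q_load must start on the second cache line");
static_assert(sizeof(struct padded_queue) % LINE_SIZE == 0,
    "struct size must be a multiple of the cache line size");
```

The explicit `pad` member only works while `sizeof(struct fake_mtx)` stays below `LINE_SIZE`; a larger lock would produce a negative array size and fail to compile, which is a useful tripwire.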
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 21:30, Alexander Motin wrote: On 24.10.2012 22:16, Andre Oppermann wrote: On 24.10.2012 20:56, Jim Harris wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd wrote: On 24 October 2012 11:36, Jim Harris wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. Personally I'm undecided between #2 and #3. #1 is ugly. In favor of #3 is that there possibly isn't any case where you'd actually want the mutex to share a cache line with anything else, even a data structure. I'm sorry, could you hint me with some theory? I think I can agree that cache line sharing can be a problem in case of spin locks -- waiting thread will constantly try to access page modified by other CPU, that I guess will cause cache line writes to the RAM. But why is it so bad to share lock with respective data in case of non-spin locks? 
Won't benefits from free regular prefetch of the right data while grabbing lock compensate penalties from relatively rare collisions? Cliff Click describes it in detail: http://www.azulsystems.com/blog/cliff/2009-04-14-odds-ends For a classic mutex it likely doesn't make much difference since the cache line is exclusive anyway while the lock is held. On LL/SC systems there may be cache line dirtying on a failed locking attempt. For spin mutexes it hurts badly as you noted. Especially on RW mutexes it hurts because a read lock dirties the cache line for all other CPU's. Here the RW mutex should be on its own cache line in all cases. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 22:29, Attilio Rao wrote:
On Wed, Oct 24, 2012 at 9:25 PM, Andre Oppermann wrote:
On 24.10.2012 21:06, Attilio Rao wrote:

As I've already said in another thread __align() doesn't work on object declaration, so that won't pad it either if it is global or part of a struct. It is just implemented as __attribute__((aligned(X))): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html

Actually it seems gcc itself doesn't really care and it is up to the linker to honor that.

Yes but the concept being that if you use __aligned() properly (when defining a struct) the object will be correctly sized, so you will get padding automatically.

Yes. With __aligned() the start of the element/structure should begin on an address evenly divisible by the align value *and* it should pad out any remaining space up to the next evenly divisible address.

The problem we have is that it apparently doesn't work correctly within gcc when creating structs nor within the linker when placing such supposedly aligned structs in the .bss section (at least the padding is missing).

It seems to come down to either a) fixing gcc+ld; or b) hacking around it by magically padding the structs that require it.

-- Andre
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 22:55, Andre Oppermann wrote:
On 24.10.2012 22:29, Attilio Rao wrote:
On Wed, Oct 24, 2012 at 9:25 PM, Andre Oppermann wrote:
On 24.10.2012 21:06, Attilio Rao wrote:

As I've already said in another thread __align() doesn't work on object declaration, so that won't pad it either if it is global or part of a struct. It is just implemented as __attribute__((aligned(X))): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html

Actually it seems gcc itself doesn't really care and it is up to the linker to honor that.

Yes but the concept being that if you use __aligned() properly (when defining a struct) the object will be correctly sized, so you will get padding automatically.

Yes. With __aligned() the start of the element/structure should begin on an address evenly divisible by the align value *and* it should pad out any remaining space up to the next evenly divisible address.

The problem we have is that it apparently doesn't work correctly within gcc when creating structs nor within the linker when placing such supposedly aligned structs in the .bss section (at least the padding is missing).

I spoke too soon. Attilio is completely right in his assessment.

It does work when done on the struct definition:

struct mtx { ... } __aligned(CACHE_LINE_SIZE); /* works including .bss alignment & padding */

When creating a struct (in globals at least) it doesn't work:

struct mtx __aligned(CACHE_LINE_SIZE) foo_mtx; /* doesn't work */

It seems to come down to either a) fixing gcc+ld; or b) hacking around it by magically padding the structs that require it.

The question now becomes whether we can (should?) make the latter case above work or find another workaround.

-- Andre
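The difference between the two placements of the attribute can be checked with `sizeof`. This is a userland sketch, not kernel code: the type names are hypothetical, `LINE_SIZE` is an assumed 64 bytes, and it only demonstrates the size/padding aspect (it makes no claim about how the kernel linker places such objects).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64	/* assumed; the kernel would use CACHE_LINE_SIZE */

/* Hypothetical lock type with the alignment attached to the type. */
struct aligned_lock {
	volatile unsigned long word;
} __attribute__((aligned(LINE_SIZE)));

/* Same payload without the attribute, for comparison. */
struct plain_lock {
	volatile unsigned long word;
};

/* Attribute on the type: every object is aligned and padded. */
struct aligned_lock type_aligned;

/*
 * Attribute on the object only: the size of the object is unchanged,
 * so whatever is placed after it may still land in the same line.
 */
struct plain_lock obj_aligned __attribute__((aligned(LINE_SIZE)));

static_assert(sizeof(struct aligned_lock) == LINE_SIZE,
    "type-level alignment rounds the size up to a full line");
static_assert(sizeof(struct plain_lock) == sizeof(unsigned long),
    "without the attribute the struct is just its payload");
```

Because the size of a type is always a multiple of its alignment, the type-level form also pads arrays and embedded members, which is why it behaves like option #3 from earlier in the thread.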
Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw
On 25.10.2012 11:39, Andrey V. Elsukov wrote:

Author: ae
Date: Thu Oct 25 09:39:14 2012
New Revision: 242079
URL: http://svn.freebsd.org/changeset/base/242079

Log:
Remove the IPFIREWALL_FORWARD kernel option and make possible to turn on the related functionality in the runtime via the sysctl variable net.pfil.forward. It is turned off by default.

Sponsored by: Yandex LLC
Discussed with: net@
MFC after: 2 weeks

I still don't agree with naming the sysctl net.pfil.forward. This type of forwarding is a property of IPv4 and IPv6 and thus should be put there. Pfil hooking can be on layer 2, 2-bridging, 3 and who knows where else in the future. Forwarding works only for IPv46.

You haven't even replied to my comment on net@. Please change the sysctl location and name to its appropriate place.

Also, an MFC after 2 weeks must ensure that compiling with IPFIREWALL_FORWARD enables the sysctl at the same time, to keep kernel configs within 9-stable working.

-- Andre
Re: svn commit: r242014 - head/sys/kern
On 25.10.2012 05:49, Bruce Evans wrote: On Wed, 24 Oct 2012, Attilio Rao wrote: On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann wrote: ... Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place This is wrong because it doesn't give padding. Unless it is sprinkled in struct declarations. 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute What is this macro supposed to do? I don't understand that from your description. 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. This works but I think it is overkill for structures including sleep mutexes which are the vast majority. So I wouldn't certainly be in favor of such a patch. This doesn't work either with fully dynamic (auto) allocations. Stack alignment is generally broken (limited, and pessimized for both space and time) in gcc (it works better in clang). On amd64, it is limited by the default of -mpreferred-stack-boundary=4. Since 2**4 is smaller than the cache line size and stack alignments larger than it are broken in gcc, __aligned(CACHE_LINE_SIZE) never works (except accidentally, 16/CACHE_LINE_SIZE of the time. On i386, we reduce the space/time pessimizations a little by overriding the default to -mpreferred-stack-boundary=2. 2**2 is even smaller than the cache line size. (The pessimizations are for both space and time, since time and code space is wasted for the code to keep the stack aligned, and cache space and thus also time are wasted for padding. Most functions don't benefit from more than sizeof(register_t) alignment.) I'm not aware of stack allocated mutexes anywhere in the kernel. Even if there is a case it's very special and unique. 
I've verified that __aligned(CACHE_LINE_SIZE) on the definition of struct mtx itself (in sys/_mutex.h) correctly aligns and pads the global .bss resident mutexes for 64B and 128B cache line sizes.

Dynamic allocations via malloc() get whatever alignment malloc() gives. This is only required to be 4 or 8 or 16 or so (the maximum for a C object declared in conforming C, with no __align()), but malloc() usually gives more. If it gives CACHE_LINE_SIZE, that is wasteful for most small allocations.

Stand-alone mutexes are normally not malloc'ed. They're always embedded into some larger structure they protect.

__builtin_alloca() is broken in gcc-3.3.3, but works in gcc-4.2.1, at least on i386. In gcc-3.3.3, it assumes that the stack is the default 16-byte aligned even if -mpreferred-stack-boundary=2 is in CFLAGS to say otherwise, and just subtracts from the stack pointer. In gcc-4.2.1, it does the necessary andl of the stack pointer, but only for 16-byte alignment. It is another bug that there are no extensions of malloc() or alloca(). Since malloc() is in the library and may give CACHE_LINE_SIZE but __builtin_alloca() is in the compiler and only gives 16, these functions are not even as compatible as they should be.

I don't know of any mutexes allocated on the stack, but there are stack frames with mcontexts in them that need special alignment so they cause problems on i386. They can't just be put on the stack due to the above bugs. They are laboriously allocated using malloc(). Since they are quite large, 1 mcontext barely fits on the kernel stack, so kib didn't like my alloca() method for allocating them.

You lost me here.

-- Andre
Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw
On 25.10.2012 18:25, Andrey V. Elsukov wrote:
On 25.10.2012 19:54, Andre Oppermann wrote:

I still don't agree with naming the sysctl net.pfil.forward. This type of forwarding is a property of IPv4 and IPv6 and thus should be put there. Pfil hooking can be on layer 2, 2-bridging, 3 and who knows where else in the future. Forwarding works only for IPv46.

You haven't even replied to my comment on net@. Please change the sysctl location and name to its appropriate place.

Hi Andre,

There were two replies related to this subject; you did not reply to them, and I thought that you had agreed.

I replied to your reply to mine. Other than that I didn't find anything else from you.

So, if not, what do you think about the name net.pfil.ipforward?

net.inet.ip.pfil_forward
net.inet6.ip6.pfil_forward

or something like that.

If you can show with your performance profiling that the sysctl isn't even necessary, you could leave it completely away and have pfil_forward enabled permanently. That would be even better for everybody.

Also, an MFC after 2 weeks must ensure that compiling with IPFIREWALL_FORWARD enables the sysctl at the same time, to keep kernel configs within 9-stable working.

Yes, it will work like that.

Excellent. Thank you.

-- Andre
Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw
On 26.10.2012 13:26, Gleb Smirnoff wrote: On Thu, Oct 25, 2012 at 10:29:51PM +0200, Andre Oppermann wrote: A> On 25.10.2012 18:25, Andrey V. Elsukov wrote: A> > On 25.10.2012 19:54, Andre Oppermann wrote: A> >> I still don't agree with naming the sysctl net.pfil.forward. This A> >> type of forwarding is a property of IPv4 and IPv6 and thus should A> >> be put there. Pfil hooking can be on layer 2, 2-bridging, 3 and A> >> who knows where else in the future. Forwarding works only for IPv46. A> >> A> >> You haven't even replied to my comment on net@. Please change the A> >> sysctl location and name to its appropriate place. A> > A> > Hi Andre, A> > A> > There were two replies related to this subject, you did not replied to A> > them and i thought that you became agree. A> A> I replied to your reply to mine. Other than that I didn't find A> anything else from you. A> A> > So, if not, what you think about the name net.pfil.ipforward? A> A> net.inet.ip.pfil_forward A> net.inet6.ip6.pfil_forward A> A> or something like that. A> A> If you can show with your performance profiling that the sysctl A> isn't even necessary, you could leave it completely away and have A> pfil_forward enabled permanently. That would be even better for A> everybody. I'd prefer to have the sysctl. Benchmarking will definitely show no regression, because in default case packets are tagless. But if packets would carry 1 or 2 tags each, which don't actually belong to PACKET_TAG_IPFORWARD, then processing would be pessimized. With M_FASTFWD_OURS I used an overlay of the protocol specific M_PROTO[1-5] mbuf flags. The same can be done with M_IPFORWARD. The ipfw code then will not only add the m_tag but also set M_IPFORWARD flag. That way no sysctl is required and the feature is always available. The overlay definition is in ip_var.h. 
-- Andre
Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw
On 26.10.2012 14:29, Andrey V. Elsukov wrote:
On 26.10.2012 15:43, Andre Oppermann wrote:

A> If you can show with your performance profiling that the sysctl
A> isn't even necessary, you could leave it completely away and have
A> pfil_forward enabled permanently. That would be even better for
A> everybody.

I'd prefer to have the sysctl. Benchmarking will definitely show no regression, because in default case packets are tagless. But if packets would carry 1 or 2 tags each, which don't actually belong to PACKET_TAG_IPFORWARD, then processing would be pessimized.

With M_FASTFWD_OURS I used an overlay of the protocol specific M_PROTO[1-5] mbuf flags. The same can be done with M_IPFORWARD. The ipfw code then will not only add the m_tag but also set M_IPFORWARD flag. That way no sysctl is required and the feature is always available. The overlay definition is in ip_var.h.

It seems we have only one bit in the m_flags that can be used, so maybe we should leave it for things that may appear in the future?

That's what the M_PROTO flags are for:

#define M_IPFW_FORWARD M_PROTO2 /* ip forwarding */

Of course you have to do the same for ip6.

The M_PROTO[1-5] flags are only valid within a protocol layer. For example they get cleared in ip_output() before the packet is handed to layer 2.

-- Andre
Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw
On 26.10.2012 15:24, Andre Oppermann wrote:
On 26.10.2012 14:29, Andrey V. Elsukov wrote:
On 26.10.2012 15:43, Andre Oppermann wrote:

A> If you can show with your performance profiling that the sysctl
A> isn't even necessary, you could leave it completely away and have
A> pfil_forward enabled permanently. That would be even better for
A> everybody.

I'd prefer to have the sysctl. Benchmarking will definitely show no regression, because in default case packets are tagless. But if packets would carry 1 or 2 tags each, which don't actually belong to PACKET_TAG_IPFORWARD, then processing would be pessimized.

With M_FASTFWD_OURS I used an overlay of the protocol specific M_PROTO[1-5] mbuf flags. The same can be done with M_IPFORWARD. The ipfw code then will not only add the m_tag but also set M_IPFORWARD flag. That way no sysctl is required and the feature is always available. The overlay definition is in ip_var.h.

It seems we have only one bit in the m_flags that can be used, so maybe we should leave it for things that may appear in the future?

That's what the M_PROTO flags are for:

#define M_IPFW_FORWARD M_PROTO2 /* ip forwarding */

Actually looking at it technically this isn't forwarding but specifying a different nexthop. Hence the #define and description should be more like

#define M_IP_NEXTHOP M_PROTO2 /* explicit ip nexthop */

Of course the userspace ipfw feature naming and usage doesn't change. But within the kernel it's really nexthop manipulation within the forwarding path.

-- Andre

Of course you have to do the same for ip6.

The M_PROTO[1-5] flags are only valid within a protocol layer. For example they get cleared in ip_output() before the packet is handed to layer 2.
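The overlay idea can be sketched outside the kernel. In the sketch below the bit values, `struct pkt` and the `pkt_*` helpers are all hypothetical; the real flag values are defined in sys/mbuf.h and the real overlays in ip_var.h, and the mapping of M_FASTFWD_OURS to M_PROTO1 is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones are defined in sys/mbuf.h. */
#define M_PROTO1	0x00000010u
#define M_PROTO2	0x00000020u
#define M_PROTOFLAGS	(M_PROTO1 | M_PROTO2)

/* IP-layer overlays: same bits, meaning private to the IP layer. */
#define M_FASTFWD_OURS	M_PROTO1	/* assumed mapping */
#define M_IP_NEXTHOP	M_PROTO2	/* explicit ip nexthop */

/* Stand-in for struct mbuf, reduced to the flags word. */
struct pkt {
	uint32_t m_flags;
};

/* Set by the firewall together with attaching the nexthop tag. */
static inline void
pkt_set_nexthop(struct pkt *m)
{
	assert(m != NULL);
	m->m_flags |= M_IP_NEXTHOP;
}

/* Cheap test on the fast path; the tag is only looked up if this is true. */
static inline int
pkt_has_nexthop(const struct pkt *m)
{
	return ((m->m_flags & M_IP_NEXTHOP) != 0);
}

/* Protocol flags are cleared when the packet leaves the layer. */
static inline void
pkt_leave_layer(struct pkt *m)
{
	m->m_flags &= ~M_PROTOFLAGS;
}
```

The point of the flag is that untagged packets pay for one bit test instead of a walk over the tag list, which is the cost Gleb was worried about.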
svn commit: r242151 - in head/sys: vm xen/evtchn
Author: andre Date: Fri Oct 26 17:31:35 2012 New Revision: 242151 URL: http://svn.freebsd.org/changeset/base/242151 Log: Move the corresponding MTX_SYSINIT() next to their struct mtx declaration to make their relationship more obvious as done with the other such mutexs. Modified: head/sys/vm/vm_glue.c head/sys/xen/evtchn/evtchn.c Modified: head/sys/vm/vm_glue.c == --- head/sys/vm/vm_glue.c Fri Oct 26 17:02:50 2012(r242150) +++ head/sys/vm/vm_glue.c Fri Oct 26 17:31:35 2012(r242151) @@ -307,6 +307,8 @@ struct kstack_cache_entry *kstack_cache; static int kstack_cache_size = 128; static int kstacks; static struct mtx kstack_cache_mtx; +MTX_SYSINIT(kstack_cache, &kstack_cache_mtx, "kstkch", MTX_DEF); + SYSCTL_INT(_vm, OID_AUTO, kstack_cache_size, CTLFLAG_RW, &kstack_cache_size, 0, ""); SYSCTL_INT(_vm, OID_AUTO, kstacks, CTLFLAG_RD, &kstacks, 0, @@ -486,7 +488,6 @@ kstack_cache_init(void *nulll) EVENTHANDLER_PRI_ANY); } -MTX_SYSINIT(kstack_cache, &kstack_cache_mtx, "kstkch", MTX_DEF); SYSINIT(vm_kstacks, SI_SUB_KTHREAD_INIT, SI_ORDER_ANY, kstack_cache_init, NULL); #ifndef NO_SWAPPING Modified: head/sys/xen/evtchn/evtchn.c == --- head/sys/xen/evtchn/evtchn.cFri Oct 26 17:02:50 2012 (r242150) +++ head/sys/xen/evtchn/evtchn.cFri Oct 26 17:31:35 2012 (r242151) @@ -44,7 +44,15 @@ static inline unsigned long __ffs(unsign return word; } +/* + * irq_mapping_update_lock: in order to allow an interrupt to occur in a critical + * section, to set pcpu->ipending (etc...) properly, we + * must be able to get the icu lock, so it can't be + * under witness. 
+ */ static struct mtx irq_mapping_update_lock; +MTX_SYSINIT(irq_mapping_update_lock, &irq_mapping_update_lock, "xp", MTX_SPIN); + static struct xenpic *xp; struct xenpic_intsrc { struct intsrc xp_intsrc; @@ -1130,11 +1138,4 @@ evtchn_init(void *dummy __unused) } SYSINIT(evtchn_init, SI_SUB_INTR, SI_ORDER_MIDDLE, evtchn_init, NULL); -/* - * irq_mapping_update_lock: in order to allow an interrupt to occur in a critical - * section, to set pcpu->ipending (etc...) properly, we - * must be able to get the icu lock, so it can't be - * under witness. - */ -MTX_SYSINIT(irq_mapping_update_lock, &irq_mapping_update_lock, "xp", MTX_SPIN); ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r242161 - in head/sys: net netinet netpfil/pf
On 26.10.2012 23:06, Gleb Smirnoff wrote:

Author: glebius
Date: Fri Oct 26 21:06:33 2012
New Revision: 242161
URL: http://svn.freebsd.org/changeset/base/242161

Log:
o Remove last argument to ip_fragment(), and obtain all needed information on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in hardware. Some driver may not announce CSUM_IP in their if_hwassist, although try to do checksums if CSUM_IP set on mbuf. Example is em(4).

I'm not getting your description here? Why work around a bug in a driver in ip_fragment() when we can fix the bug in the driver?

o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP. After this change CSUM_DELAY_IP vanishes from the stack.

Good. :)

Submitted by: Sebastian Kuzminsky

-- Andre
svn commit: r242249 - head/sys/netinet
Author: andre
Date: Sun Oct 28 17:16:09 2012
New Revision: 242249
URL: http://svn.freebsd.org/changeset/base/242249

Log:
  Adjust the initial default CWND upon connection establishment to the new and increased values specified by RFC5681 Section 3.1.

  The even larger initial CWND per RFC3390, if enabled, is not affected.

  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c	Sun Oct 28 17:06:50 2012	(r242248)
+++ head/sys/netinet/tcp_input.c	Sun Oct 28 17:16:09 2012	(r242249)
@@ -351,8 +351,15 @@ cc_conn_init(struct tcpcb *tp)
 	if (V_tcp_do_rfc3390)
 		tp->snd_cwnd = min(4 * tp->t_maxseg,
 		    max(2 * tp->t_maxseg, 4380));
-	else
-		tp->snd_cwnd = tp->t_maxseg;
+	else {
+		/* Per RFC5681 Section 3.1 */
+		if (tp->t_maxseg > 2190)
+			tp->snd_cwnd = 2 * tp->t_maxseg;
+		else if (tp->t_maxseg > 1095)
+			tp->snd_cwnd = 3 * tp->t_maxseg;
+		else
+			tp->snd_cwnd = 4 * tp->t_maxseg;
+	}
 
 	if (CC_ALGO(tp)->conn_init != NULL)
 		CC_ALGO(tp)->conn_init(tp->ccv);
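The RFC5681 Section 3.1 branch of this change is easy to exercise on its own. A minimal userland sketch follows; the function name `rfc5681_initial_cwnd` is made up for illustration and is not part of tcp_input.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Initial congestion window in bytes for a given maximum segment size,
 * following the three cases of RFC5681 Section 3.1 as used in r242249.
 */
uint32_t
rfc5681_initial_cwnd(uint32_t maxseg)
{
	assert(maxseg > 0);

	if (maxseg > 2190)
		return (2 * maxseg);	/* large segments: 2 segments */
	else if (maxseg > 1095)
		return (3 * maxseg);	/* medium segments: 3 segments */
	else
		return (4 * maxseg);	/* small segments: 4 segments */
}
```

For a typical Ethernet MSS of 1460 bytes this gives 3 * 1460 = 4380 bytes, up from the single segment that the old `else` branch used.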
svn commit: r242250 - head/sys/netinet
Author: andre
Date: Sun Oct 28 17:25:08 2012
New Revision: 242250
URL: http://svn.freebsd.org/changeset/base/242250

Log:
  When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
  reduce the initial CWND to one segment. This reduction got lost
  some time ago due to a change in initialization ordering.

  Additionally in tcp_timer_rexmt() avoid entering fast recovery when
  we're still in TCPS_SYN_SENT state.

  MFC after:	2 weeks

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_syncache.c
  head/sys/netinet/tcp_timer.c

Modified: head/sys/netinet/tcp_input.c
==============================================================================
--- head/sys/netinet/tcp_input.c	Sun Oct 28 17:16:09 2012	(r242249)
+++ head/sys/netinet/tcp_input.c	Sun Oct 28 17:25:08 2012	(r242250)
@@ -345,10 +345,16 @@ cc_conn_init(struct tcpcb *tp)
 	/*
 	 * Set the initial slow-start flight size.
 	 *
-	 * RFC3390 says only do this if SYN or SYN/ACK didn't got lost.
-	 * XXX: We currently check only in syncache_socket for that.
-	 */
-	if (V_tcp_do_rfc3390)
+	 * RFC5681 Section 3.1 specifies the default conservative values.
+	 * RFC3390 specifies slightly more aggressive values.
+	 *
+	 * If a SYN or SYN/ACK was lost and retransmitted, we have to
+	 * reduce the initial CWND to one segment as congestion is likely
+	 * requiring us to be cautious.
+	 */
+	if (tp->snd_cwnd == 1)
+		tp->snd_cwnd = tp->t_maxseg;		/* SYN(-ACK) lost */
+	else if (V_tcp_do_rfc3390)
 		tp->snd_cwnd = min(4 * tp->t_maxseg,
 		    max(2 * tp->t_maxseg, 4380));
 	else {

Modified: head/sys/netinet/tcp_syncache.c
==============================================================================
--- head/sys/netinet/tcp_syncache.c	Sun Oct 28 17:16:09 2012	(r242249)
+++ head/sys/netinet/tcp_syncache.c	Sun Oct 28 17:25:08 2012	(r242250)
@@ -852,11 +852,12 @@ syncache_socket(struct syncache *sc, str
 	tcp_mss(tp, sc->sc_peer_mss);
 
 	/*
-	 * If the SYN,ACK was retransmitted, reset cwnd to 1 segment.
+	 * If the SYN,ACK was retransmitted, indicate that CWND to be
+	 * limited to one segment in cc_conn_init().
 	 * NB: sc_rxmits counts all SYN,ACK transmits, not just retransmits.
 	 */
 	if (sc->sc_rxmits > 1)
-		tp->snd_cwnd = tp->t_maxseg;
+		tp->snd_cwnd = 1;
 
 #ifdef TCP_OFFLOAD
 	/*

Modified: head/sys/netinet/tcp_timer.c
==============================================================================
--- head/sys/netinet/tcp_timer.c	Sun Oct 28 17:16:09 2012	(r242249)
+++ head/sys/netinet/tcp_timer.c	Sun Oct 28 17:25:08 2012	(r242250)
@@ -539,7 +539,13 @@ tcp_timer_rexmt(void * xtp)
 	}
 	INP_INFO_RUNLOCK(&V_tcbinfo);
 	headlocked = 0;
-	if (tp->t_rxtshift == 1) {
+	if (tp->t_state == TCPS_SYN_SENT) {
+		/*
+		 * If the SYN was retransmitted, indicate CWND to be
+		 * limited to 1 segment in cc_conn_init().
+		 */
+		tp->snd_cwnd = 1;
+	} else if (tp->t_rxtshift == 1) {
 		/*
 		 * first retransmit; record ssthresh and cwnd so they can
 		 * be recovered if this turns out to be a "bad" retransmit.
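The change above passes information between two code paths through the cwnd field itself: the retransmit paths store the literal 1, and cc_conn_init() later treats that value as "SYN or SYN/ACK was lost". A minimal userspace sketch of that handoff follows. struct conn, CWND_SYN_LOST, mark_handshake_retransmitted() and conn_init_cwnd() are hypothetical names for illustration only, and the default window is passed in rather than computed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the literal 1 used as a marker in the diff. */
#define	CWND_SYN_LOST	1u

struct conn {
	uint32_t snd_cwnd;	/* congestion window in bytes, or the marker */
	uint32_t maxseg;	/* negotiated MSS in bytes */
};

/* Called where a SYN or SYN/ACK retransmit is detected. */
void
mark_handshake_retransmitted(struct conn *c)
{
	c->snd_cwnd = CWND_SYN_LOST;
}

/*
 * Called once the connection is established.  default_cwnd is whatever
 * the normal initial window would have been.
 */
void
conn_init_cwnd(struct conn *c, uint32_t default_cwnd)
{
	/* An MSS of 1 byte would be indistinguishable from the marker. */
	assert(c->maxseg > 1);

	if (c->snd_cwnd == CWND_SYN_LOST)
		c->snd_cwnd = c->maxseg;	/* fall back to one segment */
	else
		c->snd_cwnd = default_cwnd;
}
```

The marker works because a real congestion window of one byte never occurs, so no extra flag in the control block is needed.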
svn commit: r242251 - head/sys/netinet
Author: andre
Date: Sun Oct 28 17:30:28 2012
New Revision: 242251
URL: http://svn.freebsd.org/changeset/base/242251

Log:
  When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
  reduce the initial CWND to one segment. This reduction got lost
  some time ago due to a change in initialization ordering.

  Additionally in tcp_timer_rexmt() avoid entering fast recovery when
  we're still in TCPS_SYN_SENT state.

  MFC after:	2 weeks

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==============================================================================
--- head/sys/netinet/tcp_output.c	Sun Oct 28 17:25:08 2012	(r242250)
+++ head/sys/netinet/tcp_output.c	Sun Oct 28 17:30:28 2012	(r242251)
@@ -551,10 +551,14 @@ after_sack_rexmit:
 	 * max size segments, or at least 50% of the maximum possible
 	 * window, then want to send a window update to peer.
 	 * Skip this if the connection is in T/TCP half-open state.
-	 * Don't send pure window updates when the peer has closed
-	 * the connection and won't ever send more data.
+	 *
+	 * Don't send an independent window update if a delayed
+	 * ACK is pending (it will get piggy-backed on it) or the
+	 * remote side already has done a half-close and won't send
+	 * more data.
 	 */
 	if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
+	    !(tp->t_flags & TF_DELACK) &&
 	    !TCPS_HAVERCVDFIN(tp->t_state)) {
 		/*
 		 * "adv" is the amount we can increase the window,
svn commit: r242252 - head/sys/netinet
Author: andre
Date: Sun Oct 28 17:40:35 2012
New Revision: 242252
URL: http://svn.freebsd.org/changeset/base/242252

Log:
  Prevent a flurry of forced window updates when an application is
  doing small reads on a (partially) filled receive socket buffer.

  Normally one would send a window update every time the available
  space in the socket buffer increases by two times MSS. This leads
  to a flurry of window updates that do not provide any meaningful
  new information to the sender. There still is available space in
  the window and the sender can continue sending data. All window
  updates then get carried by the regular ACKs. Only when the socket
  buffer was (almost) full and the window closed accordingly does a
  window update deliver new information and allow the sender to start
  sending more data again.

  Send window updates only every two MSS when the socket buffer
  has less than 1/8 space available, or the available space in the
  socket buffer increased by 1/4 its full capacity, or the socket
  buffer is very small. The next regular data ACK will carry and
  report the exact window size again.

  Reported by:	sbruno
  Tested by:	darrenr
  Tested by:	Darren Baginski
  PR:		kern/116335
  MFC after:	2 weeks

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==============================================================================
--- head/sys/netinet/tcp_output.c	Sun Oct 28 17:30:28 2012	(r242251)
+++ head/sys/netinet/tcp_output.c	Sun Oct 28 17:40:35 2012	(r242252)
@@ -545,23 +545,39 @@ after_sack_rexmit:
 	}
 
 	/*
-	 * Compare available window to amount of window
-	 * known to peer (as advertised window less
-	 * next expected input).  If the difference is at least two
-	 * max size segments, or at least 50% of the maximum possible
-	 * window, then want to send a window update to peer.
-	 * Skip this if the connection is in T/TCP half-open state.
+	 * Sending of standalone window updates.
+	 *
+	 * Window updates important when we close our window due to a full
+	 * socket buffer and are opening it again after the application
+	 * reads data from it.  Once the window has opened again and the
+	 * remote end starts to send again the ACK clock takes over and
+	 * provides the most current window information.
+	 *
+	 * We must avoid to the silly window syndrome whereas every read
+	 * from the receive buffer, no matter how small, causes a window
+	 * update to be sent.  We also should avoid sending a flurry of
+	 * window updates when the socket buffer had queued a lot of data
+	 * and the application is doing small reads.
+	 *
+	 * Prevent a flurry of pointless window updates by only sending
+	 * an update when we can increase the advertized window by more
+	 * than 1/4th of the socket buffer capacity.  When the buffer is
+	 * getting full or is very small be more aggressive and send an
+	 * update whenever we can increase by two mss sized segments.
+	 * In all other situations the ACK's to new incoming data will
+	 * carry further window increases.
 	 *
 	 * Don't send an independent window update if a delayed
 	 * ACK is pending (it will get piggy-backed on it) or the
 	 * remote side already has done a half-close and won't send
-	 * more data.
+	 * more data.  Skip this if the connection is in T/TCP
+	 * half-open state.
 	 */
 	if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
 	    !(tp->t_flags & TF_DELACK) &&
 	    !TCPS_HAVERCVDFIN(tp->t_state)) {
 		/*
-		 * "adv" is the amount we can increase the window,
+		 * "adv" is the amount we could increase the window,
 		 * taking into account that we are limited by
 		 * TCP_MAXWIN << tp->rcv_scale.
 		 */
@@ -581,9 +597,11 @@ after_sack_rexmit:
 		 */
 		if (oldwin >> tp->rcv_scale == (adv + oldwin) >> tp->rcv_scale)
 			goto dontupdate;
-		if (adv >= (long) (2 * tp->t_maxseg))
-			goto send;
-		if (2 * adv >= (long) so->so_rcv.sb_hiwat)
+
+		if (adv >= (long)(2 * tp->t_maxseg) &&
+		    (adv >= (long)(so->so_rcv.sb_hiwat / 4) ||
+		     recwin <= (long)(so->so_rcv.sb_hiwat / 8) ||
+		     so->so_rcv.sb_hiwat <= 8 * tp->t_maxseg))
 			goto send;
 	}
 dontupdate:
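The new send condition in the second hunk is compact enough to check by hand. Below is a userspace sketch of just that predicate. want_window_update() is a hypothetical name, and the surrounding checks from tcp_output() (recwin > 0, TF_NEEDSYN, TF_DELACK, FIN received, window scale rounding) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * adv    - bytes by which the advertised window could grow
 * recwin - receive window currently available, in bytes
 * hiwat  - receive socket buffer capacity (so_rcv.sb_hiwat), in bytes
 * maxseg - MSS in bytes
 */
bool
want_window_update(long adv, long recwin, long hiwat, long maxseg)
{
	assert(hiwat > 0 && maxseg > 0);

	/* Never bother the peer for less than two segments. */
	if (adv < 2 * maxseg)
		return (false);

	return (adv >= hiwat / 4 ||	/* window grows by a quarter buffer */
	    recwin <= hiwat / 8 ||	/* buffer was nearly full */
	    hiwat <= 8 * maxseg);	/* buffer is very small */
}
```

With a 65536-byte buffer and an MSS of 1460, an application read that frees 4000 bytes while 40000 bytes of window are still open no longer triggers a standalone update. The same read does trigger one when only 8000 bytes of window remained.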
svn commit: r242253 - head/sys/netinet
Author: andre
Date: Sun Oct 28 17:59:46 2012
New Revision: 242253
URL: http://svn.freebsd.org/changeset/base/242253

Log:
  Simplify implementation of net.inet.tcp.reass.maxsegments and
  net.inet.tcp.reass.cursegments.

  MFC after:	2 weeks

Modified:
  head/sys/netinet/tcp_reass.c

Modified: head/sys/netinet/tcp_reass.c
==============================================================================
--- head/sys/netinet/tcp_reass.c	Sun Oct 28 17:40:35 2012	(r242252)
+++ head/sys/netinet/tcp_reass.c	Sun Oct 28 17:59:46 2012	(r242253)
@@ -74,7 +74,6 @@ __FBSDID("$FreeBSD$");
 #include
 #endif /* TCPDEBUG */
 
-static int tcp_reass_sysctl_maxseg(SYSCTL_HANDLER_ARGS);
 static int tcp_reass_sysctl_qsize(SYSCTL_HANDLER_ARGS);
 
 static SYSCTL_NODE(_net_inet_tcp, OID_AUTO, reass, CTLFLAG_RW, 0,
@@ -82,16 +81,12 @@ static SYSCTL_NODE(_net_inet_tcp, OID_AU
 
 static VNET_DEFINE(int, tcp_reass_maxseg) = 0;
 #define	V_tcp_reass_maxseg		VNET(tcp_reass_maxseg)
-SYSCTL_VNET_PROC(_net_inet_tcp_reass, OID_AUTO, maxsegments,
-    CTLTYPE_INT | CTLFLAG_RDTUN,
-    &VNET_NAME(tcp_reass_maxseg), 0, &tcp_reass_sysctl_maxseg, "I",
+SYSCTL_VNET_INT(_net_inet_tcp_reass, OID_AUTO, maxsegments, CTLFLAG_RDTUN,
+    &VNET_NAME(tcp_reass_maxseg), 0,
     "Global maximum number of TCP Segments in Reassembly Queue");
 
-static VNET_DEFINE(int, tcp_reass_qsize) = 0;
-#define	V_tcp_reass_qsize		VNET(tcp_reass_qsize)
 SYSCTL_VNET_PROC(_net_inet_tcp_reass, OID_AUTO, cursegments,
-    CTLTYPE_INT | CTLFLAG_RD,
-    &VNET_NAME(tcp_reass_qsize), 0, &tcp_reass_sysctl_qsize, "I",
+    (CTLTYPE_INT | CTLFLAG_RD), NULL, 0, &tcp_reass_sysctl_qsize, "I",
     "Global number of TCP Segments currently in Reassembly Queue");
 
 static VNET_DEFINE(int, tcp_reass_overflows) = 0;
@@ -109,8 +104,10 @@ static void
 tcp_reass_zone_change(void *tag)
 {
 
+	/* Set the zone limit and read back the effective value. */
 	V_tcp_reass_maxseg = nmbclusters / 16;
 	uma_zone_set_max(V_tcp_reass_zone, V_tcp_reass_maxseg);
+	V_tcp_reass_maxseg = uma_zone_get_max(V_tcp_reass_zone);
 }
 
 void
@@ -122,7 +119,9 @@ tcp_reass_init(void)
 	    &V_tcp_reass_maxseg);
 	V_tcp_reass_zone = uma_zcreate("tcpreass", sizeof (struct tseg_qent),
 	    NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
+	/* Set the zone limit and read back the effective value. */
 	uma_zone_set_max(V_tcp_reass_zone, V_tcp_reass_maxseg);
+	V_tcp_reass_maxseg = uma_zone_get_max(V_tcp_reass_zone);
 	EVENTHANDLER_REGISTER(nmbclusters_change,
 	    tcp_reass_zone_change, NULL, EVENTHANDLER_PRI_ANY);
 }
@@ -156,17 +155,12 @@ tcp_reass_flush(struct tcpcb *tp)
 }
 
 static int
-tcp_reass_sysctl_maxseg(SYSCTL_HANDLER_ARGS)
-{
-	V_tcp_reass_maxseg = uma_zone_get_max(V_tcp_reass_zone);
-	return (sysctl_handle_int(oidp, arg1, arg2, req));
-}
-
-static int
 tcp_reass_sysctl_qsize(SYSCTL_HANDLER_ARGS)
 {
-	V_tcp_reass_qsize = uma_zone_get_cur(V_tcp_reass_zone);
-	return (sysctl_handle_int(oidp, arg1, arg2, req));
+	int qsize;
+
+	qsize = uma_zone_get_cur(V_tcp_reass_zone);
+	return (sysctl_handle_int(oidp, &qsize, sizeof(qsize), req));
 }
 
 int
svn commit: r242254 - head/sys/netinet
Author: andre
Date: Sun Oct 28 18:07:34 2012
New Revision: 242254
URL: http://svn.freebsd.org/changeset/base/242254

Log:
  Change the syncache count reporting the current number of entries
  from an unprotected u_int that reports garbage on SMP to a function
  based sysctl obtaining the current value from UMA.

  Also read back the actual cache_limit after page size rounding by UMA.

  PR:		kern/165879
  MFC after:	2 weeks

Modified:
  head/sys/netinet/tcp_syncache.c
  head/sys/netinet/tcp_syncache.h

Modified: head/sys/netinet/tcp_syncache.c
==============================================================================
--- head/sys/netinet/tcp_syncache.c	Sun Oct 28 17:59:46 2012	(r242253)
+++ head/sys/netinet/tcp_syncache.c	Sun Oct 28 18:07:34 2012	(r242254)
@@ -123,6 +123,7 @@ struct syncache *syncache_lookup(struct
 static int	 syncache_respond(struct syncache *);
 static struct	 socket *syncache_socket(struct syncache *, struct socket *,
 		    struct mbuf *m);
+static int	 syncache_sysctl_count(SYSCTL_HANDLER_ARGS);
 static void	 syncache_timeout(struct syncache *sc, struct syncache_head *sch,
 		    int docallout);
 static void	 syncache_timer(void *);
@@ -158,8 +159,8 @@ SYSCTL_VNET_UINT(_net_inet_tcp_syncache,
     &VNET_NAME(tcp_syncache.cache_limit), 0,
     "Overall entry limit for syncache");
 
-SYSCTL_VNET_UINT(_net_inet_tcp_syncache, OID_AUTO, count, CTLFLAG_RD,
-    &VNET_NAME(tcp_syncache.cache_count), 0,
+SYSCTL_VNET_PROC(_net_inet_tcp_syncache, OID_AUTO, count, (CTLTYPE_UINT|CTLFLAG_RD),
+    NULL, 0, &syncache_sysctl_count, "IU",
     "Current number of entries in syncache");
 
 SYSCTL_VNET_UINT(_net_inet_tcp_syncache, OID_AUTO, hashsize, CTLFLAG_RDTUN,
@@ -225,7 +226,6 @@ syncache_init(void)
 {
 	int i;
 
-	V_tcp_syncache.cache_count = 0;
 	V_tcp_syncache.hashsize = TCP_SYNCACHE_HASHSIZE;
 	V_tcp_syncache.bucket_limit = TCP_SYNCACHE_BUCKETLIMIT;
 	V_tcp_syncache.rexmt_limit = SYNCACHE_MAXREXMTS;
@@ -269,6 +269,7 @@ syncache_init(void)
 	V_tcp_syncache.zone = uma_zcreate("syncache", sizeof(struct syncache),
 	    NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
 	uma_zone_set_max(V_tcp_syncache.zone, V_tcp_syncache.cache_limit);
+	V_tcp_syncache.cache_limit = uma_zone_get_max(V_tcp_syncache.zone);
 }
 
 #ifdef VIMAGE
@@ -296,8 +297,8 @@ syncache_destroy(void)
 		mtx_destroy(&sch->sch_mtx);
 	}
 
-	KASSERT(V_tcp_syncache.cache_count == 0, ("%s: cache_count %d not 0",
-	    __func__, V_tcp_syncache.cache_count));
+	KASSERT(uma_zone_get_cur(V_tcp_syncache.zone) == 0,
+	    ("%s: cache_count not 0", __func__));
 
 	/* Free the allocated global resources. */
 	uma_zdestroy(V_tcp_syncache.zone);
@@ -305,6 +306,15 @@ syncache_destroy(void)
 }
 #endif
 
+static int
+syncache_sysctl_count(SYSCTL_HANDLER_ARGS)
+{
+	int count;
+
+	count = uma_zone_get_cur(V_tcp_syncache.zone);
+	return (sysctl_handle_int(oidp, &count, sizeof(count), req));
+}
+
 /*
  * Inserts a syncache entry into the specified bucket row.
  * Locks and unlocks the syncache_head autonomously.
@@ -347,7 +357,6 @@ syncache_insert(struct syncache *sc, str
 
 	SCH_UNLOCK(sch);
 
-	V_tcp_syncache.cache_count++;
 	TCPSTAT_INC(tcps_sc_added);
 }
 
@@ -373,7 +382,6 @@ syncache_drop(struct syncache *sc, struc
 #endif
 
 	syncache_free(sc);
-	V_tcp_syncache.cache_count--;
 }
 
 /*
@@ -958,7 +966,6 @@ syncache_expand(struct in_conninfo *inc,
 				tod->tod_syncache_removed(tod, sc->sc_todctx);
 			}
 #endif
-			V_tcp_syncache.cache_count--;
 			SCH_UNLOCK(sch);
 		}
 

Modified: head/sys/netinet/tcp_syncache.h
==============================================================================
--- head/sys/netinet/tcp_syncache.h	Sun Oct 28 17:59:46 2012	(r242253)
+++ head/sys/netinet/tcp_syncache.h	Sun Oct 28 18:07:34 2012	(r242254)
@@ -112,7 +112,6 @@ struct tcp_syncache {
 	u_int	hashsize;
 	u_int	hashmask;
 	u_int	bucket_limit;
-	u_int	cache_count;		/* XXX: unprotected */
 	u_int	cache_limit;
 	u_int	rexmt_limit;
 	u_int	hash_secret;
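Two ideas from this commit can be illustrated in userspace: asking the allocator for the current count instead of keeping a separate counter, and reading the limit back after setting it because the allocator may round it. The sketch below uses a toy allocator. Everything named toy_* is hypothetical, and the rounding rule (up to a whole number of items per page) is an assumption made for illustration, not a statement of what UMA does internally.

```c
#include <assert.h>
#include <stddef.h>

struct toy_zone {
	size_t item_size;	/* bytes per item */
	size_t page_size;	/* bytes per backing page */
	size_t max_items;	/* effective limit after rounding */
	size_t cur_items;	/* items currently allocated */
};

/* Set a limit; assumed here to round up to whole pages of items. */
void
toy_zone_set_max(struct toy_zone *z, size_t requested)
{
	size_t per_page = z->page_size / z->item_size;
	size_t pages;

	assert(per_page > 0);
	pages = (requested + per_page - 1) / per_page;
	z->max_items = pages * per_page;
}

size_t
toy_zone_get_max(const struct toy_zone *z)
{
	return (z->max_items);
}

size_t
toy_zone_get_cur(const struct toy_zone *z)
{
	return (z->cur_items);
}

/* Returns 0 on success, -1 when the zone limit is reached. */
int
toy_zone_alloc(struct toy_zone *z)
{
	if (z->cur_items >= z->max_items)
		return (-1);
	z->cur_items++;
	return (0);
}

void
toy_zone_free(struct toy_zone *z)
{
	assert(z->cur_items > 0);
	z->cur_items--;
}

/* Read handler: snapshot the count into a local, then report it. */
int
toy_report_count(const struct toy_zone *z)
{
	int count;

	count = (int)toy_zone_get_cur(z);
	return (count);
}
```

With 160-byte items on 4096-byte pages (25 items per page), a requested limit of 1010 reads back as 1025 under this assumed rounding, which is why the reported limit is refreshed right after it is set.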
svn commit: r242255 - head/sys/netinet
Author: andre
Date: Sun Oct 28 18:33:52 2012
New Revision: 242255
URL: http://svn.freebsd.org/changeset/base/242255

Log:
  Allow arbitrary MSS sizes and don't mind about the cluster size anymore.
  We've got more cluster sizes for quite some time now and the originally
  imposed limits and the previously codified thoughts on efficiency gains
  are no longer true.

  MFC after:	2 weeks

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==============================================================================
--- head/sys/netinet/tcp_input.c	Sun Oct 28 18:07:34 2012	(r242254)
+++ head/sys/netinet/tcp_input.c	Sun Oct 28 18:33:52 2012	(r242255)
@@ -3322,10 +3322,8 @@ tcp_xmit_timer(struct tcpcb *tp, int rtt
 /*
  * Determine a reasonable value for maxseg size.
  * If the route is known, check route for mtu.
- * If none, use an mss that can be handled on the outgoing
- * interface without forcing IP to fragment; if bigger than
- * an mbuf cluster (MCLBYTES), round down to nearest multiple of MCLBYTES
- * to utilize large mbufs.  If no route is found, route has no mtu,
+ * If none, use an mss that can be handled on the outgoing interface
+ * without forcing IP to fragment.  If no route is found, route has no mtu,
  * or the destination isn't local, use a default, hopefully conservative
  * size (usually 512 or the default IP max size, but no more than the mtu
  * of the interface), as we can't discover anything about intervening
@@ -3506,13 +3504,6 @@ tcp_mss_update(struct tcpcb *tp, int off
 	     (tp->t_flags & TF_RCVD_TSTMP) == TF_RCVD_TSTMP))
 		mss -= TCPOLEN_TSTAMP_APPA;
 
-#if	(MCLBYTES & (MCLBYTES - 1)) == 0
-	if (mss > MCLBYTES)
-		mss &= ~(MCLBYTES-1);
-#else
-	if (mss > MCLBYTES)
-		mss = mss / MCLBYTES * MCLBYTES;
-#endif
 	tp->t_maxseg = mss;
 }
 
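To see what the removed block used to do to a large MSS, here is a userspace sketch of the old rounding next to the new pass-through behaviour. old_round_mss() and new_round_mss() are hypothetical names, and TOY_MCLBYTES is set to 2048 as an assumption about the cluster size; only the power-of-two branch of the removed code is reproduced.

```c
#include <assert.h>

/* Assumed mbuf cluster size for this sketch; a power of two. */
#define	TOY_MCLBYTES	2048

/* Behaviour before r242255: round down to a multiple of the cluster size. */
int
old_round_mss(int mss)
{
	assert(mss > 0);

	if (mss > TOY_MCLBYTES)
		mss &= ~(TOY_MCLBYTES - 1);
	return (mss);
}

/* Behaviour after r242255: the MSS is used as computed. */
int
new_round_mss(int mss)
{
	assert(mss > 0);

	return (mss);
}
```

For an MSS of 8960 bytes (a 9000-byte MTU minus 40 bytes of IPv4 and TCP headers), the old code produced 8192 and so left 768 bytes per segment unused. An MSS of 1460 was never affected because it does not exceed the cluster size.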
svn commit: r242256 - head/sys/kern
Author: andre
Date: Sun Oct 28 18:38:51 2012
New Revision: 242256
URL: http://svn.freebsd.org/changeset/base/242256

Log:
  Improve m_cat() by being able to also merge contents from M_EXT
  mbuf's by doing proper testing with M_WRITABLE().

  In m_collapse() replace an incomplete manual check for M_RDONLY
  with the M_WRITABLE() macro that also tests for shared buffers
  and other cases that make a particular mbuf immutable.

  MFC after:	2 weeks

Modified:
  head/sys/kern/uipc_mbuf.c

Modified: head/sys/kern/uipc_mbuf.c
==============================================================================
--- head/sys/kern/uipc_mbuf.c	Sun Oct 28 18:33:52 2012	(r242255)
+++ head/sys/kern/uipc_mbuf.c	Sun Oct 28 18:38:51 2012	(r242256)
@@ -911,8 +911,8 @@ m_cat(struct mbuf *m, struct mbuf *n)
 	while (m->m_next)
 		m = m->m_next;
 	while (n) {
-		if (m->m_flags & M_EXT ||
-		    m->m_data + m->m_len + n->m_len >= &m->m_dat[MLEN]) {
+		if (!M_WRITABLE(m) ||
+		    M_TRAILINGSPACE(m) < n->m_len) {
 			/* just join the two chains */
 			m->m_next = n;
 			return;
@@ -1584,7 +1584,7 @@ again:
 		n = m->m_next;
 		if (n == NULL)
 			break;
-		if ((m->m_flags & M_RDONLY) == 0 &&
+		if (M_WRITABLE(m) &&
 		    n->m_len < M_TRAILINGSPACE(m)) {
 			bcopy(mtod(n, void *), mtod(m, char *) + m->m_len,
 				n->m_len);
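The merge-or-link decision in m_cat() can be modelled with a toy buffer. This is only a sketch of the idea: struct toy_buf and the toy_* functions are hypothetical, "writable" is reduced to "not read-only and not shared", and none of the real mbuf layout is reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	TOY_CAP	64

struct toy_buf {
	char	store[TOY_CAP];	/* backing storage */
	size_t	off;		/* start of valid data within store */
	size_t	len;		/* bytes of valid data */
	bool	rdonly;		/* storage must not be modified */
	int	refcnt;		/* > 1 means the storage is shared */
};

/* Toy counterpart of a writability test: private and not read-only. */
static bool
toy_writable(const struct toy_buf *b)
{
	return (!b->rdonly && b->refcnt == 1);
}

/* Free bytes after the valid data. */
static size_t
toy_trailingspace(const struct toy_buf *b)
{
	return (TOY_CAP - (b->off + b->len));
}

/*
 * Try to append src's data to dst.  Returns true if the data was copied,
 * false if the caller has to link the two buffers instead.
 */
bool
toy_try_merge(struct toy_buf *dst, const struct toy_buf *src)
{
	assert(dst->off + dst->len <= TOY_CAP);

	if (!toy_writable(dst) || toy_trailingspace(dst) < src->len)
		return (false);

	memcpy(dst->store + dst->off + dst->len, src->store + src->off,
	    src->len);
	dst->len += src->len;
	return (true);
}
```

The point carried over from the commit is that the test asks whether the destination may be written and has room, rather than rejecting every buffer with external storage outright.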