Re: svn commit: r230264 - head/sys/sys

2012-01-17 Thread Andre Oppermann

On 17.01.2012 13:16, Gleb Smirnoff wrote:

On Tue, Jan 17, 2012 at 12:13:37PM +, Gleb Smirnoff wrote:
T>  Author: glebius
T>  Date: Tue Jan 17 12:13:36 2012
T>  New Revision: 230264
T>  URL: http://svn.freebsd.org/changeset/base/230264
T>
T>  Log:
T>    Provide a function m_get2() that allocates a minimal mbuf that
T>    would fit specified size. Returned mbuf may be a single mbuf,
T>    an mbuf with a cluster from packet zone, or an mbuf with jumbo
T>    cluster of sufficient size.

I am open to discussion on bikeshed color^W^W a better name for
this function.


We already have m_getm2() which does the same for mbuf chains.


I utilized it in pfsync, however there are several other places where
it can be used instead of handrolled "if else if else" constructs.


Handrolled mbuf allocation isn't good.  Should be all in one place.
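
To illustrate what "all in one place" could look like, here is a rough
userland sketch of the size-class selection the log message describes.
It is not the committed m_get2() code: the enum, the function name
pick_mbuf_class() and the three SKETCH_* limits are placeholders made up
for this example, and only one jumbo class is modelled.

```c
#include <stddef.h>

/* Hypothetical size limits, chosen for illustration only. */
#define SKETCH_MBUF_DATA	200u	/* payload that fits a plain mbuf */
#define SKETCH_CLUSTER		2048u	/* standard cluster */
#define SKETCH_JUMBO_PAGE	4096u	/* page-sized jumbo cluster */

enum mbuf_class { MC_NONE, MC_MBUF, MC_CLUSTER, MC_JUMBO_PAGE };

/* Return the smallest buffer class that can hold 'size' bytes. */
enum mbuf_class
pick_mbuf_class(size_t size)
{
	if (size <= SKETCH_MBUF_DATA)
		return (MC_MBUF);
	if (size <= SKETCH_CLUSTER)
		return (MC_CLUSTER);
	if (size <= SKETCH_JUMBO_PAGE)
		return (MC_JUMBO_PAGE);
	return (MC_NONE);	/* too large for a single buffer */
}
```

A caller would replace its own "if else if else" ladder with one call
and act on the returned class.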

--
Andre
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r226113 - head/sys/netinet

2012-01-23 Thread Andre Oppermann

On 23.01.2012 15:24, Lawrence Stewart wrote:

Hi Andre,

On 10/08/11 03:39, Andre Oppermann wrote:

Author: andre
Date: Fri Oct 7 16:39:03 2011
New Revision: 226113
URL: http://svn.freebsd.org/changeset/base/226113

Log:
Prevent TCP sessions from stalling indefinitely in reassembly
when reaching the zone limit of reassembly queue entries.


[snip]

Any reason this was not MFCed to stable/8 and stable/7 when you MFCed to
stable/9? As far as I can tell, both r226113 and r228016 need to be MFCed
to 8 and 7.


Thanks for the reminder.  Test build for MFC is under way, including your later
fixup.  I'll send it to you for review to make sure everything's correct.

--
Andre


svn commit: r223839 - in head/sys: conf kern netinet

2011-07-07 Thread Andre Oppermann
Author: andre
Date: Thu Jul  7 10:37:14 2011
New Revision: 223839
URL: http://svn.freebsd.org/changeset/base/223839

Log:
  Remove the TCP_SORECEIVE_STREAM compile time option.  The use of
  soreceive_stream() for TCP still has to be enabled with the loader
  tunable net.inet.tcp.soreceive_stream.
  
  Suggested by: trociny and others

Modified:
  head/sys/conf/options
  head/sys/kern/uipc_socket.c
  head/sys/netinet/tcp_subr.c

Modified: head/sys/conf/options
==
--- head/sys/conf/options   Thu Jul  7 09:51:31 2011(r223838)
+++ head/sys/conf/options   Thu Jul  7 10:37:14 2011(r223839)
@@ -427,7 +427,6 @@ SLIP_IFF_OPTS   opt_slip.h
 TCPDEBUG
 TCP_OFFLOAD_DISABLE    opt_inet.h #Disable code to dispatch tcp offloading
 TCP_SIGNATURE  opt_inet.h
-TCP_SORECEIVE_STREAM   opt_inet.h
 VLAN_ARRAY opt_vlan.h
 XBONEHACK
 FLOWTABLE  opt_route.h

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Thu Jul  7 09:51:31 2011(r223838)
+++ head/sys/kern/uipc_socket.c Thu Jul  7 10:37:14 2011(r223839)
@@ -1915,7 +1915,6 @@ release:
 /*
  * Optimized version of soreceive() for stream (TCP) sockets.
  */
-#ifdef TCP_SORECEIVE_STREAM
 int
 soreceive_stream(struct socket *so, struct sockaddr **psa, struct uio *uio,
 struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
@@ -2109,7 +2108,6 @@ out:
sbunlock(sb);
return (error);
 }
-#endif /* TCP_SORECEIVE_STREAM */
 
 /*
  * Optimized version of soreceive() for simple datagram cases from userspace.

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c Thu Jul  7 09:51:31 2011(r223838)
+++ head/sys/netinet/tcp_subr.c Thu Jul  7 10:37:14 2011(r223839)
@@ -206,11 +206,9 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 &VNET_NAME(tcp_isn_reseed_interval), 0,
 "Seconds between reseeding of ISN secret");
 
-#ifdef TCP_SORECEIVE_STREAM
 static int tcp_soreceive_stream = 0;
 SYSCTL_INT(_net_inet_tcp, OID_AUTO, soreceive_stream, CTLFLAG_RDTUN,
 &tcp_soreceive_stream, 0, "Using soreceive_stream for TCP sockets");
-#endif
 
 #ifdef TCP_SIGNATURE
 static int tcp_sig_checksigs = 1;
@@ -337,13 +335,13 @@ tcp_init(void)
tcp_finwait2_timeout = TCPTV_FINWAIT2_TIMEOUT;
tcp_tcbhashsize = hashsize;
 
-#ifdef TCP_SORECEIVE_STREAM
TUNABLE_INT_FETCH("net.inet.tcp.soreceive_stream", 
&tcp_soreceive_stream);
if (tcp_soreceive_stream) {
tcp_usrreqs.pru_soreceive = soreceive_stream;
+#ifdef INET6
tcp6_usrreqs.pru_soreceive = soreceive_stream;
+#endif /* INET6 */
}
-#endif
 
 #ifdef INET6
 #define TCP_MINPROTOHDR (sizeof(struct ip6_hdr) + sizeof(struct tcphdr))


Re: svn commit: r223862 - in head/sys: net netinet netinet6

2011-07-08 Thread Andre Oppermann

On 08.07.2011 11:38, Marko Zec wrote:

Author: zec
Date: Fri Jul  8 09:38:33 2011
New Revision: 223862
URL: http://svn.freebsd.org/changeset/base/223862

Log:
   Permit ARP to proceed for IPv4 host routes for which the gateway is the
   same as the host address.  This already works fine for INET6 and ND6.


Can you give an example what this does? Is it some sort of proxy ARP?


   While here, remove two function pointers from struct lltable which are
   only initialized but never used.


Ideally this would have been a separate commit because it has nothing to
do with primary functional change.

--
Andre


   MFC after:   3 days

Modified:
   head/sys/net/if_llatbl.h
   head/sys/netinet/in.c
   head/sys/netinet6/in6.c



svn commit: r223863 - head/sys/kern

2011-07-08 Thread Andre Oppermann
Author: andre
Date: Fri Jul  8 10:50:13 2011
New Revision: 223863
URL: http://svn.freebsd.org/changeset/base/223863

Log:
  In the experimental soreceive_stream():
  
   o Move the non-blocking socket test below the SBS_CANTRCVMORE so that EOF
 is correctly returned on a remote connection close.
   o In the non-blocking socket test compare SS_NBIO against the so->so_state
 field instead of the incorrect sb->sb_state field.
   o Simplify the ENOTCONN test by removing cases that can't occur.
  
  Submitted by: trociny (with some further tweaks by committer)
  Tested by:    trociny

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Fri Jul  8 09:38:33 2011(r223862)
+++ head/sys/kern/uipc_socket.c Fri Jul  8 10:50:13 2011(r223863)
@@ -1954,20 +1954,9 @@ soreceive_stream(struct socket *so, stru
}
oresid = uio->uio_resid;
 
-   /* We will never ever get anything unless we are connected. */
+   /* We will never ever get anything unless we are or were connected. */
if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
-   /* When disconnecting there may be still some data left. */
-   if (sb->sb_cc > 0)
-   goto deliver;
-   if (!(so->so_state & SS_ISDISCONNECTED))
-   error = ENOTCONN;
-   goto out;
-   }
-
-   /* Socket buffer is empty and we shall not block. */
-   if (sb->sb_cc == 0 &&
-   ((sb->sb_flags & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
-   error = EAGAIN;
+   error = ENOTCONN;
goto out;
}
 
@@ -1994,6 +1983,13 @@ restart:
goto out;
}
 
+   /* Socket buffer is empty and we shall not block. */
+   if (sb->sb_cc == 0 &&
+   ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
+   error = EAGAIN;
+   goto out;
+   }
+
/* Socket buffer got some data that we shall deliver now. */
if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
((sb->sb_flags & SS_NBIO) ||
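
The ordering described in the log (EOF test first, non-blocking test
second) can be sketched in plain userland C as follows. This is a
simplified model and not the kernel code: struct rcv_state, enum
rcv_action and rcv_precheck() are hypothetical names, and the fields
only stand in for so_state, sb_state and sb_cc.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the socket and its receive buffer. */
struct rcv_state {
	bool	connected;	/* socket is or was connected */
	bool	cant_rcv_more;	/* peer closed (SBS_CANTRCVMORE analogue) */
	bool	nonblocking;	/* SS_NBIO or MSG_DONTWAIT analogue */
	size_t	bytes_queued;	/* sb_cc analogue */
};

enum rcv_action { RCV_DELIVER, RCV_EOF, RCV_WAIT, RCV_ERROR };

/* Decide what a receive call should do before it would block. */
enum rcv_action
rcv_precheck(const struct rcv_state *st, int *error)
{
	*error = 0;
	if (!st->connected) {
		*error = ENOTCONN;
		return (RCV_ERROR);
	}
	if (st->bytes_queued > 0)
		return (RCV_DELIVER);
	/* The EOF test comes first so a closed peer reads as EOF. */
	if (st->cant_rcv_more)
		return (RCV_EOF);
	/* Only now does an empty buffer on a non-blocking socket fail. */
	if (st->nonblocking) {
		*error = EAGAIN;
		return (RCV_ERROR);
	}
	return (RCV_WAIT);
}
```

With the two tests in the opposite order, a non-blocking reader on a
closed connection would see EAGAIN forever instead of EOF.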


svn commit: r210666 - head/sys/netinet

2010-07-30 Thread Andre Oppermann
Author: andre
Date: Fri Jul 30 21:45:53 2010
New Revision: 210666
URL: http://svn.freebsd.org/changeset/base/210666

Log:
  Fix a bug in syncache where the initial CWND for new incoming connections
  was limited to one segment under the faulty assumption of a retransmit.
  Due to this the opportunity to initialize the increased congestion window
  according to RFC3390 was missed.
  
  Support for RFC3465 introduced in r187289 uncovered the bug as the ACK
  to SYN/ACK no longer caused snd_cwnd increase by MSS (actually, this
  increase shouldn't happen as it's explicitly forbidden by RFC3390, but
  it's another issue).  Snd_cwnd remains really small (1*MSS + 1) and this
  causes really bad interaction with delayed acks on other side.
  
  The variable name sc_rxmits is a bit misleading as it counts all transmits,
  not just retransmits.
  
  Submitted by: Maxim Dounin 
  MFC after:    10 days

Modified:
  head/sys/netinet/tcp_syncache.c

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c Fri Jul 30 21:39:28 2010 (r210665)
+++ head/sys/netinet/tcp_syncache.c Fri Jul 30 21:45:53 2010 (r210666)
@@ -804,8 +804,9 @@ syncache_socket(struct syncache *sc, str
 
/*
 * If the SYN,ACK was retransmitted, reset cwnd to 1 segment.
+* NB: sc_rxmits counts all SYN,ACK transmits, not just retransmits.
 */
-   if (sc->sc_rxmits)
+   if (sc->sc_rxmits > 1)
tp->snd_cwnd = tp->t_maxseg;
tcp_timer_activate(tp, TT_KEEP, tcp_keepinit);
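
The effect of the one-character change can be shown with a tiny
standalone sketch. The function name and parameters below are
hypothetical and only model the decision; the counter is assumed to
include the first SYN,ACK, as the log message explains.

```c
/*
 * Pick the congestion window for a freshly accepted connection.
 * 'synack_transmits' counts every SYN,ACK sent, including the first.
 */
long
cwnd_after_handshake(int synack_transmits, long mss, long initial_window)
{
	/* More than one transmit means at least one retransmit happened. */
	if (synack_transmits > 1)
		return (mss);
	return (initial_window);
}
```

Testing the counter for non-zero instead of greater than one would take
the one-segment path on every connection.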
 


svn commit: r211315 - head/sys/netinet

2010-08-14 Thread Andre Oppermann
Author: andre
Date: Sat Aug 14 20:40:55 2010
New Revision: 211315
URL: http://svn.freebsd.org/changeset/base/211315

Log:
  Disable TCP inflight limiter by default.
  
  It was experimental and interferes with the normal congestion control
  algorithms by instating a separate, possibly lower, ceiling for the
  amount of data that is in flight to the remote host.  With high speed
  internet connections the inflight limit frequently has been estimated
  too low due to the noisy nature of the RTT measurements.
  
  This code gives way for the upcoming pluggable congestion control
  framework.  It is the task of the congestion control algorithm to
  set the congestion window and amount of inflight data without external
  interference.
  
  Reviewed by:  lstewart
  MFC after:    1 week
  Removal after: 1 month

Modified:
  head/sys/netinet/tcp_subr.c

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c Sat Aug 14 20:12:10 2010(r211314)
+++ head/sys/netinet/tcp_subr.c Sat Aug 14 20:40:55 2010(r211315)
@@ -221,7 +221,7 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 SYSCTL_NODE(_net_inet_tcp, OID_AUTO, inflight, CTLFLAG_RW, 0,
 "TCP inflight data limiting");
 
-static VNET_DEFINE(int, tcp_inflight_enable) = 1;
+static VNET_DEFINE(int, tcp_inflight_enable) = 0;
 #define V_tcp_inflight_enable   VNET(tcp_inflight_enable)
 SYSCTL_VNET_INT(_net_inet_tcp_inflight, OID_AUTO, enable, CTLFLAG_RW,
 &VNET_NAME(tcp_inflight_enable), 0,


svn commit: r211316 - head/sys/netinet

2010-08-14 Thread Andre Oppermann
Author: andre
Date: Sat Aug 14 21:04:27 2010
New Revision: 211316
URL: http://svn.freebsd.org/changeset/base/211316

Log:
  Change the messages of the ICMP bad port bandwidth limiter from
  a kernel printf to a log output with the priority of LOG_NOTICE.
  
  This way the messages still show up in /var/log/messages but no
  longer spam the console every other second on busy servers that
  are port scanned:
   "Limiting open port RST response from 114 to 100 packets/sec"
  
  PR:   kern/147352
  Submitted by: Eugene Grosbein 
  MFC after:    1 week

Modified:
  head/sys/netinet/ip_icmp.c

Modified: head/sys/netinet/ip_icmp.c
==
--- head/sys/netinet/ip_icmp.c  Sat Aug 14 20:40:55 2010(r211315)
+++ head/sys/netinet/ip_icmp.c  Sat Aug 14 21:04:27 2010(r211316)
@@ -42,6 +42,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -975,7 +976,7 @@ badport_bandlim(int which)
 * the previous behaviour at the expense of added complexity.
 */
if (V_icmplim_output && opps > V_icmplim)
-   printf("Limiting %s from %d to %d packets/sec\n",
+   log(LOG_NOTICE, "Limiting %s from %d to %d packets/sec\n",
r->type, opps, V_icmplim);
}
return 0;   /* okay to send packet */


svn commit: r211317 - head/sys/netinet

2010-08-14 Thread Andre Oppermann
Author: andre
Date: Sat Aug 14 21:41:33 2010
New Revision: 211317
URL: http://svn.freebsd.org/changeset/base/211317

Log:
  When using TSO and sending more than TCP_MAXWIN sendalot is set
  and we loop back to 'again'.  If the remainder is less or equal
  to one full segment, the TSO flag was not cleared even though
  it isn't necessary anymore.  Enabling the TSO flag on a segment
  that doesn't require any offloaded segmentation by the NIC may
  cause confusion in the driver or hardware.
  
  Reset the internal tso flag in tcp_output() on every iteration
  of sendalot.
  
  PR:           kern/132832
  Submitted by: Renaud Lienhart
  MFC after:    1 week

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sat Aug 14 21:04:27 2010 (r211316)
+++ head/sys/netinet/tcp_output.c   Sat Aug 14 21:41:33 2010 (r211317)
@@ -153,7 +153,7 @@ tcp_output(struct tcpcb *tp)
int idle, sendalot;
int sack_rxmit, sack_bytes_rxmt;
struct sackhole *p;
-   int tso = 0;
+   int tso;
struct tcpopt to;
 #if 0
int maxburst = TCP_MAXBURST;
@@ -211,6 +211,7 @@ again:
SEQ_LT(tp->snd_nxt, tp->snd_max))
tcp_sack_adjust(tp);
sendalot = 0;
+   tso = 0;
off = tp->snd_nxt - tp->snd_una;
sendwin = min(tp->snd_wnd, tp->snd_cwnd);
sendwin = min(sendwin, tp->snd_bwnd);
@@ -490,9 +491,9 @@ after_sack_rexmit:
} else {
len = tp->t_maxseg;
sendalot = 1;
-   tso = 0;
}
}
+
if (sack_rxmit) {
if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc))
flags &= ~TH_FIN;
@@ -1051,6 +1052,8 @@ send:
 * XXX: Fixme: This is currently not the case for IPv6.
 */
if (tso) {
+   KASSERT(len > tp->t_maxopd - optlen,
+   ("%s: len <= tso_segsz", __func__));
m->m_pkthdr.csum_flags |= CSUM_TSO;
m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
}
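
The per-iteration reset can be modelled outside the kernel with a short
loop. This is a sketch under assumptions, not tcp_output() itself:
send_all(), struct send_stats and the max_burst parameter (standing in
for the TCP_MAXWIN cap) are names invented for the example.

```c
#include <stddef.h>

struct send_stats {
	unsigned passes;	/* trips through the send loop */
	unsigned tso_passes;	/* passes that requested TSO */
};

/* Send 'total' bytes in passes of at most 'max_burst' bytes each. */
struct send_stats
send_all(size_t total, size_t mss, size_t max_burst)
{
	struct send_stats st = { 0, 0 };

	if (max_burst == 0)
		return (st);
	while (total > 0) {
		int tso = 0;	/* reset on every pass, as in the fix */
		size_t len = total < max_burst ? total : max_burst;

		/* Offload only when more than one segment is sent. */
		if (len > mss)
			tso = 1;
		st.passes++;
		st.tso_passes += tso;
		total -= len;
	}
	return (st);
}
```

If tso were set once outside the loop, the final short pass would still
carry the flag, which is the situation the KASSERT above guards against.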


svn commit: r211327 - head/sys/netinet

2010-08-15 Thread Andre Oppermann
Author: andre
Date: Sun Aug 15 09:30:13 2010
New Revision: 211327
URL: http://svn.freebsd.org/changeset/base/211327

Log:
  Add more logging points for failures in syncache_socket() to
  report when a new socket couldn't be created because one of
  in_pcbinshash(), in6_pcbconnect() or in_pcbconnect() failed.
  
  Logging is conditional on net.inet.tcp.log_debug being enabled.
  
  MFC after:    1 week

Modified:
  head/sys/netinet/tcp_syncache.c

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c Sun Aug 15 08:49:07 2010 (r211326)
+++ head/sys/netinet/tcp_syncache.c Sun Aug 15 09:30:13 2010 (r211327)
@@ -627,6 +627,7 @@ syncache_socket(struct syncache *sc, str
struct inpcb *inp = NULL;
struct socket *so;
struct tcpcb *tp;
+   int error = 0;
char *s;
 
INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
@@ -675,7 +676,7 @@ syncache_socket(struct syncache *sc, str
}
 #endif
inp->inp_lport = sc->sc_inc.inc_lport;
-   if (in_pcbinshash(inp) != 0) {
+   if ((error = in_pcbinshash(inp)) != 0) {
/*
 * Undo the assignments above if we failed to
 * put the PCB on the hash lists.
@@ -687,6 +688,12 @@ syncache_socket(struct syncache *sc, str
 #endif
inp->inp_laddr.s_addr = INADDR_ANY;
inp->inp_lport = 0;
+   if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: in_pcbinshash failed "
+   "with error %i\n",
+   s, __func__, error);
+   free(s, M_TCPLOG);
+   }
goto abort;
}
 #ifdef IPSEC
@@ -721,9 +728,15 @@ syncache_socket(struct syncache *sc, str
laddr6 = inp->in6p_laddr;
if (IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr))
inp->in6p_laddr = sc->sc_inc.inc6_laddr;
-   if (in6_pcbconnect(inp, (struct sockaddr *)&sin6,
-   thread0.td_ucred)) {
+   if ((error = in6_pcbconnect(inp, (struct sockaddr *)&sin6,
+   thread0.td_ucred)) != 0) {
inp->in6p_laddr = laddr6;
+   if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: in6_pcbconnect failed "
+   "with error %i\n",
+   s, __func__, error);
+   free(s, M_TCPLOG);
+   }
goto abort;
}
/* Override flowlabel from in6_pcbconnect. */
@@ -750,9 +763,15 @@ syncache_socket(struct syncache *sc, str
laddr = inp->inp_laddr;
if (inp->inp_laddr.s_addr == INADDR_ANY)
inp->inp_laddr = sc->sc_inc.inc_laddr;
-   if (in_pcbconnect(inp, (struct sockaddr *)&sin,
-   thread0.td_ucred)) {
+   if ((error = in_pcbconnect(inp, (struct sockaddr *)&sin,
+   thread0.td_ucred)) != 0) {
inp->inp_laddr = laddr;
+   if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: in_pcbconnect failed "
+   "with error %i\n",
+   s, __func__, error);
+   free(s, M_TCPLOG);
+   }
goto abort;
}
}


svn commit: r211332 - head/sys/netinet

2010-08-15 Thread Andre Oppermann
Author: andre
Date: Sun Aug 15 13:07:08 2010
New Revision: 211332
URL: http://svn.freebsd.org/changeset/base/211332

Log:
  Initializing the new error variable to zero in syncache_socket()
  is not necessary.
  
  Noticed by:   bz

Modified:
  head/sys/netinet/tcp_syncache.c

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c Sun Aug 15 11:44:08 2010 (r211331)
+++ head/sys/netinet/tcp_syncache.c Sun Aug 15 13:07:08 2010 (r211332)
@@ -627,7 +627,7 @@ syncache_socket(struct syncache *sc, str
struct inpcb *inp = NULL;
struct socket *so;
struct tcpcb *tp;
-   int error = 0;
+   int error;
char *s;
 
INP_INFO_WLOCK_ASSERT(&V_tcbinfo);


svn commit: r211333 - head/sys/netinet

2010-08-15 Thread Andre Oppermann
Author: andre
Date: Sun Aug 15 13:25:18 2010
New Revision: 211333
URL: http://svn.freebsd.org/changeset/base/211333

Log:
  Fix the interaction between 'ICMP fragmentation needed' MTU updates,
  path MTU discovery and the tcp_minmss limiter for very small MTU's.
  
  When the MTU suggested by the gateway via ICMP, or if there isn't
  any the next smaller step from ip_next_mtu(), is lower than the
  floor enforced by net.inet.tcp.minmss (default 216) the value is
  ignored and the default MSS (512) is used instead.  However the
  DF flag in the IP header is still set in tcp_output() preventing
  fragmentation by the gateway.
  
  Fix this by using tcp_minmss as the MSS and clear the DF flag if
  the suggested MTU is too low.  This turns off path MTU discovery
  for the remainder of the session and allows fragmentation to be
  done by the gateway.
  
  Only MTU's smaller than 256 are affected.  The smallest official
  MTU specified is for AX.25 packet radio at 256 octets.
  
  PR:           kern/146628
  Tested by:    Matthew Luckie
  MFC after:    1 week

Modified:
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_subr.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sun Aug 15 13:07:08 2010 (r211332)
+++ head/sys/netinet/tcp_output.c   Sun Aug 15 13:25:18 2010 (r211333)
@@ -1186,8 +1186,10 @@ timer:
 * This might not be the best thing to do according to RFC3390
 * Section 2. However the tcp hostcache migitates the problem
 * so it affects only the first tcp connection with a host.
+*
+* NB: Don't set DF on small MTU/MSS to have a safe fallback.
 */
-   if (V_path_mtu_discovery)
+   if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
ip->ip_off |= IP_DF;
 
error = ip_output(m, tp->t_inpcb->inp_options, NULL,

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c Sun Aug 15 13:07:08 2010(r211332)
+++ head/sys/netinet/tcp_subr.c Sun Aug 15 13:25:18 2010(r211333)
@@ -1339,11 +1339,9 @@ tcp_ctlinput(int cmd, struct sockaddr *s
if (!mtu)
mtu = ip_next_mtu(ip->ip_len,
 1);
-   if (mtu < max(296, V_tcp_minmss
-+ sizeof(struct tcpiphdr)))
-   mtu = 0;
-   if (!mtu)
-   mtu = V_tcp_mssdflt
+   if (mtu < V_tcp_minmss
++ sizeof(struct tcpiphdr))
+   mtu = V_tcp_minmss
 + sizeof(struct tcpiphdr);
/*
 * Only cache the the MTU if it
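
Taken together, the two hunks amount to "clamp the MTU to the floor and
drop DF when the clamp was hit". A minimal standalone sketch of that
rule follows. The 40-byte header size is an assumption standing in for
sizeof(struct tcpiphdr), and pmtu_clamp() and struct pmtu_decision are
hypothetical names, not kernel interfaces.

```c
/* Assumed size of the combined TCP/IP header without options. */
#define SKETCH_TCPIP_HDR	40

struct pmtu_decision {
	int mtu;	/* MTU to use for the path */
	int set_df;	/* nonzero if the DF bit should stay set */
};

/* Apply the minmss floor to an MTU suggested via ICMP. */
struct pmtu_decision
pmtu_clamp(int suggested_mtu, int minmss)
{
	struct pmtu_decision d;
	int floor_mtu = minmss + SKETCH_TCPIP_HDR;

	d.mtu = suggested_mtu < floor_mtu ? floor_mtu : suggested_mtu;
	/* DF stays on only while the resulting MSS is above the floor. */
	d.set_df = (d.mtu - SKETCH_TCPIP_HDR) > minmss;
	return (d);
}
```

With the default floor of 216 quoted in the log, a suggested MTU of 100
yields an MTU of 256 with DF cleared, so the gateway may fragment.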


Re: svn commit: r211327 - head/sys/netinet

2010-08-15 Thread Andre Oppermann

On 15.08.2010 11:41, Bjoern A. Zeeb wrote:

On Sun, 15 Aug 2010, Andre Oppermann wrote:


Author: andre
Date: Sun Aug 15 09:30:13 2010
New Revision: 211327
URL: http://svn.freebsd.org/changeset/base/211327

Log:
Add more logging points for failures in syncache_socket() to
report when a new socket couldn't be created because one of
in_pcbinshash(), in6_pcbconnect() or in_pcbconnect() failed.

Logging is conditional on net.inet.tcp.log_debug being enabled.

MFC after: 1 week

Modified:
head/sys/netinet/tcp_syncache.c

Modified: head/sys/netinet/tcp_syncache.c
==

--- head/sys/netinet/tcp_syncache.c Sun Aug 15 08:49:07 2010 (r211326)
+++ head/sys/netinet/tcp_syncache.c Sun Aug 15 09:30:13 2010 (r211327)
@@ -627,6 +627,7 @@ syncache_socket(struct syncache *sc, str
struct inpcb *inp = NULL;
struct socket *so;
struct tcpcb *tp;
+ int error = 0;



Is there any need to initialize here?


No.  Actually not.  Was just my style of using safe initial values.
But here the return value is the socket pointer of NULL.  The error
is not passed back directly.

Fixed in r211332.

Thanks for noticing and reporting.

--
Andre


svn commit: r211396 - head/sys/vm

2010-08-16 Thread Andre Oppermann
Author: andre
Date: Mon Aug 16 14:24:00 2010
New Revision: 211396
URL: http://svn.freebsd.org/changeset/base/211396

Log:
  Add uma_zone_get_max() to obtain the effective limit after a call
  to uma_zone_set_max().
  
  The UMA zone limit is not exactly set to the value supplied but
  rounded up to completely fill the backing store increment (a page
  normally).  This can lead to surprising situations where the number
  of elements allocated from UMA is higher than the supplied limit
  value.  The new get function reads back the effective value so that
  the supplied limit value can be adjusted to the real limit.
  
  Reviewed by:  jeffr
  MFC after:1 week

Modified:
  head/sys/vm/uma.h
  head/sys/vm/uma_core.c

Modified: head/sys/vm/uma.h
==
--- head/sys/vm/uma.h   Mon Aug 16 12:37:17 2010(r211395)
+++ head/sys/vm/uma.h   Mon Aug 16 14:24:00 2010(r211396)
@@ -459,6 +459,18 @@ int uma_zone_set_obj(uma_zone_t zone, st
 void uma_zone_set_max(uma_zone_t zone, int nitems);
 
 /*
+ * Obtains the effective limit on the number of items in a zone
+ *
+ * Arguments:
+ * zone  The zone to obtain the effective limit from
+ *
+ * Return:
+ * 0  No limit
+ * int  The effective limit of the zone
+ */
+int uma_zone_get_max(uma_zone_t zone);
+
+/*
  * The following two routines (uma_zone_set_init/fini)
  * are used to set the backend init/fini pair which acts on an
  * object as it becomes allocated and is placed in a slab within

Modified: head/sys/vm/uma_core.c
==
--- head/sys/vm/uma_core.c  Mon Aug 16 12:37:17 2010(r211395)
+++ head/sys/vm/uma_core.c  Mon Aug 16 14:24:00 2010(r211396)
@@ -2797,6 +2797,24 @@ uma_zone_set_max(uma_zone_t zone, int ni
 }
 
 /* See uma.h */
+int
+uma_zone_get_max(uma_zone_t zone)
+{
+   int nitems;
+   uma_keg_t keg;
+
+   ZONE_LOCK(zone);
+   keg = zone_first_keg(zone);
+   if (keg->uk_maxpages)
+   nitems = keg->uk_maxpages * keg->uk_ipers;
+   else
+   nitems = 0;
+   ZONE_UNLOCK(zone);
+
+   return (nitems);
+}
+
+/* See uma.h */
 void
 uma_zone_set_init(uma_zone_t zone, uma_init uminit)
 {
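
The rounding that makes the effective limit differ from the requested
one can be reproduced with plain integer arithmetic. The sketch below
assumes one page per backing-store increment and uses a made-up
function name; it mirrors the idea of the getter above rather than the
UMA internals.

```c
/*
 * Round a requested item limit up to a whole number of pages.
 * 'items_per_page' plays the role of uk_ipers; returns 0 for
 * "no limit" or invalid input.
 */
int
effective_limit(int requested, int items_per_page)
{
	int pages;

	if (requested <= 0 || items_per_page <= 0)
		return (0);
	pages = requested / items_per_page;
	if (pages * items_per_page < requested)
		pages++;	/* partially used page still counts whole */
	return (pages * items_per_page);
}
```

For example, a requested limit of 100 with 30 items per page rounds up
to four pages, so up to 120 items can actually be allocated.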


svn commit: r211462 - head/sys/netinet

2010-08-18 Thread Andre Oppermann
Author: andre
Date: Wed Aug 18 17:39:47 2010
New Revision: 211462
URL: http://svn.freebsd.org/changeset/base/211462

Log:
  Untangle the net.inet.tcp.log_in_vain and net.inet.tcp.log_debug
  sysctl's and remove any side effects.
  
  Both sysctl's share the same backend infrastructure and due to the
  way it was implemented enabling net.inet.tcp.log_in_vain would also
  cause log_debug output to be generated.  This was surprising and
  eventually annoying to the user.
  
  The log output backend is kept the same but a little shim is inserted
  to properly separate log_in_vain and log_debug and to remove any side
  effects.
  
  PR:           kern/137317
  MFC after:    1 week

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_subr.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Wed Aug 18 15:58:26 2010 (r211461)
+++ head/sys/netinet/tcp_input.c Wed Aug 18 17:39:47 2010 (r211462)
@@ -571,7 +571,7 @@ findpcb:
 */
if ((tcp_log_in_vain == 1 && (thflags & TH_SYN)) ||
tcp_log_in_vain == 2) {
-   if ((s = tcp_log_addrs(NULL, th, (void *)ip, ip6)))
+   if ((s = tcp_log_vain(NULL, th, (void *)ip, ip6)))
log(LOG_INFO, "%s; %s: Connection attempt "
"to closed port\n", s, __func__);
}

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c Wed Aug 18 15:58:26 2010(r211461)
+++ head/sys/netinet/tcp_subr.c Wed Aug 18 17:39:47 2010(r211462)
@@ -268,6 +268,8 @@ VNET_DEFINE(uma_zone_t, sack_hole_zone);
 
 static struct inpcb *tcp_notify(struct inpcb *, int);
 static void    tcp_isn_tick(void *);
+static char *  tcp_log_addr(struct in_conninfo *inc, struct tcphdr *th,
+   void *ip4hdr, const void *ip6hdr);
 
 /*
  * Target size of TCP PCB hash tables. Must be a power of two.
@@ -2234,9 +2236,33 @@ SYSCTL_PROC(_net_inet_tcp, TCPCTL_DROP, 
  * and ip6_hdr pointers have to be passed as void pointers.
  */
 char *
+tcp_log_vain(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
+const void *ip6hdr)
+{
+
+   /* Is logging enabled? */
+   if (tcp_log_in_vain == 0)
+   return (NULL);
+
+   return (tcp_log_addr(inc, th, ip4hdr, ip6hdr));
+}
+
+char *
 tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
 const void *ip6hdr)
 {
+
+   /* Is logging enabled? */
+   if (tcp_log_debug == 0)
+   return (NULL);
+
+   return (tcp_log_addr(inc, th, ip4hdr, ip6hdr));
+}
+
+static char *
+tcp_log_addr(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
+const void *ip6hdr)
+{
char *s, *sp;
size_t size;
struct ip *ip;
@@ -2259,10 +2285,6 @@ tcp_log_addrs(struct in_conninfo *inc, s
2 * INET_ADDRSTRLEN;
 #endif /* INET6 */
 
-   /* Is logging enabled? */
-   if (tcp_log_debug == 0 && tcp_log_in_vain == 0)
-   return (NULL);
-
s = malloc(size, M_TCPLOG, M_ZERO|M_NOWAIT);
if (s == NULL)
return (NULL);

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h  Wed Aug 18 15:58:26 2010(r211461)
+++ head/sys/netinet/tcp_var.h  Wed Aug 18 17:39:47 2010(r211462)
@@ -611,6 +611,8 @@ void tcp_destroy(void);
 void    tcp_fini(void *);
 char   *tcp_log_addrs(struct in_conninfo *, struct tcphdr *, void *,
const void *);
+char   *tcp_log_vain(struct in_conninfo *, struct tcphdr *, void *,
+   const void *);
 int tcp_reass(struct tcpcb *, struct tcphdr *, int *, struct mbuf *);
 void    tcp_reass_init(void);
 #ifdef VIMAGE
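
The shim pattern used here (one shared formatter behind two thin
wrappers, each gated by its own switch) can be shown in a few lines of
userland C. Everything in the sketch is hypothetical: the flag
variables, the function names and the "src -> dst" format are invented
for illustration and do not match the kernel's output.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int log_in_vain_enabled;	/* stand-in for log_in_vain */
static int log_debug_enabled;	/* stand-in for log_debug */

/* Shared backend: format the address pair into a malloc'ed string. */
static char *
format_addr(const char *src, const char *dst)
{
	size_t size = strlen(src) + strlen(dst) + sizeof(" -> ");
	char *s = malloc(size);

	if (s == NULL)
		return (NULL);
	snprintf(s, size, "%s -> %s", src, dst);
	return (s);
}

/* Front end for "in vain" logging; ignores the debug switch. */
char *
log_vain_str(const char *src, const char *dst)
{
	if (log_in_vain_enabled == 0)
		return (NULL);
	return (format_addr(src, dst));
}

/* Front end for debug logging; ignores the "in vain" switch. */
char *
log_debug_str(const char *src, const char *dst)
{
	if (log_debug_enabled == 0)
		return (NULL);
	return (format_addr(src, dst));
}
```

Because each wrapper tests only its own flag, enabling one kind of
logging no longer produces output from the other. The caller frees the
returned string.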


svn commit: r211464 - head/sys/netinet

2010-08-18 Thread Andre Oppermann
Author: andre
Date: Wed Aug 18 18:05:54 2010
New Revision: 211464
URL: http://svn.freebsd.org/changeset/base/211464

Log:
  If a TCP connection has been idle for one retransmit timeout or more
  it must reset its congestion window back to the initial window.
  
  RFC3390 has increased the initial window from 1 segment to up to
  4 segments.
  
  The initial window increase of RFC3390 wasn't reflected into the
  restart window which remained at its original defaults of 4 segments
  for local and 1 segment for all other connections.  Both values are
  controllable through sysctl net.inet.tcp.local_slowstart_flightsize
  and net.inet.tcp.slowstart_flightsize.
  
  The increase helps TCP's slow start algorithm to open up the congestion
  window much faster.
  
  Reviewed by:  lstewart
  MFC after:    1 week

Modified:
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Wed Aug 18 17:40:10 2010 (r211463)
+++ head/sys/netinet/tcp_output.c   Wed Aug 18 18:05:54 2010 (r211464)
@@ -140,7 +140,7 @@ tcp_output(struct tcpcb *tp)
 {
struct socket *so = tp->t_inpcb->inp_socket;
long len, recwin, sendwin;
-   int off, flags, error;
+   int off, flags, error, rw;
struct mbuf *m;
struct ip *ip = NULL;
struct ipovly *ipov = NULL;
@@ -176,23 +176,34 @@ tcp_output(struct tcpcb *tp)
idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur) {
/*
-* We have been idle for "a while" and no acks are
-* expected to clock out any data we send --
-* slow start to get ack "clock" running again.
+* If we've been idle for more than one retransmit
+* timeout the old congestion window is no longer
+* current and we have to reduce it to the restart
+* window before we can transmit again.
 *
-* Set the slow-start flight size depending on whether
-* this is a local network or not.
+* The restart window is the initial window or the last
+* CWND, whichever is smaller.
+* 
+* This is done to prevent us from flooding the path with
+* a full CWND at wirespeed, overloading router and switch
+* buffers along the way.
+*
+* See RFC5681 Section 4.1. "Restarting Idle Connections".
 */
-   int ss = V_ss_fltsz;
+   if (V_tcp_do_rfc3390)
+   rw = min(4 * tp->t_maxseg,
+max(2 * tp->t_maxseg, 4380));
 #ifdef INET6
-   if (isipv6) {
-   if (in6_localaddr(&tp->t_inpcb->in6p_faddr))
-   ss = V_ss_fltsz_local;
-   } else
-#endif /* INET6 */
-   if (in_localaddr(tp->t_inpcb->inp_faddr))
-   ss = V_ss_fltsz_local;
-   tp->snd_cwnd = tp->t_maxseg * ss;
+   else if ((isipv6 ? in6_localaddr(&tp->t_inpcb->in6p_faddr) :
+ in_localaddr(tp->t_inpcb->inp_faddr)))
+#else
+   else if (in_localaddr(tp->t_inpcb->inp_faddr))
+#endif
+   rw = V_ss_fltsz_local * tp->t_maxseg;
+   else
+   rw = V_ss_fltsz * tp->t_maxseg;
+
+   tp->snd_cwnd = min(rw, tp->snd_cwnd);
}
tp->t_flags &= ~TF_LASTIDLE;
if (idle) {

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h  Wed Aug 18 17:40:10 2010(r211463)
+++ head/sys/netinet/tcp_var.h  Wed Aug 18 18:05:54 2010(r211464)
@@ -565,6 +565,7 @@ extern  int tcp_log_in_vain;
 VNET_DECLARE(int, tcp_mssdflt);    /* XXX */
 VNET_DECLARE(int, tcp_minmss);
 VNET_DECLARE(int, tcp_delack_enabled);
+VNET_DECLARE(int, tcp_do_rfc3390);
 VNET_DECLARE(int, tcp_do_newreno);
 VNET_DECLARE(int, path_mtu_discovery);
 VNET_DECLARE(int, ss_fltsz);
@@ -575,6 +576,7 @@ VNET_DECLARE(int, ss_fltsz_local);
 #define V_tcp_mssdflt   VNET(tcp_mssdflt)
 #define V_tcp_minmss    VNET(tcp_minmss)
 #define V_tcp_delack_enabled    VNET(tcp_delack_enabled)
+#define V_tcp_do_rfc3390        VNET(tcp_do_rfc3390)
 #define V_tcp_do_newreno        VNET(tcp_do_newreno)
 #define V_path_mtu_discovery    VNET(path_mtu_discovery)
 #define V_ss_fltsz      VNET(ss_fltsz)
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubs
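
The restart-window logic in the tcp_output.c hunk above can be sketched in
isolation.  This is only an illustration, assuming plain integer parameters
in place of the kernel's tcpcb fields and V_ sysctl variables;
restart_window() and cwnd_after_idle() are hypothetical helper names, not
functions in the tree.

```c
#include <assert.h>

static unsigned long
ul_min(unsigned long a, unsigned long b)
{
	return (a < b ? a : b);
}

static unsigned long
ul_max(unsigned long a, unsigned long b)
{
	return (a > b ? a : b);
}

/*
 * Hypothetical helper: compute the restart window in bytes the way the
 * hunk does.  The parameters stand in for tp->t_maxseg, V_tcp_do_rfc3390,
 * the local-address check and the two ss_fltsz sysctls.
 */
static unsigned long
restart_window(unsigned long maxseg, int do_rfc3390, int is_local,
    unsigned long fltsz, unsigned long fltsz_local)
{
	assert(maxseg > 0);

	if (do_rfc3390)
		return (ul_min(4 * maxseg, ul_max(2 * maxseg, 4380)));
	if (is_local)
		return (fltsz_local * maxseg);
	return (fltsz * maxseg);
}

/* After an idle period the congestion window may shrink but never grow. */
static unsigned long
cwnd_after_idle(unsigned long cwnd, unsigned long rw)
{
	return (ul_min(rw, cwnd));
}
```

With a 1460 byte segment size and RFC 3390 enabled, the restart window is
min(5840, max(2920, 4380)) = 4380 bytes.  A connection whose congestion
window is already smaller than that keeps its smaller value.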

Re: svn commit: r211503 - head/sys/mips/atheros

2010-08-19 Thread Andre Oppermann

On 19.08.2010 13:53, Adrian Chadd wrote:

Author: adrian
Date: Thu Aug 19 11:53:55 2010
New Revision: 211503
URL: http://svn.freebsd.org/changeset/base/211503

Log:
   Add some initial AR724X chipset support.

   This is untested but should at least allow an AR724X to boot.


Isn't this something that should be done on a project branch and
merged back when in a good working state?


   The current code is lacking the detail needed to expose the PCIe bus.
   It is also lacking any NIC, PLL or flush/WB code.


--
Andre
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r211503 - head/sys/mips/atheros

2010-08-19 Thread Andre Oppermann

On 19.08.2010 19:20, M. Warner Losh wrote:

In message:<4c6d2933.9020...@freebsd.org>
     Andre Oppermann  writes:
: On 19.08.2010 13:53, Adrian Chadd wrote:
:>  Author: adrian
:>  Date: Thu Aug 19 11:53:55 2010
:>  New Revision: 211503
:>  URL: http://svn.freebsd.org/changeset/base/211503
:>
:>  Log:
:> Add some initial AR724X chipset support.
:>
:> This is untested but should at least allow an AR724X to boot.
:
: Isn't this something that should be done on a project branch and
: merged back when in a good working state?

We don't have a branch for mips stuff these days.  This stuff is OK,
since the AR724X is just being rolled out right now...  For non AR724x
systems, this won't affect anything...


I was more concerned about tree breakage for non-tested code.  When
developing something bleeding edge it is often useful to just commit
some stuff and have it sorted out later.  In head this is more
dangerous.  A small AR724X development branch would be ideal for
this.  Branching is cheap with SVN these days.

--
Andre


Re: svn commit: r211503 - head/sys/mips/atheros

2010-08-19 Thread Andre Oppermann

On 19.08.2010 20:42, M. Warner Losh wrote:

In message:<4c6d6fd7.7060...@freebsd.org>
     Andre Oppermann  writes:
: On 19.08.2010 19:20, M. Warner Losh wrote:
:>  In message:<4c6d2933.9020...@freebsd.org>
:>       Andre Oppermann   writes:
:>  : On 19.08.2010 13:53, Adrian Chadd wrote:
:>  :>   Author: adrian
:>  :>   Date: Thu Aug 19 11:53:55 2010
:>  :>   New Revision: 211503
:>  :>   URL: http://svn.freebsd.org/changeset/base/211503
:>  :>
:>  :>   Log:
:>  :>  Add some initial AR724X chipset support.
:>  :>
:>  :>  This is untested but should at least allow an AR724X to boot.
:>  :
:>  : Isn't this something that should be done on a project branch and
:>  : merged back when in a good working state?
:>
:>  We don't have a branch for mips stuff these days.  This stuff is OK,
:>  since the AR724X is just being rolled out right now...  For non AR724x
:>  systems, this won't affect anything...
:
: I was more concerned about tree breakage for non-tested code.  When
: developing something bleeding edge it is often useful to just commit
: some stuff and have it sorted out later.  In head this is more
: dangerous.  A small AR724X development branch would be ideal for
: this.  Branching is cheap with SVN these days.

Merging isn't that cheap with svn.  The svn:mergeinfo properties make
them a pita.  Given that this code won't break anything, except
possibly the now-unsupported AR724x, I think a branch would be
overkill.  We'd have to drag that branch along all the time until we
can get actual hardware to test it on, which is a high overhead.


Didn't know that branching and merging isn't that easy with SVN after
all.  This was one of the supposed benefits for switching from CVS.
If there is no risk of head breakage I don't mind at all.

--
Andre


svn commit: r211874 - head/sys/netinet

2010-08-27 Thread Andre Oppermann
Author: andre
Date: Fri Aug 27 12:34:53 2010
New Revision: 211874
URL: http://svn.freebsd.org/changeset/base/211874

Log:
  Use timestamp modulo comparison macro for automatic receive buffer
  scaling to correctly handle wrapping of ticks value.
  
  MFC after:    1 week

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c    Fri Aug 27 11:08:11 2010    (r211873)
+++ head/sys/netinet/tcp_input.c    Fri Aug 27 12:34:53 2010    (r211874)
@@ -1441,7 +1441,7 @@ tcp_do_segment(struct mbuf *m, struct tc
if (V_tcp_do_autorcvbuf &&
to.to_tsecr &&
(so->so_rcv.sb_flags & SB_AUTOSIZE)) {
-   if (to.to_tsecr > tp->rfbuf_ts &&
+   if (TSTMP_GT(to.to_tsecr, tp->rfbuf_ts) &&
to.to_tsecr - tp->rfbuf_ts < hz) {
if (tp->rfbuf_cnt >
(so->so_rcv.sb_hiwat / 8 * 7) &&
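
Why the modulo comparison matters can be shown with a small standalone
sketch.  It assumes 32-bit timestamps and a comparison based on the sign of
the difference; tstmp_gt() and naive_gt() are hypothetical stand-ins written
for this illustration, not the kernel macro itself.

```c
#include <stdint.h>

/*
 * Modulo comparison: a counts as later than b when the signed 32-bit
 * difference is positive.  This stays correct across a counter wrap as
 * long as the two values are less than 2^31 ticks apart.
 */
static int
tstmp_gt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) > 0);
}

/* Plain comparison, for contrast: gives the wrong answer after a wrap. */
static int
naive_gt(uint32_t a, uint32_t b)
{
	return (a > b);
}
```

Taking b = 0xfffffff0 as a value recorded shortly before the wrap and a = 5
as one taken shortly after it, the plain comparison treats a as older, while
the modulo form sees a difference of 21 ticks and treats a as newer.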


svn commit: r212653 - head/sys/netinet

2010-09-15 Thread Andre Oppermann
Author: andre
Date: Wed Sep 15 10:39:30 2010
New Revision: 212653
URL: http://svn.freebsd.org/changeset/base/212653

Log:
  Change the default MSS for IPv4 and IPv6 TCP connections from an
  artificial power-of-2 rounded number to their real values specified
  in RFC879 and RFC2460.
  
  From the history and existing comments it appears that the rounded
  numbers were intended to be advantageous for the kernel and mbuf
  system.  However this hasn't been the case for a long
  time.  The mbuf clusters used in tcp_output() have enough space
  to hold the larger real value for the default MSS for both IPv4 and
  IPv6.  Note that the default MSS is only used when path MTU discovery
  is disabled.
  
  Update and expand related comments.
  
  Reviewed by:  lstewart (including some word-smithing)
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp.h

Modified: head/sys/netinet/tcp.h
==
--- head/sys/netinet/tcp.h  Wed Sep 15 10:39:21 2010(r212652)
+++ head/sys/netinet/tcp.h  Wed Sep 15 10:39:30 2010(r212653)
@@ -103,29 +103,37 @@ struct tcphdr {
 
 
 /*
- * Default maximum segment size for TCP.
- * With an IP MTU of 576, this is 536,
- * but 512 is probably more convenient.
- * This should be defined as MIN(512, IP_MSS - sizeof (struct tcpiphdr)).
- */
-#define TCP_MSS 512
-/*
- * TCP_MINMSS is defined to be 216 which is fine for the smallest
- * link MTU (256 bytes, AX.25 packet radio) in the Internet.
- * However it is very unlikely to come across such low MTU interfaces
- * these days (anno dato 2003).
- * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments.
- * Setting this to "0" disables the minmss check.
+ * The default maximum segment size (MSS) to be used for new TCP connections
+ * when path MTU discovery is not enabled.
+ *
+ * RFC879 derives the default MSS from the largest datagram size hosts are
+ * minimally required to handle directly or through IP reassembly minus the
+ * size of the IP and TCP header.  With IPv6 the minimum MTU is specified
+ * in RFC2460.
+ *
+ * For IPv4 the MSS is 576 - sizeof(struct tcpiphdr)
+ * For IPv6 the MSS is IPV6_MMTU - sizeof(struct ip6_hdr) - sizeof(struct tcphdr)
+ *
+ * We use explicit numerical definition here to avoid header pollution.
  */
-#define TCP_MINMSS 216
+#define TCP_MSS 536
+#define TCP6_MSS        1220
 
 /*
- * Default maximum segment size for TCP6.
- * With an IP6 MSS of 1280, this is 1220,
- * but 1024 is probably more convenient. (xxx kazu in doubt)
- * This should be defined as MIN(1024, IP6_MSS - sizeof (struct tcpip6hdr))
+ * Limit the lowest MSS we accept from path MTU discovery and the TCP SYN MSS
+ * option.  Allowing too low values of MSS can consume significant amounts of
+ * resources and be used as a form of a resource exhaustion attack.
+ * Connections requesting lower MSS values will be rounded up to this value
+ * and the IP_DF flag is cleared to allow fragmentation along the path.
+ *
+ * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments.  Setting
+ * it to "0" disables the minmss check.
+ *
+ * The default value is fine for the smallest official link MTU (256 bytes,
+ * AX.25 packet radio) in the Internet.  However it is very unlikely to come
+ * across such low MTU interfaces these days (anno domini 2003).
  */
-#define TCP6_MSS        1024
+#define TCP_MINMSS 216
 
 #define TCP_MAXWIN  65535   /* largest value for (unscaled) window */
 #define TTCP_CLIENT_SND_WND 4096    /* dflt send window for T/TCP client */
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r212653 - head/sys/netinet

2010-09-15 Thread Andre Oppermann

On 15.09.2010 13:51, Lawrence Stewart wrote:

On 09/15/10 20:39, Andre Oppermann wrote:

Author: andre
Date: Wed Sep 15 10:39:30 2010
New Revision: 212653
URL: http://svn.freebsd.org/changeset/base/212653

Log:
   Change the default MSS for IPv4 and IPv6 TCP connections from an
   artificial power-of-2 rounded number to their real values specified
   in RFC879 and RFC2460.

   From the history and existing comments it appears that the rounded
   numbers were intended to be advantageous for the kernel and mbuf
   system.  However this hasn't been the case for a long
   time.  The mbuf clusters used in tcp_output() have enough space
   to hold the larger real value for the default MSS for both IPv4 and
   IPv6.  Note that the default MSS is only used when path MTU discovery
   is disabled.

   Update and expand related comments.

   Reviewed by: lstewart (including some word-smithing)


For the record, I reviewed and fully support the functional changes made
by this patch, but explicitly objected to and offered an alternate for
the proposed comment wording changes.

Andre, given that we had a disagreement about the comment wording, I
would have preferred it if you had noted in your commit log that I had
raised an objection to or at least not reviewed/endorsed the comment
changes.


I've adopted many of your suggestions on the wording compared to my
first version.  For some parts I felt that my wording/description was
more appropriate.  In the end neither of our wordings is plain wrong or
factually incorrect.


It's not important enough an issue to spend any more time on, but I'm a
bit upset to see this committed with an acknowledgement to my review and
word-smithing, much of which ended up being ignored (which is fine, but
then don't put my name to it).


I apologize for not having made your differing opinion on the wording
clear enough in the commit message.  My intent was to communicate that
you not only reviewed the functional change but also provided input on
the wording (which I did in fact incorporate to some extent, but not
entirely).

Below is the wording proposed by Lawrence:
/*
 * The default Maximum Segment Size (MSS) to use when we do not have specific
 * knowledge (e.g. via path MTU discovery) that the destination host is prepared
 * to accept larger datagrams. The smallest allowable IP datagram MTU and
 * optionless IP/TCP header lengths are used for the calculation as per RFC879.
 * For IPv4 (RFC791): 576 - 20 - 20 = 536.
 * For IPv6 (RFC2460): 1280 - 40 - 20 = 1220.
 */
#define TCP_MSS 536
#define TCP6_MSS 1220

/*
 * Limit the lowest MSS we accept for path MTU discovery and the TCP SYN MSS
 * option. Allowing low values of MSS can consume significant resources and be
 * used to mount a resource exhaustion attack. Connections requesting lower MSS
 * values will be rounded up to this value and the IP_DF flag will be cleared to
 * allow fragmentation along the path.
 *
 * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments. Setting this
 * SYSCTL to "0" disables the minmss check.
 *
 * The default value is fine for TCP over IPv4 across the Internet's smallest
 * known link MTU (256 bytes for AX.25 packet radio). However, a connection is
 * very unlikely to come across such low MTU interfaces (anno domini 2003).
 */
#define TCP_MINMSS 216

--
Andre
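
The arithmetic in the proposed comment can be checked with a short sketch.
It assumes optionless headers (20 bytes for IPv4, 40 for IPv6, 20 for TCP);
the enum names and mss_for() are hypothetical and exist only for this
illustration.

```c
/* Sizes in bytes; optionless headers are assumed throughout. */
enum {
	SKETCH_IPV4_MIN_DATAGRAM = 576,	/* minimum datagram size, IPv4 */
	SKETCH_IPV6_MIN_MTU = 1280,	/* minimum link MTU, IPv6 */
	SKETCH_AX25_MTU = 256,		/* AX.25 packet radio link MTU */
	SKETCH_IPV4_HDRLEN = 20,
	SKETCH_IPV6_HDRLEN = 40,
	SKETCH_TCP_HDRLEN = 20
};

/* MSS is what is left of the MTU after the IP and TCP headers. */
static int
mss_for(int mtu, int ip_hdrlen)
{
	return (mtu - ip_hdrlen - SKETCH_TCP_HDRLEN);
}
```

Applied to the 576 and 1280 byte minimums this yields the 536 and 1220 used
for TCP_MSS and TCP6_MSS.  Applied to a 256 byte link over IPv4 it yields
216, which is consistent with the TCP_MINMSS default.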


Re: svn commit: r212653 - head/sys/netinet

2010-09-16 Thread Andre Oppermann

On 15.09.2010 18:12, John Baldwin wrote:

On Wednesday, September 15, 2010 10:04:45 am Andre Oppermann wrote:

Below is the wording proposed by Lawrence:
/*
   * The default Maximum Segment Size (MSS) to use when we do not have specific
   * knowledge (e.g. via path MTU discovery) that the destination host is prepared
   * to accept larger datagrams. The smallest allowable IP datagram MTU and
   * optionless IP/TCP header lengths are used for the calculation as per RFC879.
   * For IPv4 (RFC791): 576 - 20 - 20 = 536.
   * For IPv6 (RFC2460): 1280 - 40 - 20 = 1220.
   */
#define TCP_MSS 536
#define TCP6_MSS 1220


I think the existing text is fine for this comment, with one nit:

  * For IPv4 the MSS is 576 - sizeof(struct tcpiphdr)

I would find it clearer if it was 'sizeof(struct ip) - sizeof(struct tcphdr)'
instead.


I chose 'sizeof(struct tcpiphdr)' for consistency with other parts of
the TCP code where the MSS is calculated this way.  'struct tcpiphdr' predates
IPv6 and is commonly used in the BSD kernel code.


   * Limit the lowest MSS we accept for path MTU discovery and the TCP SYN MSS
   * option. Allowing low values of MSS can consume significant resources and be
   * used to mount a resource exhaustion attack. Connections requesting lower MSS
   * values will be rounded up to this value and the IP_DF flag will be cleared to
   * allow fragmentation along the path.
   *
   * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments. Setting this
   * SYSCTL to "0" disables the minmss check.
   *
   * The default value is fine for TCP over IPv4 across the Internet's smallest
   * known link MTU (256 bytes for AX.25 packet radio). However, a connection is
   * very unlikely to come across such low MTU interfaces (anno domini 2003).
   */
#define TCP_MINMSS 216


I actually prefer the above text for this block.  The 'amounts of resources'
phrase is certainly redundant and just 'resources' is clearer.


OK.  I'll update the comment with a small change to the third paragraph.

--
Andre
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


svn commit: r212731 - head/sys/netinet

2010-09-16 Thread Andre Oppermann
Author: andre
Date: Thu Sep 16 12:13:06 2010
New Revision: 212731
URL: http://svn.freebsd.org/changeset/base/212731

Log:
  Improve comment to TCP_MINMSS by taking the wording from lstewart (with
  a small difference in the last paragraph though) as suggested by jhb.
  
  Clarify that the 'reviewed by' in r212653 by lstewart was for the
  functional change, not the comments in the committed version.

Modified:
  head/sys/netinet/tcp.h

Modified: head/sys/netinet/tcp.h
==
--- head/sys/netinet/tcp.h  Thu Sep 16 12:05:46 2010(r212730)
+++ head/sys/netinet/tcp.h  Thu Sep 16 12:13:06 2010(r212731)
@@ -120,18 +120,18 @@ struct tcphdr {
 #define TCP6_MSS        1220
 
 /*
- * Limit the lowest MSS we accept from path MTU discovery and the TCP SYN MSS
- * option.  Allowing too low values of MSS can consume significant amounts of
- * resources and be used as a form of a resource exhaustion attack.
+ * Limit the lowest MSS we accept for path MTU discovery and the TCP SYN MSS
+ * option.  Allowing low values of MSS can consume significant resources and
+ * be used to mount a resource exhaustion attack.
  * Connections requesting lower MSS values will be rounded up to this value
- * and the IP_DF flag is cleared to allow fragmentation along the path.
+ * and the IP_DF flag will be cleared to allow fragmentation along the path.
  *
  * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments.  Setting
  * it to "0" disables the minmss check.
  *
- * The default value is fine for the smallest official link MTU (256 bytes,
- * AX.25 packet radio) in the Internet.  However it is very unlikely to come
- * across such low MTU interfaces these days (anno domini 2003).
+ * The default value is fine for TCP across the Internet's smallest official
+ * link MTU (256 bytes for AX.25 packet radio).  However, a connection is very
+ * unlikely to come across such low MTU interfaces these days (anno domini 2003).
  */
 #define TCP_MINMSS 216
 


svn commit: r212765 - head/sys/netinet

2010-09-16 Thread Andre Oppermann
Author: andre
Date: Thu Sep 16 21:06:45 2010
New Revision: 212765
URL: http://svn.freebsd.org/changeset/base/212765

Log:
  Remove the TCP inflight bandwidth limiter as announced in r211315
  to give way for the pluggable congestion control framework.  It is
  the task of the congestion control algorithm to set the congestion
  window and amount of inflight data without external interference.
  
  In 'struct tcpcb' the variables previously used by the inflight
  limiter are renamed to spares to keep the ABI intact and to have
  some more space for future extensions.
  
  In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to
  preserve the ABI.  It is always set to 0.
  
  In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed
  to preserve the ABI.  It is always set to 0.
  
  These unused variables in the various structures may be reused in the
  future or garbage collected before the next release or at some other
  point when an ABI change happens anyway for other reasons.
  
  No MFC is planned.  The inflight bandwidth limiter stays disabled by
  default in the other branches but remains available.
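
The ABI argument can be illustrated with a standalone sketch: renaming a
field to a spare keeps the structure size and the offsets of all later
members unchanged, so binaries built against the old layout keep working.
The struct and field names below are hypothetical and much smaller than the
real 'struct tcpcb'.

```c
#include <stddef.h>

/* Hypothetical layout before the change: the field is still in use. */
struct conn_old {
	unsigned long	snd_cwnd;
	unsigned long	snd_bwnd;	/* bandwidth-controlled window */
	unsigned long	snd_ssthresh;
};

/* Hypothetical layout after the change: same slot, renamed to a spare. */
struct conn_new {
	unsigned long	snd_cwnd;
	unsigned long	t_spare;	/* was snd_bwnd, no longer used */
	unsigned long	snd_ssthresh;
};

/* Compile-time checks that the rename did not move anything. */
_Static_assert(sizeof(struct conn_old) == sizeof(struct conn_new),
    "structure size must not change");
_Static_assert(offsetof(struct conn_old, snd_ssthresh) ==
    offsetof(struct conn_new, snd_ssthresh),
    "members after the spare must keep their offsets");

/* The same property as a runtime predicate. */
static int
layout_preserved(void)
{
	return (sizeof(struct conn_old) == sizeof(struct conn_new) &&
	    offsetof(struct conn_old, snd_ssthresh) ==
	    offsetof(struct conn_new, snd_ssthresh));
}
```

Removing the field outright would instead shift snd_ssthresh and shrink the
structure, which is the kind of change the log message defers to a later
ABI break.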

Modified:
  head/sys/netinet/siftr.c
  head/sys/netinet/tcp.h
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_subr.c
  head/sys/netinet/tcp_timer.h
  head/sys/netinet/tcp_usrreq.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/siftr.c
==
--- head/sys/netinet/siftr.cThu Sep 16 21:06:23 2010(r212764)
+++ head/sys/netinet/siftr.cThu Sep 16 21:06:45 2010(r212765)
@@ -193,7 +193,7 @@ struct pkt_node {
u_long  snd_wnd;
/* Receive Window (bytes). */
u_long  rcv_wnd;
-   /* Bandwidth Controlled Window (bytes). */
+   /* Unused (was: Bandwidth Controlled Window (bytes)). */
u_long  snd_bwnd;
/* Slow Start Threshold (bytes). */
u_long  snd_ssthresh;
@@ -775,7 +775,7 @@ siftr_siftdata(struct pkt_node *pn, stru
pn->snd_cwnd = tp->snd_cwnd;
pn->snd_wnd = tp->snd_wnd;
pn->rcv_wnd = tp->rcv_wnd;
-   pn->snd_bwnd = tp->snd_bwnd;
+   pn->snd_bwnd = 0;   /* Unused, kept for compat. */
pn->snd_ssthresh = tp->snd_ssthresh;
pn->snd_scale = tp->snd_scale;
pn->rcv_scale = tp->rcv_scale;

Modified: head/sys/netinet/tcp.h
==
--- head/sys/netinet/tcp.h  Thu Sep 16 21:06:23 2010(r212764)
+++ head/sys/netinet/tcp.h  Thu Sep 16 21:06:45 2010(r212765)
@@ -221,7 +221,7 @@ struct tcp_info {
 
/* FreeBSD extensions to tcp_info. */
u_int32_t   tcpi_snd_wnd;   /* Advertised send window. */
-   u_int32_t   tcpi_snd_bwnd;  /* Bandwidth send window. */
+   u_int32_t   tcpi_snd_bwnd;  /* No longer used. */
u_int32_t   tcpi_snd_nxt;   /* Next egress seqno */
u_int32_t   tcpi_rcv_nxt;   /* Next ingress seqno */
u_int32_t   tcpi_toe_tid;   /* HWTID for TOE endpoints */

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c    Thu Sep 16 21:06:23 2010    (r212764)
+++ head/sys/netinet/tcp_input.c    Thu Sep 16 21:06:45 2010    (r212765)
@@ -1321,7 +1321,6 @@ tcp_do_segment(struct mbuf *m, struct tc
tcp_xmit_timer(tp,
ticks - tp->t_rtttime);
}
-   tcp_xmit_bandwidth_limit(tp, th->th_ack);
acked = th->th_ack - tp->snd_una;
TCPSTAT_INC(tcps_rcvackpack);
TCPSTAT_ADD(tcps_rcvackbyte, acked);
@@ -2278,7 +2277,6 @@ process_ACK:
tp->t_rttlow = ticks - tp->t_rtttime;
tcp_xmit_timer(tp, ticks - tp->t_rtttime);
}
-   tcp_xmit_bandwidth_limit(tp, th->th_ack);
 
/*
 * If all outstanding data is acked, stop retransmit
@@ -3328,8 +3326,6 @@ tcp_mss(struct tcpcb *tp, int offer)
tp->snd_ssthresh = max(2 * mss, metrics.rmx_ssthresh);
TCPSTAT_INC(tcps_usedssthresh);
}
-   if (metrics.rmx_bandwidth)
-   tp->snd_bandwidth = metrics.rmx_bandwidth;
 
/*
 * Set the slow-start flight size depending on whether this

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Thu Sep 16 21:06:23 2010    (r212764)
+++ head/sys/netinet/tcp_output.c   Thu 

svn commit: r212769 - head/share/man/man4

2010-09-16 Thread Andre Oppermann
Author: andre
Date: Thu Sep 16 22:11:55 2010
New Revision: 212769
URL: http://svn.freebsd.org/changeset/base/212769

Log:
  The inflight bandwidth limiter was removed in r212765.

Modified:
  head/share/man/man4/tcp.4

Modified: head/share/man/man4/tcp.4
==
--- head/share/man/man4/tcp.4   Thu Sep 16 21:18:25 2010(r212768)
+++ head/share/man/man4/tcp.4   Thu Sep 16 22:11:55 2010(r212769)
@@ -32,7 +32,7 @@
 .\" From: @(#)tcp.48.1 (Berkeley) 6/5/93
 .\" $FreeBSD$
 .\"
-.Dd August 16, 2008
+.Dd September 16, 2010
 .Dt TCP 4
 .Os
 .Sh NAME
@@ -383,72 +383,6 @@ code.
 For this reason, we use 200ms of slop and a near-0
 minimum, which gives us an effective minimum of 200ms (similar to
 .Tn Linux ) .
-.It Va inflight.enable
-Enable
-.Tn TCP
-bandwidth-delay product limiting.
-An attempt will be made to calculate
-the bandwidth-delay product for each individual
-.Tn TCP
-connection, and limit
-the amount of inflight data being transmitted, to avoid building up
-unnecessary packets in the network.
-This option is recommended if you
-are serving a lot of data over connections with high bandwidth-delay
-products, such as modems, GigE links, and fast long-haul WANs, and/or
-you have configured your machine to accommodate large
-.Tn TCP
-windows.
-In such
-situations, without this option, you may experience high interactive
-latencies or packet loss due to the overloading of intermediate routers
-and switches.
-Note that bandwidth-delay product limiting only effects
-the transmit side of a
-.Tn TCP
-connection.
-.It Va inflight.debug
-Enable debugging for the bandwidth-delay product algorithm.
-.It Va inflight.min
-This puts a lower bound on the bandwidth-delay product window, in bytes.
-A value of 1024 is typically used for debugging.
-6000-16000 is more typical in a production installation.
-Setting this value too low may result in
-slow ramp-up times for bursty connections.
-Setting this value too high effectively disables the algorithm.
-.It Va inflight.max
-This puts an upper bound on the bandwidth-delay product window, in bytes.
-This value should not generally be modified, but may be used to set a
-global per-connection limit on queued data, potentially allowing you to
-intentionally set a less than optimum limit, to smooth data flow over a
-network while still being able to specify huge internal
-.Tn TCP
-buffers.
-.It Va inflight.stab
-The bandwidth-delay product algorithm requires a slightly larger window
-than it otherwise calculates for stability.
-This parameter determines the extra window in maximal packets / 10.
-The default value of 20 represents 2 maximal packets.
-Reducing this value is not recommended, but you may
-come across a situation with very slow links where the
-.Xr ping 8
-time
-reduction of the default inflight code is not sufficient.
-If this case occurs, you should first try reducing
-.Va inflight.min
-and, if that does not
-work, reduce both
-.Va inflight.min
-and
-.Va inflight.stab ,
-trying values of
-15, 10, or 5 for the latter.
-Never use a value less than 5.
-Reducing
-.Va inflight.stab
-can lead to upwards of a 20% underutilization of the link
-as well as reducing the algorithm's ability to adapt to changing
-situations and should only be done as a last resort.
 .It Va rfc3042
 Enable the Limited Transmit algorithm as described in RFC 3042.
 It helps avoid timeouts on lossy links and also when the congestion window


svn commit: r212803 - head/sys/netinet

2010-09-17 Thread Andre Oppermann
Author: andre
Date: Fri Sep 17 22:05:27 2010
New Revision: 212803
URL: http://svn.freebsd.org/changeset/base/212803

Log:
  Rearrange the TSO code to make it more readable and to clearly
  separate the decision logic, of whether we can do TSO, and the
  calculation of the burst length into two distinct parts.
  
  Change the way the TSO burst length calculation is done. While
  TSO could do bursts of 65535 bytes, that can't be represented in
  ip_len together with the IP and TCP header. Account for that and
  use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both
  have the same value of 64K). When more data is available prevent
  less than MSS sized segments from being sent during the current
  TSO burst.
  
  Add two more KASSERTs to ensure the integrity of the packets.
  
  Tested by:    Ben Wilber
  MFC after:    10 days

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Fri Sep 17 21:53:56 2010    (r212802)
+++ head/sys/netinet/tcp_output.c   Fri Sep 17 22:05:27 2010    (r212803)
@@ -465,9 +465,8 @@ after_sack_rexmit:
}
 
/*
-* Truncate to the maximum segment length or enable TCP Segmentation
-* Offloading (if supported by hardware) and ensure that FIN is removed
-* if the length no longer contains the last data byte.
+* Decide if we can use TCP Segmentation Offloading (if supported by
+* hardware).
 *
 * TSO may only be used if we are in a pure bulk sending state.  The
 * presence of TCP-MD5, SACK retransmits, SACK advertizements and
@@ -475,10 +474,6 @@ after_sack_rexmit:
 * (except for the sequence number) for all generated packets.  This
 * makes it impossible to transmit any options which vary per generated
 * segment or packet.
-*
-* The length of TSO bursts is limited to TCP_MAXWIN.  That limit and
-* removal of FIN (if not already catched here) are handled later after
-* the exact length of the TCP options are known.
 */
 #ifdef IPSEC
/*
@@ -487,22 +482,15 @@ after_sack_rexmit:
 */
ipsec_optlen = ipsec_hdrsiz_tcp(tp);
 #endif
-   if (len > tp->t_maxseg) {
-   if ((tp->t_flags & TF_TSO) && V_tcp_do_tso &&
-   ((tp->t_flags & TF_SIGNATURE) == 0) &&
-   tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
-   tp->t_inpcb->inp_options == NULL &&
-   tp->t_inpcb->in6p_options == NULL
+   if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && len > tp->t_maxseg &&
+   ((tp->t_flags & TF_SIGNATURE) == 0) &&
+   tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
 #ifdef IPSEC
-   && ipsec_optlen == 0
+   ipsec_optlen == 0 &&
 #endif
-   ) {
-   tso = 1;
-   } else {
-   len = tp->t_maxseg;
-   sendalot = 1;
-   }
-   }
+   tp->t_inpcb->inp_options == NULL &&
+   tp->t_inpcb->in6p_options == NULL)
+   tso = 1;
 
if (sack_rxmit) {
if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc))
@@ -732,28 +720,53 @@ send:
 * bump the packet length beyond the t_maxopd length.
 * Clear the FIN bit because we cut off the tail of
 * the segment.
-*
-* When doing TSO limit a burst to TCP_MAXWIN minus the
-* IP, TCP and Options length to keep ip->ip_len from
-* overflowing.  Prevent the last segment from being
-* fractional thus making them all equal sized and set
-* the flag to continue sending.  TSO is disabled when
-* IP options or IPSEC are present.
 */
if (len + optlen + ipoptlen > tp->t_maxopd) {
flags &= ~TH_FIN;
+
if (tso) {
-   if (len > TCP_MAXWIN - hdrlen - optlen) {
-   len = TCP_MAXWIN - hdrlen - optlen;
-   len = len - (len % (tp->t_maxopd - optlen));
+   KASSERT(ipoptlen == 0,
+   ("%s: TSO can't do IP options", __func__));
+
+   /*
+* Limit a burst to IP_MAXPACKET minus IP,
+* TCP and options length to keep ip->ip_len
+* from overflowing.
+*/
+   if (len > IP_MAXPACKET - hdrlen) {
+   len = IP_MAXPACKET - hdrlen;
+   sendalot = 1;
+   }
+
+   /*
+* Prevent the last segment from being
+* fractional unless the send sockbuf can
+* be emptied.
+*/
+

Re: svn commit: r212803 - head/sys/netinet

2010-09-18 Thread Andre Oppermann

On 18.09.2010 13:34, Bjoern A. Zeeb wrote:

On Fri, 17 Sep 2010, Andre Oppermann wrote:

@@ -487,22 +482,15 @@ after_sack_rexmit:
*/
ipsec_optlen = ipsec_hdrsiz_tcp(tp);
#endif
- if (len > tp->t_maxseg) {
- if ((tp->t_flags & TF_TSO) && V_tcp_do_tso &&
- ((tp->t_flags & TF_SIGNATURE) == 0) &&
- tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
- tp->t_inpcb->inp_options == NULL &&
- tp->t_inpcb->in6p_options == NULL
+ if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && len > tp->t_maxseg &&
+ ((tp->t_flags & TF_SIGNATURE) == 0) &&
+ tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
#ifdef IPSEC
- && ipsec_optlen == 0
+ ipsec_optlen == 0 &&
#endif
- ) {
- tso = 1;
- } else {
- len = tp->t_maxseg;
- sendalot = 1;
- }
- }
+ tp->t_inpcb->inp_options == NULL &&
+ tp->t_inpcb->in6p_options == NULL)
+ tso = 1;


In the non-TSO case you are no longer reducing len to tp->t_maxseg
here, if it's larger, which I think breaks assumptions all the way down.


No assumptions are broken for the non-TSO case.  The value of len is
only tested against t_maxseg for being equal or greater.  This always
holds true.  When the decision to send has been made, len is correctly
limited in both the non-TSO and the TSO case.  Before, a bit of each
was done in both places.  That is now merged into one spot.

--
Andre
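
The burst length limiting described in the commit log can be sketched as
follows.  This is a simplified illustration under stated assumptions, not
the committed code: tso_burst_len(), its parameters and the constant name
are hypothetical, and the real function works on tcpcb and socket buffer
state rather than plain integers.

```c
#include <assert.h>

#define	SKETCH_IP_MAXPACKET	65535	/* largest value ip_len can carry */

/*
 * Hypothetical helper.  len is the payload we would like to send, hdrlen
 * the combined IP, TCP and options header length, mss the payload of one
 * full segment, and more_pending is nonzero when the send buffer holds
 * data beyond len.  *sendalot is set when another pass is needed.
 */
static long
tso_burst_len(long len, long hdrlen, long mss, int more_pending,
    int *sendalot)
{
	assert(mss > 0 && len > mss);	/* TSO is only chosen for len > MSS */

	*sendalot = 0;

	/* Keep headers plus payload representable in the 16-bit ip_len. */
	if (len > SKETCH_IP_MAXPACKET - hdrlen) {
		len = SKETCH_IP_MAXPACKET - hdrlen;
		*sendalot = 1;
	}

	/*
	 * If data remains after this burst, trim to a whole number of
	 * segments so that no short segment ends up in mid-stream.
	 */
	if ((*sendalot || more_pending) && len % mss != 0) {
		len -= len % mss;
		*sendalot = 1;
	}
	return (len);
}
```

For 100000 bytes of pending data, 40 bytes of headers and a 1460 byte MSS,
the burst is first capped at 65495 bytes and then trimmed to 44 full
segments, 64240 bytes.  A 3000 byte send that empties the buffer is left
alone and goes out with a short final segment.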


Re: svn commit: r212803 - head/sys/netinet

2010-10-23 Thread Andre Oppermann

On 23.10.2010 15:10, Bjoern A. Zeeb wrote:

On Fri, 17 Sep 2010, Andre Oppermann wrote:


Author: andre
Date: Fri Sep 17 22:05:27 2010
New Revision: 212803
URL: http://svn.freebsd.org/changeset/base/212803

Log:
Rearrange the TSO code to make it more readable and to clearly
separate the decision logic, of whether we can do TSO, and the
calculation of the burst length into two distinct parts.

Change the way the TSO burst length calculation is done. While
TSO could do bursts of 65535 bytes, that can't be represented in
ip_len together with the IP and TCP header. Account for that and
use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both
have the same value of 64K). When more data is available prevent
less than MSS sized segments from being sent during the current
TSO burst.

Add two more KASSERTs to ensure the integrity of the packets.

Tested by: Ben Wilber 
MFC after: 10 days


As this hasn't happened yet, please do not do it. It breaks things. I'll
follow up later as soon as I have more details.


I was busy after the EuroBSDCon DevSummit and didn't have time to
MFC.  Incidentally, I was planning on doing it today, but will hold
off based on your request.

The version currently in 8 certainly has a bug.  For the one in head
you are the first report.  Others reported all their issues to be
fixed with this patch.

Can you give a high-level description of the problem you are seeing?
A detailed description is not required to take a first look at whatever
issue you may have.

--
Andre




Modified:
head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c Fri Sep 17 21:53:56 2010 (r212802)
+++ head/sys/netinet/tcp_output.c Fri Sep 17 22:05:27 2010 (r212803)
@@ -465,9 +465,8 @@ after_sack_rexmit:
}

/*
- * Truncate to the maximum segment length or enable TCP Segmentation
- * Offloading (if supported by hardware) and ensure that FIN is removed
- * if the length no longer contains the last data byte.
+ * Decide if we can use TCP Segmentation Offloading (if supported by
+ * hardware).
*
* TSO may only be used if we are in a pure bulk sending state. The
* presence of TCP-MD5, SACK retransmits, SACK advertizements and
@@ -475,10 +474,6 @@ after_sack_rexmit:
* (except for the sequence number) for all generated packets. This
* makes it impossible to transmit any options which vary per generated
* segment or packet.
- *
- * The length of TSO bursts is limited to TCP_MAXWIN. That limit and
- * removal of FIN (if not already catched here) are handled later after
- * the exact length of the TCP options are known.
*/
#ifdef IPSEC
/*
@@ -487,22 +482,15 @@ after_sack_rexmit:
*/
ipsec_optlen = ipsec_hdrsiz_tcp(tp);
#endif
- if (len > tp->t_maxseg) {
- if ((tp->t_flags & TF_TSO) && V_tcp_do_tso &&
- ((tp->t_flags & TF_SIGNATURE) == 0) &&
- tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
- tp->t_inpcb->inp_options == NULL &&
- tp->t_inpcb->in6p_options == NULL
+ if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && len > tp->t_maxseg &&
+ ((tp->t_flags & TF_SIGNATURE) == 0) &&
+ tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
#ifdef IPSEC
- && ipsec_optlen == 0
+ ipsec_optlen == 0 &&
#endif
- ) {
- tso = 1;
- } else {
- len = tp->t_maxseg;
- sendalot = 1;
- }
- }
+ tp->t_inpcb->inp_options == NULL &&
+ tp->t_inpcb->in6p_options == NULL)
+ tso = 1;

if (sack_rxmit) {
if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc))
@@ -732,28 +720,53 @@ send:
* bump the packet length beyond the t_maxopd length.
* Clear the FIN bit because we cut off the tail of
* the segment.
- *
- * When doing TSO limit a burst to TCP_MAXWIN minus the
- * IP, TCP and Options length to keep ip->ip_len from
- * overflowing. Prevent the last segment from being
- * fractional thus making them all equal sized and set
- * the flag to continue sending. TSO is disabled when
- * IP options or IPSEC are present.
*/
if (len + optlen + ipoptlen > tp->t_maxopd) {
flags &= ~TH_FIN;
+
if (tso) {
- if (len > TCP_MAXWIN - hdrlen - optlen) {
- len = TCP_MAXWIN - hdrlen - optlen;
- len = len - (len % (tp->t_maxopd - optlen));
+ KASSERT(ipoptlen == 0,
+ ("%s: TSO can't do IP options", __func__));
+
+ /*
+ * Limit a burst to IP_MAXPACKET minus IP,
+ * TCP and options length to keep ip->ip_len
+ * from overflowing.
+ */
+ if (len > IP_MAXPACKET - hdrlen) {
+ len = IP_MAXPACKET - hdrlen;
+ sendalot = 1;
+ }
+
+ /*
+ * Prevent the last segment from being
+ * fractional unless the send sockbuf can
+ * be emptied.
+ */
+ if (sendalot && off + len < so->so_snd.sb_cc) {
+ len -= len % (tp->t_maxopd - optlen);
sendalot = 1;
- } else if (tp->t_flags & TF_NEEDFIN)
+ 
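
For readers following the burst-length change, the clamp described in the
log can be sketched in isolation as below. This is a simplified
illustration and not the code from tcp_output.c: the function name
tso_burst_len, the constant SKETCH_IP_MAXPACKET (assumed to be 65535) and
the more_data flag (standing in for the off + len < so->so_snd.sb_cc test
in the diff) are all made up for the example.

```c
#include <assert.h>

/* Assumed value of IP_MAXPACKET; renamed to avoid clashing with system headers. */
#define SKETCH_IP_MAXPACKET 65535

/*
 * Hypothetical helper, not part of the kernel.
 *   len       - payload bytes we would like to hand to the NIC
 *   hdrlen    - IP + TCP header length including TCP options
 *   mss       - payload bytes per segment (t_maxopd - optlen in the diff)
 *   more_data - nonzero if the send buffer holds data beyond this burst
 *   sendalot  - set to 1 when the burst had to be cut short
 */
int
tso_burst_len(int len, int hdrlen, int mss, int more_data, int *sendalot)
{
	assert(mss > 0 && hdrlen >= 0 && hdrlen < SKETCH_IP_MAXPACKET);

	*sendalot = 0;

	/* Keep payload plus headers representable in the 16-bit ip_len. */
	if (len > SKETCH_IP_MAXPACKET - hdrlen) {
		len = SKETCH_IP_MAXPACKET - hdrlen;
		*sendalot = 1;
	}

	/* Avoid a fractional last segment unless the buffer can be emptied. */
	if (*sendalot && more_data)
		len -= len % mss;

	return (len);
}
```

With 100000 bytes queued, 40 bytes of headers and 1460 bytes of payload per
segment, the burst is first capped at 65495 bytes and then trimmed to 64240
bytes, i.e. 44 full segments.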

svn commit: r226105 - head/sys/netinet

2011-10-07 Thread Andre Oppermann
Author: andre
Date: Fri Oct  7 13:43:01 2011
New Revision: 226105
URL: http://svn.freebsd.org/changeset/base/226105

Log:
  Add back the IP header length to the total packet length field on
  raw IP sockets.  It was deducted in ip_input() in preparation for
  protocols interested only in the payload.
  
  On raw sockets the IP header should be delivered as it came in
  from the network except for the byte order swaps in some fields.
  
  This brings us in line with all other OS'es that provide raw
  IP sockets.
  
  Reported by: Matthew Cini Sarreo 
  MFC after: 3 days

Modified:
  head/sys/netinet/raw_ip.c

Modified: head/sys/netinet/raw_ip.c
==
--- head/sys/netinet/raw_ip.c   Fri Oct  7 13:16:21 2011 (r226104)
+++ head/sys/netinet/raw_ip.c   Fri Oct  7 13:43:01 2011 (r226105)
@@ -289,6 +289,13 @@ rip_input(struct mbuf *m, int off)
last = NULL;
 
ifp = m->m_pkthdr.rcvif;
+   /*
+* Add back the IP header length which was
+* removed by ip_input().  Raw sockets do
+* not modify the packet except for some
+* byte order swaps.
+*/
+   ip->ip_len += off;
 
hash = INP_PCBHASH_RAW(proto, ip->ip_src.s_addr,
ip->ip_dst.s_addr, V_ripcbinfo.ipi_hashmask);
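
The effect of the one-line change can be shown with a small stand-alone
model. This is only a sketch under simplifying assumptions: struct
sketch_ip and the sketch_* functions are hypothetical stand-ins for
struct ip, ip_input() and rip_input(), and byte order handling is left
out entirely.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct ip. */
struct sketch_ip {
	uint8_t  hl_words;	/* header length in 32-bit words (ip_hl) */
	uint16_t len;		/* total length field (ip_len) */
};

/* Models ip_input() deducting the header length for payload-only protocols. */
void
sketch_ip_input_strip(struct sketch_ip *ip)
{
	ip->len -= ip->hl_words * 4;
}

/* Models the line added to rip_input(): put the header length back. */
void
sketch_rip_input_restore(struct sketch_ip *ip, int off)
{
	ip->len += off;
}

/* Length as seen by payload-only protocols after ip_input(). */
uint16_t
sketch_payload_len(uint16_t wire_len, uint8_t hl_words)
{
	struct sketch_ip ip = { hl_words, wire_len };

	sketch_ip_input_strip(&ip);
	return (ip.len);
}

/* Length as delivered on a raw socket once the header length is restored. */
uint16_t
sketch_raw_len(uint16_t wire_len, uint8_t hl_words)
{
	struct sketch_ip ip = { hl_words, wire_len };
	int off = hl_words * 4;

	sketch_ip_input_strip(&ip);
	sketch_rip_input_restore(&ip, off);
	return (ip.len);
}
```

For an 84 byte packet with a 20 byte header, the payload-only view is 64
bytes while the raw socket view is the original 84 bytes again.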


svn commit: r226113 - head/sys/netinet

2011-10-07 Thread Andre Oppermann
Author: andre
Date: Fri Oct  7 16:39:03 2011
New Revision: 226113
URL: http://svn.freebsd.org/changeset/base/226113

Log:
  Prevent TCP sessions from stalling indefinitely in reassembly
  when reaching the zone limit of reassembly queue entries.
  
  When the zone limit was reached not even the missing segment
  that would complete the sequence space could be processed
  preventing the TCP session forever from making any further
  progress.
  
  Solve this deadlock by using a temporary on-stack queue entry
  for the missing segment followed by an immediate dequeue again
  by delivering the contiguous sequence space to the socket.
  
  Add logging under net.inet.tcp.log_debug for reassembly queue
  issues.
  
  Reviewed by:  lsteward (previous version)
  Tested by:Steven Hartland 
  MFC after:3 days

Modified:
  head/sys/netinet/tcp_reass.c

Modified: head/sys/netinet/tcp_reass.c
==
--- head/sys/netinet/tcp_reass.c Fri Oct  7 16:09:44 2011 (r226112)
+++ head/sys/netinet/tcp_reass.c Fri Oct  7 16:39:03 2011 (r226113)
@@ -177,7 +177,9 @@ tcp_reass(struct tcpcb *tp, struct tcphd
struct tseg_qent *nq;
struct tseg_qent *te = NULL;
struct socket *so = tp->t_inpcb->inp_socket;
+   char *s = NULL;
int flags;
+   struct tseg_qent tqs;
 
INP_WLOCK_ASSERT(tp->t_inpcb);
 
@@ -215,19 +217,40 @@ tcp_reass(struct tcpcb *tp, struct tcphd
TCPSTAT_INC(tcps_rcvmemdrop);
m_freem(m);
*tlenp = 0;
+   if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: queue limit reached, "
+   "segment dropped\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
return (0);
}
 
/*
 * Allocate a new queue entry. If we can't, or hit the zone limit
 * just drop the pkt.
+*
+* Use a temporary structure on the stack for the missing segment
+* when the zone is exhausted. Otherwise we may get stuck.
 */
te = uma_zalloc(V_tcp_reass_zone, M_NOWAIT);
-   if (te == NULL) {
+   if (te == NULL && th->th_seq != tp->rcv_nxt) {
TCPSTAT_INC(tcps_rcvmemdrop);
m_freem(m);
*tlenp = 0;
+   if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: global zone limit reached, "
+   "segment dropped\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
return (0);
+   } else if (th->th_seq == tp->rcv_nxt) {
+   bzero(&tqs, sizeof(struct tseg_qent));
+   te = &tqs;
+   if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: global zone limit reached, "
+   "using stack for missing segment\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
}
tp->t_segqlen++;
 
@@ -304,6 +327,8 @@ tcp_reass(struct tcpcb *tp, struct tcphd
if (p == NULL) {
LIST_INSERT_HEAD(&tp->t_segq, te, tqe_q);
} else {
+   KASSERT(te != &tqs, ("%s: temporary stack based entry not "
+   "first element in queue", __func__));
LIST_INSERT_AFTER(p, te, tqe_q);
}
 
@@ -327,7 +352,8 @@ present:
m_freem(q->tqe_m);
else
sbappendstream_locked(&so->so_rcv, q->tqe_m);
-   uma_zfree(V_tcp_reass_zone, q);
+   if (q != &tqs)
+   uma_zfree(V_tcp_reass_zone, q);
tp->t_segqlen--;
q = nq;
} while (q && q->tqe_th->th_seq == tp->rcv_nxt);
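
The decision described in the log (drop, use a zone entry, or fall back to
an on-stack entry for the missing segment) can be sketched outside the
kernel as follows. All names here (sketch_qent, sketch_pick_entry, the
SKETCH_* values) are hypothetical. Note that this sketch takes the stack
entry only when the allocation failed, whereas the "else if" branch in the
diff above tests only th_seq == rcv_nxt.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct tseg_qent. */
struct sketch_qent {
	unsigned int seq;
	int len;
};

enum sketch_action { SKETCH_DROP, SKETCH_USE_ZONE, SKETCH_USE_STACK };

/*
 * zone_entry  - result of the zone allocation, NULL if the limit was hit
 * stack_entry - caller-provided temporary entry living on the stack
 * th_seq      - sequence number of the arriving segment
 * rcv_nxt     - next sequence number the receiver is waiting for
 * te          - out: entry to queue, or NULL when the segment is dropped
 */
enum sketch_action
sketch_pick_entry(struct sketch_qent *zone_entry,
    struct sketch_qent *stack_entry, unsigned int th_seq,
    unsigned int rcv_nxt, struct sketch_qent **te)
{
	if (zone_entry != NULL) {
		*te = zone_entry;
		return (SKETCH_USE_ZONE);
	}
	if (th_seq != rcv_nxt) {
		/* Out-of-order segment and no memory: give up on it. */
		*te = NULL;
		return (SKETCH_DROP);
	}
	/* The missing segment: queue it via the stack entry so we can drain. */
	memset(stack_entry, 0, sizeof(*stack_entry));
	*te = stack_entry;
	return (SKETCH_USE_STACK);
}
```

The caller then has to remember not to hand the stack entry back to the
allocator when dequeuing, which is what the "q != &tqs" check in the last
hunk does.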


Re: svn commit: r226113 - head/sys/netinet

2011-10-09 Thread Andre Oppermann

Hi Lawrence,

Sorry for jumping in here. There was some urgency felt at EuroBSDCon
to get this issue fixed before the next RC.

--
Andre

On 08.10.2011 02:56, Lawrence Stewart wrote:

Hi Andre and RE team,

I've had a patch sitting in re@'s inbox for this problem since 15th Sep and
have been waiting for their go-ahead to commit. The patch I submitted is at:

http://people.freebsd.org/~lstewart/patches/misctcp/tcpreassstackfix_9.x.r225576.diff

The proposed commit message was:

##
Use a backup (stack allocated) struct tseg_qent when we are unable to allocate
one from the TCP reassembly UMA zone and the incoming segment is the one we've
been waiting for (i.e. th_seq == rcv_nxt). This avoids TCP connections stalling
when the zone limit is reached.

PR: kern/155407
Reported by: Slawa Olhovchenkov and Steven Hartland
Tested by: Steven Hartland
Submitted by: andre
Reviewed by: jhb
Approved by: re (?)
MFC after: 1 week
##

I feel the logging changes should have been committed separately to the fix,
but other than that, what you committed achieves the same thing as the patch I
proposed.

I should have updated the ML thread to say it was submitted and awaiting
approval, so you weren't to know.

Anyhoo, I guess I'll leave it up to you and re@ to sort out how you want to
proceed, but wanted to make sure everyone was on the same page as RE would have
gotten confused when you requested your patch be MFCed.

Cheers,
Lawrence

On 10/08/11 03:39, Andre Oppermann wrote:

Author: andre
Date: Fri Oct 7 16:39:03 2011
New Revision: 226113
URL: http://svn.freebsd.org/changeset/base/226113

Log:
Prevent TCP sessions from stalling indefinitely in reassembly
when reaching the zone limit of reassembly queue entries.

When the zone limit was reached not even the missing segment
that would complete the sequence space could be processed
preventing the TCP session forever from making any further
progress.

Solve this deadlock by using a temporary on-stack queue entry
for the missing segment followed by an immediate dequeue again
by delivering the contiguous sequence space to the socket.

Add logging under net.inet.tcp.log_debug for reassembly queue
issues.

Reviewed by: lsteward (previous version)
Tested by: Steven Hartland
MFC after: 3 days

[snip]


svn commit: r226433 - head/sys/netinet

2011-10-16 Thread Andre Oppermann
Author: andre
Date: Sun Oct 16 13:54:46 2011
New Revision: 226433
URL: http://svn.freebsd.org/changeset/base/226433

Log:
  Update the comment and description of tcp_sendspace and tcp_recvspace
  to better reflect their purpose.
  MFC after:1 week

Modified:
  head/sys/netinet/tcp_usrreq.c

Modified: head/sys/netinet/tcp_usrreq.c
==
--- head/sys/netinet/tcp_usrreq.c   Sun Oct 16 11:08:51 2011 (r226432)
+++ head/sys/netinet/tcp_usrreq.c   Sun Oct 16 13:54:46 2011 (r226433)
@@ -1498,16 +1498,15 @@ tcp_ctloutput(struct socket *so, struct 
 #undef INP_WLOCK_RECHECK
 
 /*
- * tcp_sendspace and tcp_recvspace are the default send and receive window
- * sizes, respectively.  These are obsolescent (this information should
- * be set by the route).
+ * Set the initial send and receive socket buffer sizes for
+ * newly created TCP sockets.
  */
 u_long tcp_sendspace = 1024*32;
 SYSCTL_ULONG(_net_inet_tcp, TCPCTL_SENDSPACE, sendspace, CTLFLAG_RW,
-&tcp_sendspace , 0, "Maximum outgoing TCP datagram size");
+&tcp_sendspace , 0, "Initial send socket buffer size");
 u_long tcp_recvspace = 1024*64;
 SYSCTL_ULONG(_net_inet_tcp, TCPCTL_RECVSPACE, recvspace, CTLFLAG_RW,
-&tcp_recvspace , 0, "Maximum incoming TCP datagram size");
+&tcp_recvspace , 0, "Initial receive socket buffer size");
 
 /*
  * Attach TCP protocol to socket, allocating


svn commit: r226437 - head/sys/netinet

2011-10-16 Thread Andre Oppermann
Author: andre
Date: Sun Oct 16 15:08:43 2011
New Revision: 226437
URL: http://svn.freebsd.org/changeset/base/226437

Log:
  VNET virtualize tcp_sendspace/tcp_recvspace and change the
  type to INT.  A long is not necessary as the TCP window is
  limited to 2**30.  A larger initial window isn't useful.
  
  MFC after:1 week

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_usrreq.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 16 14:30:28 2011 (r226436)
+++ head/sys/netinet/tcp_input.c Sun Oct 16 15:08:43 2011 (r226437)
@@ -3517,7 +3517,7 @@ tcp_mss(struct tcpcb *tp, int offer)
 */
so = inp->inp_socket;
SOCKBUF_LOCK(&so->so_snd);
-   if ((so->so_snd.sb_hiwat == tcp_sendspace) && metrics.rmx_sendpipe)
+   if ((so->so_snd.sb_hiwat == V_tcp_sendspace) && metrics.rmx_sendpipe)
bufsize = metrics.rmx_sendpipe;
else
bufsize = so->so_snd.sb_hiwat;
@@ -3534,7 +3534,7 @@ tcp_mss(struct tcpcb *tp, int offer)
tp->t_maxseg = mss;
 
SOCKBUF_LOCK(&so->so_rcv);
-   if ((so->so_rcv.sb_hiwat == tcp_recvspace) && metrics.rmx_recvpipe)
+   if ((so->so_rcv.sb_hiwat == V_tcp_recvspace) && metrics.rmx_recvpipe)
bufsize = metrics.rmx_recvpipe;
else
bufsize = so->so_rcv.sb_hiwat;

Modified: head/sys/netinet/tcp_usrreq.c
==
--- head/sys/netinet/tcp_usrreq.c   Sun Oct 16 14:30:28 2011 (r226436)
+++ head/sys/netinet/tcp_usrreq.c   Sun Oct 16 15:08:43 2011 (r226437)
@@ -1501,12 +1501,15 @@ tcp_ctloutput(struct socket *so, struct 
  * Set the initial send and receive socket buffer sizes for
  * newly created TCP sockets.
  */
-u_long tcp_sendspace = 1024*32;
-SYSCTL_ULONG(_net_inet_tcp, TCPCTL_SENDSPACE, sendspace, CTLFLAG_RW,
-&tcp_sendspace , 0, "Initial send socket buffer size");
-u_long tcp_recvspace = 1024*64;
-SYSCTL_ULONG(_net_inet_tcp, TCPCTL_RECVSPACE, recvspace, CTLFLAG_RW,
-&tcp_recvspace , 0, "Initial receive socket buffer size");
+VNET_DEFINE(int, tcp_sendspace) = 1024*32;
+#define V_tcp_sendspace VNET(tcp_sendspace)
+SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_SENDSPACE, tcp_sendspace, CTLFLAG_RW,
+&VNET_NAME(tcp_sendspace), 0, "Initial send socket buffer size");
+
+VNET_DEFINE(int, tcp_recvspace) = 1024*64
+#define V_tcp_recvspace VNET(tcp_recvspace)
+SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW,
+&VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size");
 
 /*
  * Attach TCP protocol to socket, allocating
@@ -1521,7 +1524,7 @@ tcp_attach(struct socket *so)
int error;
 
if (so->so_snd.sb_hiwat == 0 || so->so_rcv.sb_hiwat == 0) {
-   error = soreserve(so, tcp_sendspace, tcp_recvspace);
+   error = soreserve(so, V_tcp_sendspace, V_tcp_recvspace);
if (error)
return (error);
}

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h  Sun Oct 16 14:30:28 2011 (r226436)
+++ head/sys/netinet/tcp_var.h  Sun Oct 16 15:08:43 2011 (r226437)
@@ -606,6 +606,8 @@ VNET_DECLARE(int, tcp_mssdflt); /* XXX *
 VNET_DECLARE(int, tcp_minmss);
 VNET_DECLARE(int, tcp_delack_enabled);
 VNET_DECLARE(int, tcp_do_rfc3390);
+VNET_DECLARE(int, tcp_sendspace);
+VNET_DECLARE(int, tcp_recvspace);
 VNET_DECLARE(int, path_mtu_discovery);
 VNET_DECLARE(int, ss_fltsz);
 VNET_DECLARE(int, ss_fltsz_local);
@@ -618,6 +620,8 @@ VNET_DECLARE(int, tcp_abc_l_var);
 #define V_tcp_minmss VNET(tcp_minmss)
 #define V_tcp_delack_enabled VNET(tcp_delack_enabled)
 #define V_tcp_do_rfc3390 VNET(tcp_do_rfc3390)
+#define V_tcp_sendspace VNET(tcp_sendspace)
+#define V_tcp_recvspace VNET(tcp_recvspace)
 #define V_path_mtu_discovery VNET(path_mtu_discovery)
 #define V_ss_fltsz VNET(ss_fltsz)
 #define V_ss_fltsz_local VNET(ss_fltsz_local)
@@ -716,8 +720,6 @@ void tcp_hc_updatemtu(struct in_conninf
 void tcp_hc_update(struct in_conninfo *, struct hc_metrics_lite *);
 
 extern struct pr_usrreqs tcp_usrreqs;
-extern u_long tcp_sendspace;
-extern u_long tcp_recvspace;
 tcp_seq tcp_new_isn(struct tcpcb *);
 
 void tcp_sack_doack(struct tcpcb *, struct tcpopt *, tcp_seq);
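
The claim in the log that an int is wide enough can be checked with a few
lines of arithmetic. The sketch below assumes the usual limits of a 65535
byte unscaled window and a maximum window scale shift of 14; the SKETCH_*
names and helper functions are made up for the example.

```c
#include <limits.h>
#include <stdint.h>

#define SKETCH_TCP_MAXWIN	65535	/* largest unscaled 16-bit window */
#define SKETCH_MAX_WINSHIFT	14	/* largest window scale shift */

/* Largest window a TCP connection can advertise with window scaling. */
int64_t
sketch_max_scaled_window(void)
{
	return ((int64_t)SKETCH_TCP_MAXWIN << SKETCH_MAX_WINSHIFT);
}

/* Nonzero if that window fits into a plain int on this platform. */
int
sketch_window_fits_int(void)
{
	return (sketch_max_scaled_window() <= INT_MAX);
}
```

The largest scaled window is 65535 * 16384 = 1073725440 bytes, just below
2**30 = 1073741824, so it fits into a 32-bit signed int.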


svn commit: r226447 - head/sys/netinet

2011-10-16 Thread Andre Oppermann
Author: andre
Date: Sun Oct 16 20:06:44 2011
New Revision: 226447
URL: http://svn.freebsd.org/changeset/base/226447

Log:
  Remove the ss_fltsz and ss_fltsz_local sysctl's which have
  long been superseded by the RFC3390 initial CWND sizing.
  
  Also remove the remnants of TCP_METRICS_CWND which used the
  TCP hostcache to set the initial CWND in a non-RFC compliant
  way.
  
  MFC after:1 week

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 16 19:46:52 2011 (r226446)
+++ head/sys/netinet/tcp_input.c Sun Oct 16 20:06:44 2011 (r226447)
@@ -301,9 +301,6 @@ cc_conn_init(struct tcpcb *tp)
struct hc_metrics_lite metrics;
struct inpcb *inp = tp->t_inpcb;
int rtt;
-#ifdef INET6
-   int isipv6 = ((inp->inp_vflag & INP_IPV6) != 0) ? 1 : 0;
-#endif
 
INP_WLOCK_ASSERT(tp->t_inpcb);
 
@@ -337,49 +334,16 @@ cc_conn_init(struct tcpcb *tp)
}
 
/*
-* Set the slow-start flight size depending on whether this
-* is a local network or not.
-*
-* Extend this so we cache the cwnd too and retrieve it here.
-* Make cwnd even bigger than RFC3390 suggests but only if we
-* have previous experience with the remote host. Be careful
-* not make cwnd bigger than remote receive window or our own
-* send socket buffer. Maybe put some additional upper bound
-* on the retrieved cwnd. Should do incremental updates to
-* hostcache when cwnd collapses so next connection doesn't
-* overloads the path again.
-*
-* XXXAO: Initializing the CWND from the hostcache is broken
-* and in its current form not RFC conformant.  It is disabled
-* until fixed or removed entirely.
+* Set the initial slow-start flight size.
 *
 * RFC3390 says only do this if SYN or SYN/ACK didn't got lost.
-* We currently check only in syncache_socket for that.
+* XXX: We currently check only in syncache_socket for that.
 */
-/* #define TCP_METRICS_CWND */
-#ifdef TCP_METRICS_CWND
-   if (metrics.rmx_cwnd)
-   tp->snd_cwnd = max(tp->t_maxseg, min(metrics.rmx_cwnd / 2,
-   min(tp->snd_wnd, so->so_snd.sb_hiwat)));
-   else
-#endif
if (V_tcp_do_rfc3390)
tp->snd_cwnd = min(4 * tp->t_maxseg,
max(2 * tp->t_maxseg, 4380));
-#ifdef INET6
-   else if (isipv6 && in6_localaddr(&inp->in6p_faddr))
-   tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz_local;
-#endif
-#if defined(INET) && defined(INET6)
-   else if (!isipv6 && in_localaddr(inp->inp_faddr))
-   tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz_local;
-#endif
-#ifdef INET
-   else if (in_localaddr(inp->inp_faddr))
-   tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz_local;
-#endif
else
-   tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz;
+   tp->snd_cwnd = tp->t_maxseg;
 
if (CC_ALGO(tp)->conn_init != NULL)
CC_ALGO(tp)->conn_init(tp->ccv);

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sun Oct 16 19:46:52 2011 (r226446)
+++ head/sys/netinet/tcp_output.c   Sun Oct 16 20:06:44 2011 (r226447)
@@ -89,16 +89,6 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
&VNET_NAME(path_mtu_discovery), 1,
"Enable Path MTU Discovery");
 
-VNET_DEFINE(int, ss_fltsz) = 1;
-SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, slowstart_flightsize, CTLFLAG_RW,
-   &VNET_NAME(ss_fltsz), 1,
-   "Slow start flight size");
-
-VNET_DEFINE(int, ss_fltsz_local) = 4;
-SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, local_slowstart_flightsize,
-   CTLFLAG_RW, &VNET_NAME(ss_fltsz_local), 1,
-   "Slow start flight size for local networks");
-
 VNET_DEFINE(int, tcp_do_tso) = 1;
 #define V_tcp_do_tso VNET(tcp_do_tso)
 SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, tso, CTLFLAG_RW,

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h  Sun Oct 16 19:46:52 2011 (r226446)
+++ head/sys/netinet/tcp_var.h  Sun Oct 16 20:06:44 2011 (r226447)
@@ -609,8 +609,6 @@ VNET_DECLARE(int, tcp_do_rfc3390);
 VNET_DECLARE(int, tcp_sendspace);
 VNET_DECLARE(int, tcp_recvspace);
 VNET_DECLARE(int, path_mtu_discovery);
-VNET_DECLARE(int, ss_fltsz);
-VNET_DECLARE(int, ss_fltsz_local);
 VNET_DECLARE(int, tcp_do_rfc3465);
 VNET_DECLARE(int, tcp_abc_l_var);
 #defineV_tcb   VNET(tcb)
@@ -623,8 +621,6 @@ VNET_DECLARE(int, tcp_abc_l_var);
 #defineV_tcp_sendspace VNET(tcp_sendspace)
 #d
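
The initial congestion window selection that remains in cc_conn_init()
after this change can be worked through with a small stand-alone function.
This is a sketch only: sketch_initial_cwnd and its helpers are hypothetical
names, and the do_rfc3390 argument stands in for V_tcp_do_rfc3390.

```c
/* Local min/max helpers so the sketch does not depend on kernel headers. */
static unsigned long
sketch_min(unsigned long a, unsigned long b)
{
	return (a < b ? a : b);
}

static unsigned long
sketch_max(unsigned long a, unsigned long b)
{
	return (a > b ? a : b);
}

/*
 * Initial congestion window in bytes for a given maximum segment size.
 * With RFC 3390 enabled: min(4 * MSS, max(2 * MSS, 4380)).
 * Otherwise a single segment, as in the "else" branch of the diff.
 */
unsigned long
sketch_initial_cwnd(unsigned long maxseg, int do_rfc3390)
{
	if (do_rfc3390)
		return (sketch_min(4 * maxseg, sketch_max(2 * maxseg, 4380)));
	return (maxseg);
}
```

For an MSS of 1460 this gives 4380 bytes (three segments), for an MSS of
536 it gives 2144 bytes (four segments), and for an MSS of 8960 it gives
17920 bytes (two segments). With RFC 3390 disabled the window is one MSS.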

svn commit: r226448 - head/sys/netinet

2011-10-16 Thread Andre Oppermann
Author: andre
Date: Sun Oct 16 20:18:39 2011
New Revision: 226448
URL: http://svn.freebsd.org/changeset/base/226448

Log:
  Move the tcp_sendspace and tcp_recvspace sysctl's from
  the middle of tcp_usrreq.c to the top of tcp_output.c
  and tcp_input.c respectively next to the socket buffer
  autosizing controls.
  
  MFC after:1 week

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_usrreq.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 16 20:06:44 2011 (r226447)
+++ head/sys/netinet/tcp_input.c Sun Oct 16 20:18:39 2011 (r226448)
@@ -183,6 +183,11 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 &VNET_NAME(tcp_insecure_rst), 0,
 "Follow the old (insecure) criteria for accepting RST packets");
 
+VNET_DEFINE(int, tcp_recvspace) = 1024*64
+#define V_tcp_recvspace VNET(tcp_recvspace)
+SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW,
+&VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size");
+
 VNET_DEFINE(int, tcp_do_autorcvbuf) = 1;
 #define V_tcp_do_autorcvbuf VNET(tcp_do_autorcvbuf)
 SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, recvbuf_auto, CTLFLAG_RW,

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sun Oct 16 20:06:44 2011 (r226447)
+++ head/sys/netinet/tcp_output.c   Sun Oct 16 20:18:39 2011 (r226448)
@@ -95,6 +95,11 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
&VNET_NAME(tcp_do_tso), 0,
"Enable TCP Segmentation Offload");
 
+VNET_DEFINE(int, tcp_sendspace) = 1024*32;
+#define V_tcp_sendspace VNET(tcp_sendspace)
+SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_SENDSPACE, tcp_sendspace, CTLFLAG_RW,
+   &VNET_NAME(tcp_sendspace), 0, "Initial send socket buffer size");
+
 VNET_DEFINE(int, tcp_do_autosndbuf) = 1;
 #define V_tcp_do_autosndbuf VNET(tcp_do_autosndbuf)
 SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, sendbuf_auto, CTLFLAG_RW,

Modified: head/sys/netinet/tcp_usrreq.c
==
--- head/sys/netinet/tcp_usrreq.c   Sun Oct 16 20:06:44 2011 (r226447)
+++ head/sys/netinet/tcp_usrreq.c   Sun Oct 16 20:18:39 2011 (r226448)
@@ -1498,20 +1498,6 @@ tcp_ctloutput(struct socket *so, struct 
 #undef INP_WLOCK_RECHECK
 
 /*
- * Set the initial send and receive socket buffer sizes for
- * newly created TCP sockets.
- */
-VNET_DEFINE(int, tcp_sendspace) = 1024*32;
-#define V_tcp_sendspace VNET(tcp_sendspace)
-SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_SENDSPACE, tcp_sendspace, CTLFLAG_RW,
-&VNET_NAME(tcp_sendspace), 0, "Initial send socket buffer size");
-
-VNET_DEFINE(int, tcp_recvspace) = 1024*64
-#define V_tcp_recvspace VNET(tcp_recvspace)
-SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW,
-&VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size");
-
-/*
  * Attach TCP protocol to socket, allocating
  * internet protocol control block, tcp control block,
  * bufer space, and entering LISTEN state if to accept connections.


Re: svn commit: r226454 - head/sys/netinet

2011-10-16 Thread Andre Oppermann

On 17.10.2011 02:16, Bjoern A. Zeeb wrote:


On 17. Oct 2011, at 00:05 , Bjoern A. Zeeb wrote:


Author: bz
Date: Mon Oct 17 00:05:31 2011
New Revision: 226454
URL: http://svn.freebsd.org/changeset/base/226454

Log:
  Add syntactic sugar missed in r226437 and then not added either when moving
  things around in r226448 but desperately needed to always make things
  compile successfully.




GENERIC and LINT did not fail on it, as it expanded to:

int tcp_recvspace = 1024*64

followed by:

#define SYSCTL_VNET_INT(parent, nbr, name, access, ptr, val, descr) \
 SYSCTL_INT(parent, nbr, name, access, ptr, val, descr)

=>

#define SYSCTL_INT(parent, nbr, name, access, ptr, val, descr)  \
 SYSCTL_ASSERT_TYPE(INT, ptr, parent, name); \
 SYSCTL_OID(parent, nbr, name,   \
 CTLTYPE_INT | CTLFLAG_MPSAFE | (access),\
 ptr, val, sysctl_handle_int, "I", descr)

and the SYSCTL_ASSERT_TYPE() expanding to nothing in

#define SYSCTL_ASSERT_TYPE(type, ptr, parent, name)

leaving just the ';' around;  so it ended up as:

int tcp_recvspace = 1024*64

;
and an expanded SYSCTL_OID(...);


Oops, sorry for missing that one. And thanks for committing the fix.

--
Andre


  MFC after:1 week

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 16 22:24:04 2011 (r226453)
+++ head/sys/netinet/tcp_input.c Mon Oct 17 00:05:31 2011 (r226454)
@@ -183,7 +183,7 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 &VNET_NAME(tcp_insecure_rst), 0,
 "Follow the old (insecure) criteria for accepting RST packets");

-VNET_DEFINE(int, tcp_recvspace) = 1024*64
+VNET_DEFINE(int, tcp_recvspace) = 1024*64;
#define V_tcp_recvspace VNET(tcp_recvspace)
SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW,
 &VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size");
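
The effect Bjoern describes, where an empty macro followed by ';' silently
completes the preceding declaration, can be reproduced in a few lines. The
SKETCH_* macros below are simplified, hypothetical stand-ins for
SYSCTL_ASSERT_TYPE() and SYSCTL_INT(); they are not the real definitions.

```c
/* Stand-in for SYSCTL_ASSERT_TYPE(), which expands to nothing here. */
#define SKETCH_ASSERT_TYPE(type, ptr)

/*
 * Stand-in for SYSCTL_INT(): the (empty) type assertion, a ';', and then
 * some further declaration in place of the expanded SYSCTL_OID().
 */
#define SKETCH_SYSCTL_INT(ptr)			\
	SKETCH_ASSERT_TYPE(INT, ptr);		\
	extern int sketch_oid_placeholder

/* Note the missing ';' at the end of the next line. */
int sketch_recvspace = 1024*64
SKETCH_SYSCTL_INT(&sketch_recvspace);

/* Accessor so the value can be inspected from a caller. */
int
sketch_get_recvspace(void)
{
	return (sketch_recvspace);
}
```

This compiles because the ';' left over from the empty assertion macro
terminates the variable definition. In a configuration where the assertion
macro expands to something, the same source would presumably fail to
compile, which is why the explicit ';' is needed.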






svn commit: r227499 - head/share/man/man4

2011-11-14 Thread Andre Oppermann
Author: andre
Date: Mon Nov 14 15:10:42 2011
New Revision: 227499
URL: http://svn.freebsd.org/changeset/base/227499

Log:
  Note the ip_len bug fixed in r226105 in the BUGS section.

Modified:
  head/share/man/man4/ip.4

Modified: head/share/man/man4/ip.4
==
--- head/share/man/man4/ip.4 Mon Nov 14 15:10:01 2011 (r227498)
+++ head/share/man/man4/ip.4 Mon Nov 14 15:10:42 2011 (r227499)
@@ -32,7 +32,7 @@
 .\" @(#)ip.4   8.2 (Berkeley) 11/30/93
 .\" $FreeBSD$
 .\"
-.Dd June 1, 2009
+.Dd November 14, 2011
 .Dt IP 4
 .Os
 .Sh NAME
@@ -847,3 +847,9 @@ The
 .Vt ip_mreqn
 structure appeared in
 .Tn Linux 2.4 .
+.Sh BUGS
+Before
+.Fx 10.0 packets received on raw IP sockets had the
+.Va ip_hl
+subtracted from the
+.Va ip_len field.


svn commit: r227500 - head/share/man/man4

2011-11-14 Thread Andre Oppermann
Author: andre
Date: Mon Nov 14 15:14:42 2011
New Revision: 227500
URL: http://svn.freebsd.org/changeset/base/227500

Log:
  Remove mention of ss_fltsz and ss_fltsz_local which were retired in r226447.

Modified:
  head/share/man/man4/tcp.4

Modified: head/share/man/man4/tcp.4
==
--- head/share/man/man4/tcp.4   Mon Nov 14 15:10:42 2011 (r227499)
+++ head/share/man/man4/tcp.4   Mon Nov 14 15:14:42 2011 (r227500)
@@ -38,7 +38,7 @@
 .\" From: @(#)tcp.48.1 (Berkeley) 6/5/93
 .\" $FreeBSD$
 .\"
-.Dd September 15, 2011
+.Dd November 14, 2011
 .Dt TCP 4
 .Os
 .Sh NAME
@@ -290,14 +290,6 @@ That of 2 results in any
 packets to closed ports being logged.
 Any value unlisted above disables the logging
 (default is 0, i.e., the logging is disabled).
-.It Va slowstart_flightsize
-The number of packets allowed to be in-flight during the
-.Tn TCP
-slow-start phase on a non-local network.
-.It Va local_slowstart_flightsize
-The number of packets allowed to be in-flight during the
-.Tn TCP
-slow-start phase to local machines in the same subnet.
 .It Va msl
 The Maximum Segment Lifetime, in milliseconds, for a packet.
 .It Va keepinit
@@ -411,15 +403,6 @@ maximum segment size.
 This helps throughput in general, but
 particularly affects short transfers and high-bandwidth large
 propagation-delay connections.
-.Pp
-When this feature is enabled, the
-.Va slowstart_flightsize
-and
-.Va local_slowstart_flightsize
-settings are not observed for new
-connection slow starts, but they are still used for slow starts
-that occur when the connection has been idle and starts sending
-again.
 .It Va sack.enable
 Enable support for RFC 2018, TCP Selective Acknowledgment option,
 which allows the receiver to inform the sender about all successfully


Re: svn commit: r227499 - head/share/man/man4

2011-11-14 Thread Andre Oppermann

On 14.11.2011 16:38, Garrett Cooper wrote:

On Mon, Nov 14, 2011 at 7:10 AM, Andre Oppermann  wrote:

Author: andre
Date: Mon Nov 14 15:10:42 2011
New Revision: 227499
URL: http://svn.freebsd.org/changeset/base/227499

Log:
  Note the ip_len bug fixed in r226105 in the BUGS section.

Modified:
  head/share/man/man4/ip.4

Modified: head/share/man/man4/ip.4
==
--- head/share/man/man4/ip.4 Mon Nov 14 15:10:01 2011 (r227498)
+++ head/share/man/man4/ip.4 Mon Nov 14 15:10:42 2011 (r227499)
@@ -32,7 +32,7 @@
  .\" @(#)ip.4   8.2 (Berkeley) 11/30/93
  .\" $FreeBSD$
  .\"
-.Dd June 1, 2009
+.Dd November 14, 2011
  .Dt IP 4
  .Os
  .Sh NAME
@@ -847,3 +847,9 @@ The
  .Vt ip_mreqn
  structure appeared in
  .Tn Linux 2.4 .
+.Sh BUGS
+Before
+.Fx 10.0 packets received on raw IP sockets had the
+.Va ip_hl
+subtracted from the
+.Va ip_len field.


Isn't the fix going to be MFCed?


It was. However, some ports depend on this bug, and because we are at
a late stage in the release cycle we decided to back out
the MFC.

--
Andre


svn commit: r227501 - head/share/man/man4

2011-11-14 Thread Andre Oppermann
Author: andre
Date: Mon Nov 14 15:57:03 2011
New Revision: 227501
URL: http://svn.freebsd.org/changeset/base/227501

Log:
  mdoc fix for r227499.
  
  Reported by:  brueffer

Modified:
  head/share/man/man4/ip.4

Modified: head/share/man/man4/ip.4
==
--- head/share/man/man4/ip.4Mon Nov 14 15:14:42 2011(r227500)
+++ head/share/man/man4/ip.4Mon Nov 14 15:57:03 2011(r227501)
@@ -849,7 +849,8 @@ structure appeared in
 .Tn Linux 2.4 .
 .Sh BUGS
 Before
-.Fx 10.0 packets received on raw IP sockets had the
+.Fx 10.0
+packets received on raw IP sockets had the
 .Va ip_hl
 subtracted from the
 .Va ip_len field.


svn commit: r249843 - head/sys/kern

2013-04-24 Thread Andre Oppermann
Author: andre
Date: Wed Apr 24 13:54:55 2013
New Revision: 249843
URL: http://svnweb.freebsd.org/changeset/base/249843

Log:
  Base the calculation of maxmbufmem in part on kmem_map size
  instead of kernel_map size to prevent kernel memory exhaustion
  by mbufs and a subsequent panic on physical page allocation
  failure.
  
  On architectures without a direct map all mbuf memory (except
  for jumbo mbufs larger than PAGE_SIZE) comes from kmem_map.
  Hence it is the limiting factor.
  
  For architectures with a direct map using the size of kmem_map
  is a good proxy of available kernel memory as well.  If it is
  much smaller the mbuf limit may be sub-optimal but remains
  reasonable, while avoiding panics under exhaustion.
  
  The overall mbuf memory limit calculation may be reconsidered
  again later; however, due to the many different mbuf sizes and
  different backing KVM maps, it is a tricky subject.
  
  Found by: pho's new network stress test
  Pointed out by:   alc (kmem_map instead of kernel_map)
  Tested by:    pho

Modified:
  head/sys/kern/kern_mbuf.c

Modified: head/sys/kern/kern_mbuf.c
==
--- head/sys/kern/kern_mbuf.c   Wed Apr 24 13:19:48 2013(r249842)
+++ head/sys/kern/kern_mbuf.c   Wed Apr 24 13:54:55 2013(r249843)
@@ -118,7 +118,7 @@ tunable_mbinit(void *dummy)
 * At most it can be 3/4 of available kernel memory.
 */
realmem = qmin((quad_t)physmem * PAGE_SIZE,
-   vm_map_max(kernel_map) - vm_map_min(kernel_map));
+   vm_map_max(kmem_map) - vm_map_min(kmem_map));
maxmbufmem = realmem / 2;
TUNABLE_QUAD_FETCH("kern.maxmbufmem", &maxmbufmem);
if (maxmbufmem > realmem / 4 * 3)
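
For readers following the arithmetic, the limit calculation in this hunk can be modelled outside the kernel roughly as below.  This is only a sketch under stated assumptions: mbuf_mem_limit() and its parameters are made-up names, a tunable value of 0 stands in for "kern.maxmbufmem not set", and the clamp assignment is inferred from the "at most 3/4" comment because the hunk ends before that line.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of the maxmbufmem calculation.
 * phys_bytes stands in for physmem * PAGE_SIZE, kmem_bytes for
 * vm_map_max(kmem_map) - vm_map_min(kmem_map).
 */
static int64_t
mbuf_mem_limit(int64_t phys_bytes, int64_t kmem_bytes, int64_t tunable)
{
	/* realmem is the smaller of physical memory and the kmem_map span. */
	int64_t realmem = phys_bytes < kmem_bytes ? phys_bytes : kmem_bytes;

	/* Default to half of realmem unless a tunable value was supplied. */
	int64_t maxmbufmem = tunable > 0 ? tunable : realmem / 2;

	/* Assumed clamp: never allow more than 3/4 of realmem. */
	if (maxmbufmem > realmem / 4 * 3)
		maxmbufmem = realmem / 4 * 3;
	return (maxmbufmem);
}
```

With 8 GB of RAM and a 2 GB kmem_map the default limit in this model comes out at 1 GB, and an oversized tunable is cut back to 1.5 GB.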


svn commit: r250300 - in head/sys: kern net netinet sys

2013-05-06 Thread Andre Oppermann
Author: andre
Date: Mon May  6 16:42:18 2013
New Revision: 250300
URL: http://svnweb.freebsd.org/changeset/base/250300

Log:
  Back out r249318, r249320 and r249327 due to a heisenbug most
  likely related to a race condition in the ipi_hash_lock with
  the exact cause currently unknown but under investigation.

Modified:
  head/sys/kern/uipc_socket.c
  head/sys/net/if.c
  head/sys/net/if_llatbl.c
  head/sys/net/if_llatbl.h
  head/sys/net/if_var.h
  head/sys/netinet/in_pcb.h
  head/sys/netinet/in_var.h
  head/sys/netinet/ip_id.c
  head/sys/netinet/ip_input.c
  head/sys/netinet/tcp_subr.c
  head/sys/sys/socketvar.h

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Mon May  6 16:11:53 2013(r250299)
+++ head/sys/kern/uipc_socket.c Mon May  6 16:42:18 2013(r250300)
@@ -240,14 +240,14 @@ SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO
  * accept_mtx locks down per-socket fields relating to accept queues.  See
  * socketvar.h for an annotation of the protected fields of struct socket.
  */
-struct mtx_padalign accept_mtx;
+struct mtx accept_mtx;
 MTX_SYSINIT(accept_mtx, &accept_mtx, "accept", MTX_DEF);
 
 /*
  * so_global_mtx protects so_gencnt, numopensockets, and the per-socket
  * so_gencnt field.
  */
-static struct mtx_padalign so_global_mtx;
+static struct mtx so_global_mtx;
 MTX_SYSINIT(so_global_mtx, &so_global_mtx, "so_glabel", MTX_DEF);
 
 /*

Modified: head/sys/net/if.c
==
--- head/sys/net/if.c   Mon May  6 16:11:53 2013(r250299)
+++ head/sys/net/if.c   Mon May  6 16:42:18 2013(r250300)
@@ -206,7 +206,7 @@ VNET_DEFINE(struct ifindex_entry *, ifin
  * also to stablize it over long-running ioctls, without introducing priority
  * inversions and deadlocks.
  */
-struct rwlock_padalign ifnet_rwlock;
+struct rwlock ifnet_rwlock;
 struct sx ifnet_sxlock;
 
 /*

Modified: head/sys/net/if_llatbl.c
==
--- head/sys/net/if_llatbl.cMon May  6 16:11:53 2013(r250299)
+++ head/sys/net/if_llatbl.cMon May  6 16:42:18 2013(r250300)
@@ -67,7 +67,7 @@ static VNET_DEFINE(SLIST_HEAD(, lltable)
 
 static void vnet_lltable_init(void);
 
-struct rwlock_padalign lltable_rwlock;
+struct rwlock lltable_rwlock;
 RW_SYSINIT(lltable_rwlock, &lltable_rwlock, "lltable_rwlock");
 
 /*

Modified: head/sys/net/if_llatbl.h
==
--- head/sys/net/if_llatbl.hMon May  6 16:11:53 2013(r250299)
+++ head/sys/net/if_llatbl.hMon May  6 16:42:18 2013(r250300)
@@ -43,7 +43,7 @@ struct rt_addrinfo;
 struct llentry;
 LIST_HEAD(llentries, llentry);
 
-extern struct rwlock_padalign lltable_rwlock;
+extern struct rwlock lltable_rwlock;
 #define LLTABLE_RLOCK() rw_rlock(&lltable_rwlock)
 #define LLTABLE_RUNLOCK()   rw_runlock(&lltable_rwlock)
 #define LLTABLE_WLOCK() rw_wlock(&lltable_rwlock)

Modified: head/sys/net/if_var.h
==
--- head/sys/net/if_var.h   Mon May  6 16:11:53 2013(r250299)
+++ head/sys/net/if_var.h   Mon May  6 16:42:18 2013(r250300)
@@ -191,9 +191,9 @@ struct ifnet {
void*if_unused[2];
void*if_afdata[AF_MAX];
int if_afdata_initialized;
+   struct  rwlock if_afdata_lock;
struct  task if_linktask;   /* task for link change events */
-   struct  rwlock_padalign if_afdata_lock;
-   struct  rwlock_padalign if_addr_lock;   /* lock to protect address lists */
+   struct  rwlock if_addr_lock;/* lock to protect address lists */
 
LIST_ENTRY(ifnet) if_clones;/* interfaces of a cloner */
TAILQ_HEAD(, ifg_list) if_groups; /* linked list of groups per if */
@@ -832,7 +832,7 @@ struct ifmultiaddr {
 
 #ifdef _KERNEL
 
-extern struct rwlock_padalign ifnet_rwlock;
+extern struct rwlock ifnet_rwlock;
 extern struct sx ifnet_sxlock;
 
 #define IFNET_LOCK_INIT() do {  \

Modified: head/sys/netinet/in_pcb.h
==
--- head/sys/netinet/in_pcb.h   Mon May  6 16:11:53 2013(r250299)
+++ head/sys/netinet/in_pcb.h   Mon May  6 16:42:18 2013(r250300)
@@ -330,7 +330,7 @@ struct inpcbinfo {
/*
 * Global lock protecting non-pcbgroup hash lookup tables.
 */
-   struct rwlock_padalign   ipi_hash_lock;
+   struct rwlockipi_hash_lock;
 
/*
 * Global hash of inpcbs, hashed by local and foreign addresses and

Modified: head/sys/netinet/in_var.h
==
--- head/sys/netinet/in_var.h   Mon May  6

svn commit: r250365 - head/sys/kern

2013-05-08 Thread Andre Oppermann
Author: andre
Date: Wed May  8 14:13:14 2013
New Revision: 250365
URL: http://svnweb.freebsd.org/changeset/base/250365

Log:
  When the accept queue is full print the number of already pending
  new connections instead of by how many we're over the limit, which
  is always 1.
  
  Noticed by:   jmallet
  MFC after:    1 week

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Wed May  8 13:26:17 2013(r250364)
+++ head/sys/kern/uipc_socket.c Wed May  8 14:13:14 2013(r250365)
@@ -515,7 +515,7 @@ sonewconn(struct socket *head, int conns
 #endif
log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow: "
"%i already in queue awaiting acceptance\n",
-   __func__, head->so_pcb, over);
+   __func__, head->so_pcb, head->so_qlen);
return (NULL);
}
VNET_ASSERT(head->so_vnet != NULL, ("%s:%d so_vnet is NULL, head=%p",
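
A small sketch of why the old message always printed 1.  It assumes that "over" in sonewconn() holds the result of the overflow comparison (queue length above 1.5 times the limit) rather than a count; that comparison is not shown in the hunk, so treat it as an assumption.  The struct and function names are hypothetical.

```c
/* Hypothetical stand-in for the two struct socket fields involved. */
struct lq {
	int qlen;	/* connections already queued (so_qlen) */
	int qlimit;	/* configured backlog (so_qlimit) */
};

/*
 * Assumed overflow test: a C comparison yields 0 or 1, so printing
 * this value in the overflow branch can only ever show 1.
 */
static int
lq_over(const struct lq *q)
{
	return (q->qlen > 3 * q->qlimit / 2);
}
```

Printing qlen instead tells the administrator how deep the backlog actually was when the connection was dropped.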


Re: svn commit: r250658 - in head: share/mk sys/conf tools/build/options

2013-05-17 Thread Andre Oppermann

On 15.05.2013 15:04, Brooks Davis wrote:

Author: brooks
Date: Wed May 15 13:04:10 2013
New Revision: 250658
URL: http://svnweb.freebsd.org/changeset/base/250658

Log:
   Add a new option WITHOUT_FORMAT_EXTENSIONS to disable flags related to
   checking our kernel printf extensions.  This is useful to allow
   compilers without these extensions to build kernels.

   Sponsored by:    DARPA, AFRL


This breaks "make depend" at least on amd64:

"../../../conf/kern.mk", line 37: Malformed conditional (${MK_FORMAT_EXTENSIONS} == "no")
"../../../conf/kern.mk", line 39: if-less else
"../../../conf/kern.mk", line 41: if-less endif
make: fatal errors encountered -- cannot continue

--
Andre


Added:
   head/tools/build/options/WITHOUT_FORMAT_EXTENSIONS   (contents, props changed)
Modified:
   head/share/mk/bsd.own.mk
   head/sys/conf/kern.mk

Modified: head/share/mk/bsd.own.mk
==
--- head/share/mk/bsd.own.mkWed May 15 08:38:49 2013(r250657)
+++ head/share/mk/bsd.own.mkWed May 15 13:04:10 2013(r250658)
@@ -268,6 +268,7 @@ __DEFAULT_YES_OPTIONS = \
  ED_CRYPTO \
  EXAMPLES \
  FLOPPY \
+FORMAT_EXTENSIONS \
  FORTH \
  FP_LIBC \
  FREEBSD_UPDATE \

Modified: head/sys/conf/kern.mk
==
--- head/sys/conf/kern.mk   Wed May 15 08:38:49 2013(r250657)
+++ head/sys/conf/kern.mk   Wed May 15 13:04:10 2013(r250658)
@@ -5,7 +5,7 @@
  #
  CWARNFLAGS?=  -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes \
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual \
-   -Wundef -Wno-pointer-sign -fformat-extensions \
+   -Wundef -Wno-pointer-sign ${FORMAT_EXTENTIONS} \
-Wmissing-include-dirs -fdiagnostics-show-option \
${CWARNEXTRA}
  #
@@ -29,7 +29,15 @@ NO_WSOMETIMES_UNINITIALIZED= -Wno-error-
  # enough to error out the whole kernel build.  Display them anyway, so there is
  # some incentive to fix them eventually.
  CWARNEXTRA?=  -Wno-error-tautological-compare -Wno-error-empty-body \
-   -Wno-error-parentheses-equality
+   -Wno-error-parentheses-equality ${NO_WFORMAT}
+.endif
+
+# External compilers may not support our format extensions.  Allow them
+# to be disabled.  WARNING: format checking is disabled in this case.
+.if ${MK_FORMAT_EXTENSIONS} == "no"
+NO_WFORMAT=-Wno-format
+.else
+FORMAT_EXTENTIONS= -fformat-extensions
  .endif

  #

Added: head/tools/build/options/WITHOUT_FORMAT_EXTENSIONS
==
--- /dev/null   00:00:00 1970   (empty, because file is newly added)
+++ head/tools/build/options/WITHOUT_FORMAT_EXTENSIONS  Wed May 15 13:04:10 2013(r250658)
@@ -0,0 +1,5 @@
+.\" $FreeBSD$
+Set to not enable
+.Fl fformat-extensions
+when compiling the kernel.
+Also disables all format checking.






svn commit: r251296 - in head/sys: net netinet

2013-06-03 Thread Andre Oppermann
Author: andre
Date: Mon Jun  3 12:55:13 2013
New Revision: 251296
URL: http://svnweb.freebsd.org/changeset/base/251296

Log:
  Allow drivers to specify a maximum TSO length in bytes if they are
  limited in the amount of data they can handle at once.
  
  Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
  change the limit.
  
  The lowest allowable size is IP_MAXPACKET / 8 (8191 bytes) as anything
  less wouldn't be very useful anymore.  The upper limit is still at
  IP_MAXPACKET (65535 bytes).  Raising it requires further auditing of
  the IPv4/v6 code paths as the length field in the IP header would
  overflow, leading to confusion in firewalls and other packet handlers
  about the real size of the packet.
  
  The placement into "struct ifnet" is a bit hackish but the best place
  that was found.  When the stack/driver boundary is updated it should
  be handled in a better way.
  
  Submitted by: cperciva (earlier version)
  Reviewed by:  cperciva
  Tested by:cperciva
  MFC after:1 week (using spare struct members to preserve ABI)
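
The default and the permitted range described above can be sketched in plain C as follows.  This is an illustration, not the committed code: the helper names and SK_IP_MAXPACKET are invented here, and the value 65535 is taken from the IP_MAXPACKET definition in <netinet/ip.h>.

```c
#include <stdbool.h>

#define SK_IP_MAXPACKET	65535u	/* stand-in for IP_MAXPACKET */

/* A driver that leaves the field at 0 gets the maximum. */
static unsigned int
tsomax_effective(unsigned int hw_tsomax)
{
	return (hw_tsomax == 0 ? SK_IP_MAXPACKET : hw_tsomax);
}

/* Range the attach-time assertion accepts: [IP_MAXPACKET / 8, IP_MAXPACKET]. */
static bool
tsomax_in_range(unsigned int tsomax)
{
	return (tsomax <= SK_IP_MAXPACKET && tsomax >= SK_IP_MAXPACKET / 8);
}
```

A driver would set its limit before calling ether_ifattach(); any value that fails the range check would trip the KASSERT on kernels built with INVARIANTS.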

Modified:
  head/sys/net/if.c
  head/sys/net/if_var.h
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_subr.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/net/if.c
==
--- head/sys/net/if.c   Mon Jun  3 12:43:09 2013(r251295)
+++ head/sys/net/if.c   Mon Jun  3 12:55:13 2013(r251296)
@@ -74,18 +74,18 @@
 #include 
 
 #if defined(INET) || defined(INET6)
-/*XXX*/
 #include 
 #include 
+#include 
 #include 
+#ifdef INET
+#include 
+#endif /* INET */
 #ifdef INET6
 #include 
 #include 
-#endif
-#endif
-#ifdef INET
-#include 
-#endif
+#endif /* INET6 */
+#endif /* INET || INET6 */
 
 #include 
 
@@ -653,6 +653,13 @@ if_attach_internal(struct ifnet *ifp, in
TAILQ_INSERT_HEAD(&ifp->if_addrhead, ifa, ifa_link);
/* Reliably crash if used uninitialized. */
ifp->if_broadcastaddr = NULL;
+
+   /* Initialize to max value. */
+   if (ifp->if_hw_tsomax == 0)
+   ifp->if_hw_tsomax = IP_MAXPACKET;
+   KASSERT(ifp->if_hw_tsomax <= IP_MAXPACKET &&
+   ifp->if_hw_tsomax >= IP_MAXPACKET / 8,
+   ("%s: tsomax outside of range", __func__));
}
 #ifdef VIMAGE
else {

Modified: head/sys/net/if_var.h
==
--- head/sys/net/if_var.h   Mon Jun  3 12:43:09 2013(r251295)
+++ head/sys/net/if_var.h   Mon Jun  3 12:55:13 2013(r251296)
@@ -204,6 +204,11 @@ struct ifnet {
u_int   if_fib; /* interface FIB */
u_char  if_alloctype;   /* if_type at time of allocation */
 
+   u_int   if_hw_tsomax;   /* tso burst length limit, the minmum
+* is (IP_MAXPACKET / 8).
+* XXXAO: Have to find a better place
+* for it eventually. */
+
/*
 * Spare fields are added so that we can modify sensitive data
 * structures without changing the kernel binary interface, and must

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c    Mon Jun  3 12:43:09 2013    (r251295)
+++ head/sys/netinet/tcp_input.c    Mon Jun  3 12:55:13 2013    (r251296)
@@ -3434,7 +3434,7 @@ tcp_xmit_timer(struct tcpcb *tp, int rtt
  */
 void
 tcp_mss_update(struct tcpcb *tp, int offer, int mtuoffer,
-struct hc_metrics_lite *metricptr, int *mtuflags)
+struct hc_metrics_lite *metricptr, struct tcp_ifcap *cap)
 {
int mss = 0;
u_long maxmtu = 0;
@@ -3461,7 +3461,7 @@ tcp_mss_update(struct tcpcb *tp, int off
/* Initialize. */
 #ifdef INET6
if (isipv6) {
-   maxmtu = tcp_maxmtu6(&inp->inp_inc, mtuflags);
+   maxmtu = tcp_maxmtu6(&inp->inp_inc, cap);
tp->t_maxopd = tp->t_maxseg = V_tcp_v6mssdflt;
}
 #endif
@@ -3470,7 +3470,7 @@ tcp_mss_update(struct tcpcb *tp, int off
 #endif
 #ifdef INET
{
-   maxmtu = tcp_maxmtu(&inp->inp_inc, mtuflags);
+   maxmtu = tcp_maxmtu(&inp->inp_inc, cap);
tp->t_maxopd = tp->t_maxseg = V_tcp_mssdflt;
}
 #endif
@@ -3605,11 +3605,12 @@ tcp_mss(struct tcpcb *tp, int offer)
struct inpcb *inp;
struct socket *so;
struct hc_metrics_lite metrics;
-   int mtuflags = 0;
+   struct tcp_ifcap cap;
 
KASSERT(tp != NULL, ("%s: tp == NULL", __func__));
-   
-   tcp_mss_update(tp, offer, -1, &metrics, &mtuflags);
+
+   bzero(&cap, sizeof(cap));
+   tcp_mss_update(tp, offer, -1, &metrics, &cap);
 
mss = tp->t_maxseg;
inp = tp->t_inpcb;
@@ -3

svn commit: r251297 - head/sys/dev/xen/netfront

2013-06-03 Thread Andre Oppermann
Author: andre
Date: Mon Jun  3 13:00:33 2013
New Revision: 251297
URL: http://svnweb.freebsd.org/changeset/base/251297

Log:
  Specify a maximum TSO length limiting the segment chain to what the
  Xen host side can handle after defragmentation.
  
  This prevents the driver from throwing away too long TSO chains and
  improves the performance on Amazon AWS instances with 10GigE virtual
  interfaces to the normally expected throughput.
  
  Submitted by: cperciva (earlier version)
  Reviewed by:  cperciva
  Tested by:    cperciva
  MFC after:    1 week

Modified:
  head/sys/dev/xen/netfront/netfront.c

Modified: head/sys/dev/xen/netfront/netfront.c
==
--- head/sys/dev/xen/netfront/netfront.c    Mon Jun  3 12:55:13 2013    (r251296)
+++ head/sys/dev/xen/netfront/netfront.c    Mon Jun  3 13:00:33 2013    (r251297)
@@ -134,6 +134,7 @@ static const int MODPARM_rx_flip = 0;
  * to mirror the Linux MAX_SKB_FRAGS constant.
  */
 #define MAX_TX_REQ_FRAGS (65536 / PAGE_SIZE + 2)
+#define NF_TSO_MAXBURST ((IP_MAXPACKET / PAGE_SIZE) * MCLBYTES)
 
 #define RX_COPY_THRESHOLD 256
 
@@ -2122,6 +2123,7 @@ create_netdev(device_t dev)

ifp->if_hwassist = XN_CSUM_FEATURES;
ifp->if_capabilities = IFCAP_HWCSUM;
+   ifp->if_hw_tsomax = NF_TSO_MAXBURST;

ether_ifattach(ifp, np->mac);
callout_init(&np->xn_stat_ch, CALLOUT_MPSAFE);


Re: svn commit: r251297 - head/sys/dev/xen/netfront

2013-06-04 Thread Andre Oppermann

On 05.06.2013 08:13, Colin Percival wrote:

On 06/04/13 22:51, Lawrence Stewart wrote:

On 06/03/13 23:00, Andre Oppermann wrote:

Modified: head/sys/dev/xen/netfront/netfront.c
==
--- head/sys/dev/xen/netfront/netfront.c    Mon Jun  3 12:55:13 2013    (r251296)
+++ head/sys/dev/xen/netfront/netfront.c    Mon Jun  3 13:00:33 2013    (r251297)
@@ -134,6 +134,7 @@ static const int MODPARM_rx_flip = 0;
   * to mirror the Linux MAX_SKB_FRAGS constant.
   */
  #define   MAX_TX_REQ_FRAGS (65536 / PAGE_SIZE + 2)
+#define NF_TSO_MAXBURST ((IP_MAXPACKET / PAGE_SIZE) * MCLBYTES)


For posterity's sake, can you and/or Colin please elaborate on how this
value was determined and what it is dependent upon? Could a newer
version of Xen remove the need for this reduced limit?


The comment above (of which only the last line is quoted in the diff)
explains it:
  * This limit is imposed by the backend driver.  We assume here that
  * we are dealing with a Linux driver domain and have set our limit
  * to mirror the Linux MAX_SKB_FRAGS constant.

This isn't a Xen issue really; rather, it's a Linux Dom0 issue.  AFAIK
there are no changes in the pipe to fix this in Linux; but this would not
be needed with a different Dom0 (e.g., a FreeBSD Dom0, if/when that becomes
possible) or if FreeBSD switched to using 4kB mbuf clusters (since at that
point we would be matching Linux and be able to fit a maximum-length IP
packet into the allowed number of fragments).


We do support 4K mbufs and have done so for a long time.  The problem is
that socket buffer mbuf chains can be any combination of mbuf sizes and
m_defrag() so far only collapses to 2K mbuf clusters.  The latter can be
changed but it is used in a number of places where an explicit 2K assumption
may have been made (even if it shouldn't).  When all of them are checked,
m_defrag() can be changed to collapse into 4K mbufs and this "hack" removed.

--
Andre
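
To put numbers on the exchange above, here is the arithmetic behind NF_TSO_MAXBURST as a stand-alone sketch.  The constants are assumptions for a typical amd64 configuration (4 KB pages, 2 KB mbuf clusters, IP_MAXPACKET of 65535); the SK_ names and clusters_needed() are invented for illustration.

```c
enum {
	SK_IP_MAXPACKET = 65535,	/* assumed, as in <netinet/ip.h> */
	SK_PAGE_SIZE = 4096,		/* assumed 4 KB pages */
	SK_MCLBYTES = 2048,		/* assumed 2 KB mbuf clusters */

	/* Fragment limit mirrored from the Linux backend: 18. */
	SK_MAX_TX_REQ_FRAGS = 65536 / SK_PAGE_SIZE + 2,
	/* TSO burst limit from the commit: 15 * 2048 = 30720 bytes. */
	SK_NF_TSO_MAXBURST = (SK_IP_MAXPACKET / SK_PAGE_SIZE) * SK_MCLBYTES
};

/* Number of 2 KB clusters needed to hold len bytes after collapsing. */
static int
clusters_needed(int len)
{
	return ((len + SK_MCLBYTES - 1) / SK_MCLBYTES);
}
```

Under these assumptions a full 65535-byte packet collapsed into 2 KB clusters needs 32 of them, well over the 18 fragments allowed, while a 30720-byte burst needs 15.  With 4 KB clusters the full packet would need 16, which is why the limit could go away once m_defrag() collapses into 4K mbufs.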



Re: svn commit: r251894 - in head: lib/libmemstat sys/vm

2013-06-18 Thread Andre Oppermann

On 18.06.2013 06:50, Jeff Roberson wrote:

Author: jeff
Date: Tue Jun 18 04:50:20 2013
New Revision: 251894
URL: http://svnweb.freebsd.org/changeset/base/251894

Log:
   Refine UMA bucket allocation to reduce space consumption and improve
   performance.

- Always free to the alloc bucket if there is space.  This gives LIFO
  allocation order to improve hot-cache performance.  This also allows
  for zones with a single bucket per-cpu rather than a pair if the entire
  working set fits in one bucket.
- Enable per-cpu caches of buckets.  To prevent recursive bucket
  allocation one bucket zone still has per-cpu caches disabled.
- Pick the initial bucket size based on a table driven maximum size
  per-bucket rather than the number of items per-page.  This gives
  more sane initial sizes.
- Only grow the bucket size when we face contention on the zone lock, this
  causes bucket sizes to grow more slowly.
- Adjust the number of items per-bucket to account for the header space.
  This packs the buckets more efficiently per-page while making them
  not quite powers of two.
- Eliminate the per-zone free bucket list.  Always return buckets back
  to the bucket zone.  This ensures that as zones grow into larger
  bucket sizes they eventually discard the smaller sizes.  It persists
  fewer buckets in the system.  The locking is slightly trickier.
- Only switch buckets in zalloc, not zfree, this eliminates pathological
  cases where we ping-pong between two buckets.
- Ensure that the thread that fills a new bucket gets to allocate from
  it to give a better upper bound on allocation time.


There used to be a problem with per CPU caches accumulating large amounts
of items without freeing back to the global (or socket) pool.

Do these updates to UMA change this situation and/or do you have further
improvements coming up?

--
Andre



Re: svn commit: r251886 - in head: contrib/apr contrib/apr-util contrib/serf contrib/sqlite3 contrib/subversion share/mk usr.bin usr.bin/svn usr.bin/svn/lib usr.bin/svn/lib/libapr usr.bin/svn/lib/liba

2013-06-18 Thread Andre Oppermann

On 18.06.2013 18:40, Tijl Coosemans wrote:

On 2013-06-18 04:53, Peter Wemm wrote:

Author: peter
Date: Tue Jun 18 02:53:45 2013
New Revision: 251886
URL: http://svnweb.freebsd.org/changeset/base/251886

Log:
   Introduce svnlite so that we can check out our source code again.

   This is actually a fully functional build except:
   * All internal shared libraries are static linked to make sure there
 is no interference with ports (and to reduce build time).
   * It does not have the python/perl/etc plugin or API support.
   * By default, it installs as "svnlite" rather than "svn".
   * If WITH_SVN is added to make.conf, you get "svn".
   * If WITHOUT_SVNLITE is in make.conf, this is completely disabled.

   To be absolutely clear, this is not intended for any use other than
   checking out freebsd source and committing, like we once did with cvs.

   It should be usable for small scale local repositories that don't
   need the python/perl plugin architecture.


This ties the repo to the oldest supported release, meaning that years
from now we won't be able to use some new subversion feature because
an old FreeBSD release doesn't support it.


AFAIK there is a checkout-only SVN client available, as in cvsup, but I don't
remember the name.


I don't find it unreasonable to ask developers to install the port.
And for users it seems all they need is something like portsnap for base.
Portsnap already distributes ports svn so it shouldn't be too hard to
adapt it for base. And the extra layer it adds is very convenient. Apart
from a bigger than usual update maybe, portsnap users never even noticed
it was switched from cvs to svn at some point.


Installing SVN from ports is very painful because of the huge dependency
chain it carries, with the largest being Python and Perl IIRC.

--
Andre



Re: svn commit: r251886 - in head: contrib/apr contrib/apr-util contrib/serf contrib/sqlite3 contrib/subversion share/mk usr.bin usr.bin/svn usr.bin/svn/lib usr.bin/svn/lib/libapr usr.bin/svn/lib/liba

2013-06-18 Thread Andre Oppermann

On 18.06.2013 19:04, Alexey Dokuchaev wrote:


Being able to checkout the sources is very desirable, but not at the
cost of importing another heavy 3rd-party tool, which Subversion is.


Just wanted to note that I applaud Peter for actually doing something (tm),
even though it came as a surprise to many, it seems.

Now that we're having the discussion we can converge towards the best or
least controversial option.

--
Andre



Re: svn commit: r252209 - in head: share/man/man9 sys/kern sys/sys

2013-06-26 Thread Andre Oppermann

On 25.06.2013 20:44, John Baldwin wrote:

Author: jhb
Date: Tue Jun 25 18:44:15 2013
New Revision: 252209
URL: http://svnweb.freebsd.org/changeset/base/252209

Log:
   Several improvements to rmlock(9).  Many of these are based on patches
   provided by Isilon.
   - Add an rm_assert() supporting various lock assertions similar to other
 locking primitives.  Because rmlocks track readers the assertions are
 always fully accurate unlike rw_assert() and sx_assert().
   - Flesh out the lock class methods for rmlocks to support sleeping via
 condvars and rm_sleep() (but only while holding write locks), rmlock
 details in 'show lock' in DDB, and the lc_owner method used by
 dtrace.
   - Add an internal destroyed cookie so that API functions can assert
 that an rmlock is not destroyed.
   - Make use of rm_assert() to add various assertions to the API (e.g.
 to assert locks are held when an unlock routine is called).
   - Give RM_SLEEPABLE locks their own lock class and always use the
 rmlock's own lock_object with WITNESS.
   - Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping
 while holding a read lock on an rmlock.


Thanks!

Would it make sense to move struct rm_queue from struct pcpu itself to
using DPCPU as a next step?


   Submitted by:andre


Actually these were only relayed by me and came from Max Laier / Stephan
Uphoff.  So all fame to them.


   Obtained from:   EMC/Isilon


--
Andre



Re: svn commit: r236959 - in head: share/man/man4 sys/netinet

2012-06-13 Thread Andre Oppermann

On 12.06.2012 16:02, Michael Tuexen wrote:

Author: tuexen
Date: Tue Jun 12 14:02:38 2012
New Revision: 236959
URL: http://svn.freebsd.org/changeset/base/236959

Log:
   Add a IP_RECVTOS socket option to receive for received UDP/IPv4
   packets a cmsg of type IP_RECVTOS which contains the TOS byte.
   Much like IP_RECVTTL does for TTL. This allows to implement a
   protocol on top of UDP and implementing ECN.


You may want to consider aliasing IP_RECVTOS with IP_TOS, as is
done with IP_SENDSRCADDR+IP_RECVDSTADDR, to make replying to
received UDP packets simpler.  That way IP_RECVTOS has the same IP socket
option number and can be used for direct TOS reflection.

--
Andre


svn commit: r241686 - in head/sys: net netgraph netgraph/atm/ccatm netgraph/atm/sscfu netgraph/atm/sscop netgraph/atm/uni netinet netinet6 netipsec

2012-10-18 Thread Andre Oppermann
Author: andre
Date: Thu Oct 18 13:57:24 2012
New Revision: 241686
URL: http://svn.freebsd.org/changeset/base/241686

Log:
  Mechanically remove the last stray remains of spl* calls from net*/*.
  They have been no-ops for a long time now.

Modified:
  head/sys/net/if.c
  head/sys/net/if_ef.c
  head/sys/net/if_gre.c
  head/sys/net/if_spppsubr.c
  head/sys/net/if_var.h
  head/sys/net/rtsock.c
  head/sys/netgraph/atm/ccatm/ng_ccatm.c
  head/sys/netgraph/atm/sscfu/ng_sscfu.c
  head/sys/netgraph/atm/sscop/ng_sscop.c
  head/sys/netgraph/atm/uni/ng_uni.c
  head/sys/netgraph/ng_eiface.c
  head/sys/netgraph/ng_ether.c
  head/sys/netgraph/ng_fec.c
  head/sys/netgraph/ng_gif.c
  head/sys/netgraph/ng_ksocket.c
  head/sys/netgraph/ng_source.c
  head/sys/netinet/ip_ipsec.c
  head/sys/netinet6/in6.c
  head/sys/netinet6/ip6_ipsec.c
  head/sys/netinet6/nd6.c
  head/sys/netinet6/nd6_nbr.c
  head/sys/netinet6/nd6_rtr.c
  head/sys/netinet6/udp6_usrreq.c
  head/sys/netipsec/key.c

Modified: head/sys/net/if.c
==
--- head/sys/net/if.c   Thu Oct 18 13:46:26 2012(r241685)
+++ head/sys/net/if.c   Thu Oct 18 13:57:24 2012(r241686)
@@ -691,12 +691,9 @@ static void
 if_attachdomain(void *dummy)
 {
struct ifnet *ifp;
-   int s;
 
-   s = splnet();
TAILQ_FOREACH(ifp, &V_ifnet, if_link)
if_attachdomain1(ifp);
-   splx(s);
 }
 SYSINIT(domainifattach, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_SECOND,
 if_attachdomain, NULL);
@@ -705,21 +702,15 @@ static void
 if_attachdomain1(struct ifnet *ifp)
 {
struct domain *dp;
-   int s;
-
-   s = splnet();
 
/*
 * Since dp->dom_ifattach calls malloc() with M_WAITOK, we
 * cannot lock ifp->if_afdata initialization, entirely.
 */
-   if (IF_AFDATA_TRYLOCK(ifp) == 0) {
-   splx(s);
+   if (IF_AFDATA_TRYLOCK(ifp) == 0)
return;
-   }
if (ifp->if_afdata_initialized >= domain_init_status) {
IF_AFDATA_UNLOCK(ifp);
-   splx(s);
printf("if_attachdomain called more than once on %s\n",
ifp->if_xname);
return;
@@ -734,8 +725,6 @@ if_attachdomain1(struct ifnet *ifp)
ifp->if_afdata[dp->dom_family] =
(*dp->dom_ifattach)(ifp);
}
-
-   splx(s);
 }
 
 /*
@@ -1825,7 +1814,6 @@ link_rtrequest(int cmd, struct rtentry *
 /*
  * Mark an interface down and notify protocols of
  * the transition.
- * NOTE: must be called at splnet or eqivalent.
  */
 static void
 if_unroute(struct ifnet *ifp, int flag, int fam)
@@ -1849,7 +1837,6 @@ if_unroute(struct ifnet *ifp, int flag, 
 /*
  * Mark an interface up and notify protocols of
  * the transition.
- * NOTE: must be called at splnet or eqivalent.
  */
 static void
 if_route(struct ifnet *ifp, int flag, int fam)
@@ -1935,7 +1922,6 @@ do_link_state_change(void *arg, int pend
 /*
  * Mark an interface down and notify protocols of
  * the transition.
- * NOTE: must be called at splnet or eqivalent.
  */
 void
 if_down(struct ifnet *ifp)
@@ -1947,7 +1933,6 @@ if_down(struct ifnet *ifp)
 /*
  * Mark an interface up and notify protocols of
  * the transition.
- * NOTE: must be called at splnet or eqivalent.
  */
 void
 if_up(struct ifnet *ifp)
@@ -2150,14 +2135,10 @@ ifhwioctl(u_long cmd, struct ifnet *ifp,
/* Smart drivers twiddle their own routes */
} else if (ifp->if_flags & IFF_UP &&
(new_flags & IFF_UP) == 0) {
-   int s = splimp();
if_down(ifp);
-   splx(s);
} else if (new_flags & IFF_UP &&
(ifp->if_flags & IFF_UP) == 0) {
-   int s = splimp();
if_up(ifp);
-   splx(s);
}
/* See if permanently promiscuous mode bit is about to flip */
if ((ifp->if_flags ^ new_flags) & IFF_PPROMISC) {
@@ -2605,11 +2586,8 @@ ifioctl(struct socket *so, u_long cmd, c
 
if ((oif_flags ^ ifp->if_flags) & IFF_UP) {
 #ifdef INET6
-   if (ifp->if_flags & IFF_UP) {
-   int s = splimp();
+   if (ifp->if_flags & IFF_UP)
in6_if_up(ifp);
-   splx(s);
-   }
 #endif
}
if_rele(ifp);

Modified: head/sys/net/if_ef.c
==
--- head/sys/net/if_ef.cThu Oct 18 13:46:26 2012(r241685)
+++ head/sys/net/if_ef.cThu Oct 18 13:57:24 2012(r241686)
@@ -151,14 +151,10 @@ static int
 ef_detach(struct efnet *sc)
 {
struct ifnet *ifp = sc->ef_ifp;
-   int s;
-
-   s = splimp();
 
ether_ifdetach(ifp);
if_free(ifp);
 
-   splx(s);
r

svn commit: r241688 - head/sys/net

2012-10-18 Thread Andre Oppermann
Author: andre
Date: Thu Oct 18 14:08:26 2012
New Revision: 241688
URL: http://svn.freebsd.org/changeset/base/241688

Log:
  Use LOG_WARNING level in if_attachdomain1() instead of printf().
  
  Submitted by: vijju.singh-at-gmail.com

Modified:
  head/sys/net/if.c

Modified: head/sys/net/if.c
==
--- head/sys/net/if.c   Thu Oct 18 13:57:28 2012(r241687)
+++ head/sys/net/if.c   Thu Oct 18 14:08:26 2012(r241688)
@@ -711,8 +711,8 @@ if_attachdomain1(struct ifnet *ifp)
return;
if (ifp->if_afdata_initialized >= domain_init_status) {
IF_AFDATA_UNLOCK(ifp);
-   printf("if_attachdomain called more than once on %s\n",
-   ifp->if_xname);
+   log(LOG_WARNING, "if_attachdomain called more than once "
+   "on %s\n", ifp->if_xname);
return;
}
ifp->if_afdata_initialized = domain_init_status;


svn commit: r241703 - head/sys/kern

2012-10-18 Thread Andre Oppermann
Author: andre
Date: Thu Oct 18 20:22:17 2012
New Revision: 241703
URL: http://svn.freebsd.org/changeset/base/241703

Log:
  Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
  zero copy specialized sosend_copyin() helper function.

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Thu Oct 18 19:28:31 2012(r241702)
+++ head/sys/kern/uipc_socket.c Thu Oct 18 20:22:17 2012(r241703)
@@ -890,9 +890,7 @@ sosend_copyin(struct uio *uio, struct mb
long len;
ssize_t resid;
int error;
-#ifdef ZERO_COPY_SOCKETS
int cow_send;
-#endif
 
*retmp = top = NULL;
mp = &top;
@@ -900,11 +898,8 @@ sosend_copyin(struct uio *uio, struct mb
resid = uio->uio_resid;
error = 0;
do {
-#ifdef ZERO_COPY_SOCKETS
cow_send = 0;
-#endif /* ZERO_COPY_SOCKETS */
if (resid >= MINCLSIZE) {
-#ifdef ZERO_COPY_SOCKETS
if (top == NULL) {
m = m_gethdr(M_WAITOK, MT_DATA);
m->m_pkthdr.len = 0;
@@ -924,15 +919,6 @@ sosend_copyin(struct uio *uio, struct mb
m_clget(m, M_WAITOK);
len = min(min(MCLBYTES, resid), *space);
}
-#else /* ZERO_COPY_SOCKETS */
-   if (top == NULL) {
-   m = m_getcl(M_WAIT, MT_DATA, M_PKTHDR);
-   m->m_pkthdr.len = 0;
-   m->m_pkthdr.rcvif = NULL;
-   } else
-   m = m_getcl(M_WAIT, MT_DATA, 0);
-   len = min(min(MCLBYTES, resid), *space);
-#endif /* ZERO_COPY_SOCKETS */
} else {
if (top == NULL) {
m = m_gethdr(M_WAIT, MT_DATA);
@@ -957,11 +943,9 @@ sosend_copyin(struct uio *uio, struct mb
}
 
*space -= len;
-#ifdef ZERO_COPY_SOCKETS
if (cow_send)
error = 0;
else
-#endif /* ZERO_COPY_SOCKETS */
error = uiomove(mtod(m, void *), (int)len, uio);
resid = uio->uio_resid;
m->m_len = len;
@@ -980,7 +964,7 @@ out:
*retmp = top;
return (error);
 }
-#endif /*ZERO_COPY_SOCKETS*/
+#endif /* ZERO_COPY_SOCKETS */
 
 #define	SBLOCKWAIT(f)   (((f) & MSG_DONTWAIT) ? 0 : SBL_WAIT)
 


Re: svn commit: r241703 - head/sys/kern

2012-10-18 Thread Andre Oppermann

On 18.10.2012 22:22, Andre Oppermann wrote:

Author: andre
Date: Thu Oct 18 20:22:17 2012
New Revision: 241703
URL: http://svn.freebsd.org/changeset/base/241703

Log:
   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
   zero copy specialized sosend_copyin() helper function.


Note that I'm not saying zero copy should be used or is even
more performant than the optimized m_uiotombuf() function.
Actually there may be some real bit-rot to zero copy sockets.
I've just started looking into it.

Note that zero copy isn't entirely true either as it marks
the page as COW.  So when the userspace application reuses
the memory it is copied anyway.  Also the overhead of doing
the VM magic and mbuf attachment of a VM page isn't free
either.  To really benefit from it an application has to be
written with COW in mind and not reuse the memory that was
just written to the socket.  For non-aware applications it
may be a net performance loss overall.
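One way to read "written with COW in mind" is an application that rotates
through several send buffers instead of refilling the one it just wrote.
The following is only a user-space sketch of that idea: the pool depth, the
buffer size and all names (send_pool, send_pool_next) are hypothetical, and
a real application would presumably also want page-aligned buffers and a
pool deep enough to outlast the time data sits in the send socket buffer.

```c
#include <stddef.h>

#define POOL_BUFS 4	/* hypothetical pool depth */
#define BUF_SIZE  4096	/* assumed to match the page size */

struct send_pool {
	char   bufs[POOL_BUFS][BUF_SIZE];
	size_t next;	/* index of the buffer to hand out next */
};

/*
 * Hand out buffers round-robin, so the buffer that was written to the
 * socket most recently is the last one to be touched again.
 */
static char *
send_pool_next(struct send_pool *p)
{
	char *b = p->bufs[p->next];

	p->next = (p->next + 1) % POOL_BUFS;
	return (b);
}
```

The caller would fill the returned buffer, write it to the socket, and ask
for the next one rather than overwriting the same memory right away.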

Also I don't like the name zero-copy-socket as it promises
too much for those not into socket, mbuf and VM magic.
I'd rather call it cow-socket or something like that as it
describes much better what is actually happening behind the
scenes.

--
Andre



svn commit: r241704 - head/sys/kern

2012-10-18 Thread Andre Oppermann
Author: andre
Date: Thu Oct 18 21:04:30 2012
New Revision: 241704
URL: http://svn.freebsd.org/changeset/base/241704

Log:
  Remove unnecessary includes from sosend_copyin() and fix
  a couple of style issues.

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Thu Oct 18 20:22:17 2012(r241703)
+++ head/sys/kern/uipc_socket.c Thu Oct 18 21:04:30 2012(r241704)
@@ -860,12 +860,6 @@ struct so_zerocopy_stats{
int found_ifp;
 };
 struct so_zerocopy_stats so_zerocp_stats = {0,0,0};
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 
 /*
  * sosend_copyin() is only used if zero copy sockets are enabled.  Otherwise
@@ -907,9 +901,9 @@ sosend_copyin(struct uio *uio, struct mb
} else
m = m_get(M_WAITOK, MT_DATA);
if (so_zero_copy_send &&
-   resid>=PAGE_SIZE &&
-   *space>=PAGE_SIZE &&
-   uio->uio_iov->iov_len>=PAGE_SIZE) {
+   resid >= PAGE_SIZE &&
+   *space >= PAGE_SIZE &&
+   uio->uio_iov->iov_len >= PAGE_SIZE) {
so_zerocp_stats.size_ok++;
so_zerocp_stats.align_ok++;
cow_send = socow_setup(m, uio);
@@ -946,7 +940,7 @@ sosend_copyin(struct uio *uio, struct mb
if (cow_send)
error = 0;
else
-   error = uiomove(mtod(m, void *), (int)len, uio);
+   error = uiomove(mtod(m, void *), (int)len, uio);
resid = uio->uio_resid;
m->m_len = len;
*mp = m;


Re: svn commit: r241703 - head/sys/kern

2012-10-18 Thread Andre Oppermann

On 18.10.2012 23:06, Navdeep Parhar wrote:

Hello Andre,

A couple of things if you're poking around in this area...


I didn't really mean to dive too deep into COW socket writes.


On 10/18/12 13:44, Andre Oppermann wrote:

On 18.10.2012 22:22, Andre Oppermann wrote:

Author: andre
Date: Thu Oct 18 20:22:17 2012
New Revision: 241703
URL: http://svn.freebsd.org/changeset/base/241703

Log:
   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
   zero copy specialized sosend_copyin() helper function.


Note that I'm not saying zero copy should be used or is even
more performant than the optimized m_uiotombuf() function.


Some time back I played around with a modified m_uiotombuf() that was aware of
the mbuf_jumbo_16K zone (instead of limiting itself to 4K mbufs).  In some
cases it performed better than the stock m_uiotombuf.  I suspect this change
would also help drivers that are unable to deal with long gather lists when
doing TSO.  But my testing wasn't rigorous enough (I was merely playing
around), and the drivers I work with can mostly cope with whatever the kernel
throws at them.  So nothing came out of it.


The jumbo 16K zone is special in that the memory is actually allocated
by contigmalloc to get physically contiguous RAM. After some uptime and
heavy use this may become difficult to obtain. Also contigmalloc has to
hunt for it which may cause quite a bit of overhead.

4K mbufs, actually PAGE_SIZE mbufs, are very easily obtainable and fast.

To be honest I'm not really happy about > PAGE_SIZE mbufs.  They were
introduced at a time when DMA engines were more limited and couldn't
do S/G DMA on receive.

So performance with > PAGE_SIZE mbufs may be a little bit better but
when you approach memory fragmentation after some heavy system usage
it sucks up to the point where it fails most of the time.  PAGE_SIZE
mbufs always perform the same with very little deviation.

In an ideal scenario I'd like to see 9K and 16K mbufs go away and
have the RX DMA ring stitch a packet up out of PAGE_SIZE mbufs.
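As a rough illustration of what "stitching a packet up out of PAGE_SIZE
mbufs" means in terms of buffer counts, the sketch below splits a frame
length into page-sized segments.  It is plain user-space arithmetic, not
driver code; the function name split_into_pages() and its parameters are
made up for this example.

```c
#include <stddef.h>

/*
 * Split a frame of frame_len bytes into segments of at most page_size
 * bytes.  Segment lengths are written to seg_len[].  Returns the number
 * of segments used, or 0 if there is nothing to split or max_segs is
 * too small to hold the whole frame.
 */
static size_t
split_into_pages(size_t frame_len, size_t page_size, size_t *seg_len,
    size_t max_segs)
{
	size_t n = 0;

	if (page_size == 0)
		return (0);
	while (frame_len > 0) {
		size_t chunk = frame_len < page_size ? frame_len : page_size;

		if (n == max_segs)
			return (0);
		seg_len[n++] = chunk;
		frame_len -= chunk;
	}
	return (n);
}
```

With a 4096 byte page, a 9000 byte jumbo frame needs three segments
(4096 + 4096 + 808), so the receive ring would have to chain three buffers
for it instead of taking one physically contiguous 9K cluster.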


Actually there may be some real bit-rot to zero copy sockets.
I've just started looking into it.


I have a cxgbe(4)-specific true zero-copy implementation.  The rx side is in
head, the tx side works only for blocking sockets (the "easy" case) and I
haven't checked it in anywhere.  Take a look at t4_soreceive_ddp() and
m_mbuftouio_ddp() in sys/dev/cxgbe/t4_ddp.c.  They're mostly identical to the
kernel routines they're based on (read: copy-pasted from).  You may find them
of some interest if you're working in this area and are thinking of adding
zero-copy hooks to the socket implementation.


I'm going to have a look at it and think about how to generically support
DDP either way with our socket buffer layout.

Actually that may end up as the golden path. Do away with > PAGE_SIZE
mbufs, sink page flipping COW (incorrectly named ZERO_COPY) and use
DDP for those who need utmost performance (as I said only COW aware
applications gain a bit of speed, unaware may end up much worse).

--
Andre



svn commit: r241724 - head/sys/sys

2012-10-19 Thread Andre Oppermann
Author: andre
Date: Fri Oct 19 10:04:43 2012
New Revision: 241724
URL: http://svn.freebsd.org/changeset/base/241724

Log:
  Remove splimp() comment from sysinit table and attribute SI_SUB_PROTO_BEGIN
  and SI_SUB_PROTO_END to VNET related initializations.
  
  MFC after:3 days

Modified:
  head/sys/sys/kernel.h

Modified: head/sys/sys/kernel.h
==
--- head/sys/sys/kernel.h   Fri Oct 19 09:41:45 2012(r241723)
+++ head/sys/sys/kernel.h   Fri Oct 19 10:04:43 2012(r241724)
@@ -84,12 +84,6 @@ extern int ticks;
  * The SI_SUB_SWAP values represent a value used by
  * the BSD 4.4Lite but not by FreeBSD; it is maintained in dependent
  * order to support porting.
- *
- * The SI_SUB_PROTO_BEGIN and SI_SUB_PROTO_END bracket a range of
- * initializations to take place at splimp().  This is a historical
- * wart that should be removed -- probably running everything at
- * splimp() until the first init that doesn't want it is the correct
- * fix.  They are currently present to ensure historical behavior.
  */
 enum sysinit_sub_id {
SI_SUB_DUMMY= 0x000,/* not executed; for linker*/
@@ -147,12 +141,12 @@ enum sysinit_sub_id {
SI_SUB_P1003_1B = 0x6E0,/* P1003.1B realtime */
SI_SUB_PSEUDO   = 0x700,/* pseudo devices*/
SI_SUB_EXEC = 0x740,/* execve() handlers */
-   SI_SUB_PROTO_BEGIN  = 0x800,/* XXX: set splimp (kludge)*/
+   SI_SUB_PROTO_BEGIN  = 0x800,/* VNET initialization */
SI_SUB_PROTO_IF = 0x840,/* interfaces*/
SI_SUB_PROTO_DOMAININIT = 0x860,/* domain registration system */
SI_SUB_PROTO_DOMAIN = 0x880,/* domains (address families?)*/
SI_SUB_PROTO_IFATTACHDOMAIN = 0x881,/* domain dependent data init*/
-   SI_SUB_PROTO_END= 0x8ff,/* XXX: set splx (kludge)*/
+   SI_SUB_PROTO_END= 0x8ff,/* VNET helper functions */
SI_SUB_KPROF= 0x900,/* kernel profiling*/
SI_SUB_KICK_SCHEDULER   = 0xa00,/* start the timeout events*/
SI_SUB_INT_CONFIG_HOOKS = 0xa80,/* Interrupts enabled config */


svn commit: r241725 - head/sys/net

2012-10-19 Thread Andre Oppermann
Author: andre
Date: Fri Oct 19 10:07:55 2012
New Revision: 241725
URL: http://svn.freebsd.org/changeset/base/241725

Log:
  Update to previous r241688 to use __func__ instead of spelled out function
  name in log(9) message.
  
  Suggested by: glebius

Modified:
  head/sys/net/if.c

Modified: head/sys/net/if.c
==
--- head/sys/net/if.c   Fri Oct 19 10:04:43 2012(r241724)
+++ head/sys/net/if.c   Fri Oct 19 10:07:55 2012(r241725)
@@ -711,8 +711,8 @@ if_attachdomain1(struct ifnet *ifp)
return;
if (ifp->if_afdata_initialized >= domain_init_status) {
IF_AFDATA_UNLOCK(ifp);
-   log(LOG_WARNING, "if_attachdomain called more than once "
-   "on %s\n", ifp->if_xname);
+   log(LOG_WARNING, "%s called more than once on %s\n",
+   __func__, ifp->if_xname);
return;
}
ifp->if_afdata_initialized = domain_init_status;


Re: svn commit: r241688 - head/sys/net

2012-10-19 Thread Andre Oppermann

On 18.10.2012 16:11, Gleb Smirnoff wrote:

On Thu, Oct 18, 2012 at 02:08:26PM +, Andre Oppermann wrote:
A> Author: andre
A> Date: Thu Oct 18 14:08:26 2012
A> New Revision: 241688
A> URL: http://svn.freebsd.org/changeset/base/241688
A>
A> Log:
A>   Use LOG_WARNING level in in_attachdomain1() instead of printf().
A>
A>   Submitted by:   vijju.singh-at-gmail.com
A>
A> Modified:
A>   head/sys/net/if.c
A>
A> Modified: head/sys/net/if.c
A> ==
A> --- head/sys/net/if.c Thu Oct 18 13:57:28 2012(r241687)
A> +++ head/sys/net/if.c Thu Oct 18 14:08:26 2012(r241688)
A> @@ -711,8 +711,8 @@ if_attachdomain1(struct ifnet *ifp)
A>   return;
A>   if (ifp->if_afdata_initialized >= domain_init_status) {
A>   IF_AFDATA_UNLOCK(ifp);
A> - printf("if_attachdomain called more than once on %s\n",
A> - ifp->if_xname);
A> + log(LOG_WARNING, "if_attachdomain called more than once "
A> + "on %s\n", ifp->if_xname);
A>   return;
A>   }
A>   ifp->if_afdata_initialized = domain_init_status;

It'll be even more perfect if done as

"%s called more than once on %s\n", __func__, ifp->if_xname


Thanks, done in r241725.


And do we need "\n" for log(9)?


Yes.
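For readers who want to see the effect of the __func__ suggestion outside
the kernel, here is a small user-space sketch.  It uses snprintf(3) in
place of log(9), and the function name if_attachdomain1_demo() is invented
for the example.  The point is only that the name in the message follows
the function automatically; note that the commit log of r241688 itself
spells the function as in_attachdomain1(), which is exactly the kind of
drift a hand-written name invites.

```c
#include <stdio.h>

/*
 * Format the warning into buf.  __func__ expands to the name of the
 * enclosing function, so the message stays correct if the function is
 * ever renamed.  The trailing "\n" is part of the format string.
 */
static int
if_attachdomain1_demo(char *buf, size_t len, const char *ifname)
{
	return (snprintf(buf, len, "%s called more than once on %s\n",
	    __func__, ifname));
}
```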

--
Andre



svn commit: r241726 - head/sys/kern

2012-10-19 Thread Andre Oppermann
Author: andre
Date: Fri Oct 19 10:15:32 2012
New Revision: 241726
URL: http://svn.freebsd.org/changeset/base/241726

Log:
  Move UMA socket zone initialization from uipc_domain.c to uipc_socket.c
  into one place next to its other related functions to avoid confusion.

Modified:
  head/sys/kern/uipc_domain.c
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_domain.c
==
--- head/sys/kern/uipc_domain.c Fri Oct 19 10:07:55 2012(r241725)
+++ head/sys/kern/uipc_domain.c Fri Oct 19 10:15:32 2012(r241726)
@@ -239,28 +239,11 @@ domain_add(void *data)
mtx_unlock(&dom_mtx);
 }
 
-static void
-socket_zone_change(void *tag)
-{
-
-   uma_zone_set_max(socket_zone, maxsockets);
-}
-
 /* ARGSUSED*/
 static void
 domaininit(void *dummy)
 {
 
-   /*
-* Before we do any setup, make sure to initialize the
-* zone allocator we get struct sockets from.
-*/
-   socket_zone = uma_zcreate("socket", sizeof(struct socket), NULL, NULL,
-   NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
-   uma_zone_set_max(socket_zone, maxsockets);
-   EVENTHANDLER_REGISTER(maxsockets_change, socket_zone_change, NULL,
-   EVENTHANDLER_PRI_FIRST);
-
if (max_linkhdr < 16)   /* XXX */
max_linkhdr = 16;
 

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Fri Oct 19 10:07:55 2012(r241725)
+++ head/sys/kern/uipc_socket.c Fri Oct 19 10:15:32 2012(r241726)
@@ -227,6 +227,29 @@ MTX_SYSINIT(so_global_mtx, &so_global_mt
 SYSCTL_NODE(_kern, KERN_IPC, ipc, CTLFLAG_RW, 0, "IPC");
 
 /*
+ * Initialize the socket subsystem and set up the socket
+ * memory allocator.
+ */
+static void
+socket_zone_change(void *tag)
+{
+
+   uma_zone_set_max(socket_zone, maxsockets);
+}
+
+static void
+socket_init(void *tag)
+{
+
+socket_zone = uma_zcreate("socket", sizeof(struct socket), NULL, NULL,
+NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
+uma_zone_set_max(socket_zone, maxsockets);
+EVENTHANDLER_REGISTER(maxsockets_change, socket_zone_change, NULL,
+EVENTHANDLER_PRI_FIRST);
+}
+SYSINIT(socket, SI_SUB_PROTO_DOMAININIT, SI_ORDER_ANY, socket_init, NULL);
+
+/*
  * Sysctl to get and set the maximum global sockets limit.  Notify protocols
  * of the change so that they can update their dependent limits as required.
  */


svn commit: r241729 - head/sys/kern

2012-10-19 Thread Andre Oppermann
Author: andre
Date: Fri Oct 19 12:16:29 2012
New Revision: 241729
URL: http://svn.freebsd.org/changeset/base/241729

Log:
  Move socket UMA zone initialization functionality together into
  one place.

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Fri Oct 19 11:01:39 2012(r241728)
+++ head/sys/kern/uipc_socket.c Fri Oct 19 12:16:29 2012(r241729)
@@ -173,11 +173,8 @@ static struct filterops sowrite_filtops 
.f_event = filt_sowrite,
 };
 
-uma_zone_t socket_zone;
 so_gen_t   so_gencnt;  /* generation count for sockets */
 
-int	maxsockets;
-
 MALLOC_DEFINE(M_SONAME, "soname", "socket name");
 MALLOC_DEFINE(M_PCB, "pcb", "protocol control block");
 
@@ -230,6 +227,9 @@ SYSCTL_NODE(_kern, KERN_IPC, ipc, CTLFLA
  * Initialize the socket subsystem and set up the socket
  * memory allocator.
  */
+uma_zone_t socket_zone;
+int	maxsockets;
+
 static void
 socket_zone_change(void *tag)
 {
@@ -250,6 +250,19 @@ socket_init(void *tag)
 SYSINIT(socket, SI_SUB_PROTO_DOMAININIT, SI_ORDER_ANY, socket_init, NULL);
 
 /*
+ * Initialise maxsockets.  This SYSINIT must be run after
+ * tunable_mbinit().
+ */
+static void
+init_maxsockets(void *ignored)
+{
+
+   TUNABLE_INT_FETCH("kern.ipc.maxsockets", &maxsockets);
+   maxsockets = imax(maxsockets, imax(maxfiles, nmbclusters));
+}
+SYSINIT(param, SI_SUB_TUNABLES, SI_ORDER_ANY, init_maxsockets, NULL);
+
+/*
  * Sysctl to get and set the maximum global sockets limit.  Notify protocols
  * of the change so that they can update their dependent limits as required.
  */
@@ -273,25 +286,11 @@ sysctl_maxsockets(SYSCTL_HANDLER_ARGS)
}
return (error);
 }
-
 SYSCTL_PROC(_kern_ipc, OID_AUTO, maxsockets, CTLTYPE_INT|CTLFLAG_RW,
 &maxsockets, 0, sysctl_maxsockets, "IU",
 "Maximum number of sockets avaliable");
 
 /*
- * Initialise maxsockets.  This SYSINIT must be run after
- * tunable_mbinit().
- */
-static void
-init_maxsockets(void *ignored)
-{
-
-   TUNABLE_INT_FETCH("kern.ipc.maxsockets", &maxsockets);
-   maxsockets = imax(maxsockets, imax(maxfiles, nmbclusters));
-}
-SYSINIT(param, SI_SUB_TUNABLES, SI_ORDER_ANY, init_maxsockets, NULL);
-
-/*
  * Socket operation routines.  These routines are called by the routines in
  * sys_socket.c or from a system process, and implement the semantics of
  * socket operations by switching out to the protocol specific routines.


svn commit: r241779 - head/sys/kern

2012-10-20 Thread Andre Oppermann
Author: andre
Date: Sat Oct 20 10:51:32 2012
New Revision: 241779
URL: http://svn.freebsd.org/changeset/base/241779

Log:
  Tidy up somaxconn (accept queue limit) and related functions
  and move it together into one place.

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Sat Oct 20 10:34:55 2012(r241778)
+++ head/sys/kern/uipc_socket.c Sat Oct 20 10:51:32 2012(r241779)
@@ -182,15 +182,37 @@ MALLOC_DEFINE(M_PCB, "pcb", "protocol co
VNET_ASSERT(curvnet != NULL,\
("%s:%d curvnet is NULL, so=%p", __func__, __LINE__, (so)));
 
+/*
+ * Limit on the number of connections in the listen queue waiting
+ * for accept(2).
+ */
 static int somaxconn = SOMAXCONN;
-static int sysctl_somaxconn(SYSCTL_HANDLER_ARGS);
-/* XXX: we dont have SYSCTL_USHORT */
+
+static int
+sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
+{
+   int error;
+   int val;
+
+   val = somaxconn;
+   error = sysctl_handle_int(oidp, &val, 0, req);
+   if (error || !req->newptr )
+   return (error);
+
+   if (val < 1 || val > USHRT_MAX)
+   return (EINVAL);
+
+   somaxconn = val;
+   return (0);
+}
 SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW,
-0, sizeof(int), sysctl_somaxconn, "I", "Maximum pending socket connection "
-"queue size");
+0, sizeof(int), sysctl_somaxconn, "I",
+"Maximum listen socket pending connection accept queue size");
+
 static int numopensockets;
 SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
 &numopensockets, 0, "Number of open sockets");
+
 #ifdef ZERO_COPY_SOCKETS
 /* These aren't static because they're used in other files. */
 int so_zero_copy_send = 1;
@@ -3269,24 +3291,6 @@ socheckuid(struct socket *so, uid_t uid)
return (0);
 }
 
-static int
-sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
-{
-   int error;
-   int val;
-
-   val = somaxconn;
-   error = sysctl_handle_int(oidp, &val, 0, req);
-   if (error || !req->newptr )
-   return (error);
-
-   if (val < 1 || val > USHRT_MAX)
-   return (EINVAL);
-
-   somaxconn = val;
-   return (0);
-}
-
 /*
  * These functions are used by protocols to notify the socket layer (and its
  * consumers) of state changes in the sockets driven by protocol-side events.
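
The range check that sysctl_somaxconn() applies before accepting a new
value can be tried out in user space with the sketch below.  It is not the
kernel handler: there is no sysctl plumbing, the setter name
set_acceptq_limit() is hypothetical, and the initial value of 128 is an
assumption standing in for SOMAXCONN.

```c
#include <errno.h>
#include <limits.h>

/* Stand-in for the kernel's somaxconn; 128 is an assumed default. */
static int acceptq_limit = 128;

/*
 * Accept only values that fit the 1..USHRT_MAX range, mirroring the
 * check in the handler above.  On rejection the old value is kept.
 */
static int
set_acceptq_limit(int val)
{
	if (val < 1 || val > USHRT_MAX)
		return (EINVAL);
	acceptq_limit = val;
	return (0);
}
```

Copying the request into a local value first and assigning only after
validation means a rejected write never disturbs the limit in use.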


svn commit: r241781 - in head: lib/libc/sys sys/kern

2012-10-20 Thread Andre Oppermann
Author: andre
Date: Sat Oct 20 12:53:14 2012
New Revision: 241781
URL: http://svn.freebsd.org/changeset/base/241781

Log:
  Hide the unfortunately named sysctl kern.ipc.somaxconn from sysctl -a
  output and replace it with a new visible sysctl kern.ipc.soacceptqueue
  of the same functionality.  It specifies the maximum length of the
  accept queue on a listen socket.
  
  The old kern.ipc.somaxconn remains available for reading and writing
  for compatibility reasons so that existing programs, scripts and
  configurations continue to work.  There are no plans to ever remove the
  original and now hidden kern.ipc.somaxconn.

Modified:
  head/lib/libc/sys/listen.2
  head/sys/kern/uipc_socket.c

Modified: head/lib/libc/sys/listen.2
==
--- head/lib/libc/sys/listen.2  Sat Oct 20 12:07:48 2012(r241780)
+++ head/lib/libc/sys/listen.2  Sat Oct 20 12:53:14 2012(r241781)
@@ -28,7 +28,7 @@
 .\"From: @(#)listen.2  8.2 (Berkeley) 12/11/93
 .\" $FreeBSD$
 .\"
-.Dd August 29, 2005
+.Dd October 20, 2012
 .Dt LISTEN 2
 .Os
 .Sh NAME
@@ -102,15 +102,15 @@ of service attacks are no longer necessa
 The
 .Xr sysctl 3
 MIB variable
-.Va kern.ipc.somaxconn
+.Va kern.ipc.soacceptqueue
 specifies a hard limit on
 .Fa backlog ;
 if a value greater than
-.Va kern.ipc.somaxconn
+.Va kern.ipc.soacceptqueue
 or less than zero is specified,
 .Fa backlog
 is silently forced to
-.Va kern.ipc.somaxconn .
+.Va kern.ipc.soacceptqueue .
 .Sh INTERACTION WITH ACCEPT FILTERS
 When accept filtering is used on a socket, a second queue will
 be used to hold sockets that have connected, but have not yet
@@ -168,3 +168,17 @@ at run-time, and to use a negative
 .Fa backlog
 to request the maximum allowable value, was introduced in
 .Fx 2.2 .
+The
+.Va kern.ipc.somaxconn
+.Xr sysctl 3
+has been replaced with
+.Va kern.ipc.soacceptqueue
+in
+.Fx 10.0
+to prevent confusion its actual functionality.
+The original
+.Xr sysctl 3
+.Va kern.ipc.somaxconn
+is still available but hidden from a
+.Xr sysctl 3
+-a output so that existing applications and scripts continue to work.

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Sat Oct 20 12:07:48 2012(r241780)
+++ head/sys/kern/uipc_socket.c Sat Oct 20 12:53:14 2012(r241781)
@@ -185,6 +185,8 @@ MALLOC_DEFINE(M_PCB, "pcb", "protocol co
 /*
  * Limit on the number of connections in the listen queue waiting
  * for accept(2).
+ * NB: The orginal sysctl somaxconn is still available but hidden
+ * to prevent confusion about the actually purpose of this number.
  */
 static int somaxconn = SOMAXCONN;
 
@@ -205,9 +207,13 @@ sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
somaxconn = val;
return (0);
 }
-SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW,
+SYSCTL_PROC(_kern_ipc, OID_AUTO, soacceptqueue, CTLTYPE_UINT | CTLFLAG_RW,
 0, sizeof(int), sysctl_somaxconn, "I",
 "Maximum listen socket pending connection accept queue size");
+SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn,
+CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_SKIP,
+0, sizeof(int), sysctl_somaxconn, "I",
+"Maximum listen socket pending connection accept queue size (compat)");
 
 static int numopensockets;
 SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
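
The listen(2) text touched above says that a backlog greater than the
limit, or less than zero, is silently forced to the limit.  A minimal
sketch of that rule, with a made-up helper name effective_backlog() and
the limit passed in as a plain integer rather than read from the sysctl:

```c
/*
 * Return the backlog that would actually be used for a listen socket,
 * given the requested value and the system-wide accept queue limit.
 */
static int
effective_backlog(int backlog, int limit)
{
	if (backlog < 0 || backlog > limit)
		return (limit);
	return (backlog);
}
```

This is why passing a negative backlog is a way to request the maximum
allowable value without knowing the limit in advance.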


svn commit: r241789 - in head: lib/libc/sys sys/kern

2012-10-20 Thread Andre Oppermann
Author: andre
Date: Sat Oct 20 19:38:22 2012
New Revision: 241789
URL: http://svn.freebsd.org/changeset/base/241789

Log:
  Grammar fixes to r241781.
  
  Submitted by: alc

Modified:
  head/lib/libc/sys/listen.2
  head/sys/kern/uipc_socket.c

Modified: head/lib/libc/sys/listen.2
==
--- head/lib/libc/sys/listen.2  Sat Oct 20 18:13:20 2012(r241788)
+++ head/lib/libc/sys/listen.2  Sat Oct 20 19:38:22 2012(r241789)
@@ -175,7 +175,7 @@ has been replaced with
 .Va kern.ipc.soacceptqueue
 in
 .Fx 10.0
-to prevent confusion its actual functionality.
+to prevent confusion about its actual functionality.
 The original
 .Xr sysctl 3
 .Va kern.ipc.somaxconn

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Sat Oct 20 18:13:20 2012(r241788)
+++ head/sys/kern/uipc_socket.c Sat Oct 20 19:38:22 2012(r241789)
@@ -186,7 +186,7 @@ MALLOC_DEFINE(M_PCB, "pcb", "protocol co
  * Limit on the number of connections in the listen queue waiting
  * for accept(2).
  * NB: The orginal sysctl somaxconn is still available but hidden
- * to prevent confusion about the actually purpose of this number.
+ * to prevent confusion about the actual purpose of this number.
  */
 static int somaxconn = SOMAXCONN;
 


Re: svn commit: r241781 - in head: lib/libc/sys sys/kern

2012-10-20 Thread Andre Oppermann

On 20.10.2012 19:23, Alan Cox wrote:

There are a couple of minor grammar issues in the text.  See below.


Thank you. Fixed in r241789.

--
Andre


Alan

On 10/20/2012 07:53, Andre Oppermann wrote:

Author: andre
Date: Sat Oct 20 12:53:14 2012
New Revision: 241781
URL: http://svn.freebsd.org/changeset/base/241781

Log:
   Hide the unfortunately named sysctl kern.ipc.somaxconn from sysctl -a
   output and replace it with a new visible sysctl kern.ipc.soacceptqueue
   of the same functionality.  It specifies the maximum length of the
   accept queue on a listen socket.

   The old kern.ipc.somaxconn remains available for reading and writing
   for compatibility reasons so that existing programs, scripts and
   configurations continue to work.  There are no plans to ever remove the
   original and now hidden kern.ipc.somaxconn.

Modified:
   head/lib/libc/sys/listen.2
   head/sys/kern/uipc_socket.c

Modified: head/lib/libc/sys/listen.2
==
--- head/lib/libc/sys/listen.2	Sat Oct 20 12:07:48 2012	(r241780)
+++ head/lib/libc/sys/listen.2	Sat Oct 20 12:53:14 2012	(r241781)
@@ -28,7 +28,7 @@
  .\"From: @(#)listen.2  8.2 (Berkeley) 12/11/93
  .\" $FreeBSD$
  .\"
-.Dd August 29, 2005
+.Dd October 20, 2012
  .Dt LISTEN 2
  .Os
  .Sh NAME
@@ -102,15 +102,15 @@ of service attacks are no longer necessa
  The
  .Xr sysctl 3
  MIB variable
-.Va kern.ipc.somaxconn
+.Va kern.ipc.soacceptqueue
  specifies a hard limit on
  .Fa backlog ;
  if a value greater than
-.Va kern.ipc.somaxconn
+.Va kern.ipc.soacceptqueue
  or less than zero is specified,
  .Fa backlog
  is silently forced to
-.Va kern.ipc.somaxconn .
+.Va kern.ipc.soacceptqueue .
  .Sh INTERACTION WITH ACCEPT FILTERS
  When accept filtering is used on a socket, a second queue will
  be used to hold sockets that have connected, but have not yet
@@ -168,3 +168,17 @@ at run-time, and to use a negative
  .Fa backlog
  to request the maximum allowable value, was introduced in
  .Fx 2.2 .
+The
+.Va kern.ipc.somaxconn
+.Xr sysctl 3
+has been replaced with
+.Va kern.ipc.soacceptqueue
+in
+.Fx 10.0
+to prevent confusion its actual functionality.


There is a missing word here: "... confusion about its ..."


+The original
+.Xr sysctl 3
+.Va kern.ipc.somaxconn
+is still available but hidden from a
+.Xr sysctl 3
+-a output so that existing applications and scripts continue to work.

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c	Sat Oct 20 12:07:48 2012	(r241780)
+++ head/sys/kern/uipc_socket.c	Sat Oct 20 12:53:14 2012	(r241781)
@@ -185,6 +185,8 @@ MALLOC_DEFINE(M_PCB, "pcb", "protocol co
  /*
   * Limit on the number of connections in the listen queue waiting
   * for accept(2).
+ * NB: The orginal sysctl somaxconn is still available but hidden
+ * to prevent confusion about the actually purpose of this number.


"actually" should be "actual".


   */
  static int somaxconn = SOMAXCONN;

@@ -205,9 +207,13 @@ sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
  somaxconn = val;
  return (0);
  }
-SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW,
+SYSCTL_PROC(_kern_ipc, OID_AUTO, soacceptqueue, CTLTYPE_UINT | CTLFLAG_RW,
  0, sizeof(int), sysctl_somaxconn, "I",
  "Maximum listen socket pending connection accept queue size");
+SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn,
+CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_SKIP,
+0, sizeof(int), sysctl_somaxconn, "I",
+"Maximum listen socket pending connection accept queue size (compat)");

  static int numopensockets;
  SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,









svn commit: r241892 - head/sys/mips/conf

2012-10-22 Thread Andre Oppermann
Author: andre
Date: Mon Oct 22 15:04:23 2012
New Revision: 241892
URL: http://svn.freebsd.org/changeset/base/241892

Log:
  Remove ZERO_COPY_SOCKETS from kernel configuration as the current
  COW based approach is not safe and should not be used in production.

Modified:
  head/sys/mips/conf/RT305X

Modified: head/sys/mips/conf/RT305X
==
--- head/sys/mips/conf/RT305X   Mon Oct 22 14:48:14 2012(r241891)
+++ head/sys/mips/conf/RT305X   Mon Oct 22 15:04:23 2012(r241892)
@@ -86,7 +86,6 @@ options   SCSI_NO_OP_STRINGS
 options	RWLOCK_NOINLINE
 options	SX_NOINLINE
 options	NO_SWAPPING
-options	ZERO_COPY_SOCKETS
 options	MROUTING	# Multicast routing
 options	IPFIREWALL_DEFAULT_TO_ACCEPT
 


Re: svn commit: r241923 - in head/sys: netinet netipsec

2012-10-23 Thread Andre Oppermann

On 23.10.2012 10:33, Gleb Smirnoff wrote:

Author: glebius
Date: Tue Oct 23 08:33:13 2012
New Revision: 241923
URL: http://svn.freebsd.org/changeset/base/241923

Log:
 Do not reduce ip_len by size of IP header in the ip_input()
   before passing a packet to protocol input routines.
 For several protocols this means that the protocol now needs to
   do the subtraction itself, and for the other half this means that
   we do not need to add the header length back to the packet.


Yay! More Mammoth shit getting washed away! ;)

Please add an entry to UPDATING as the convention of ip_len
subtraction has been there since forever. That makes it easier
to discover for third parties writing code.
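
For third parties wondering what the change means for their code, the
sketch below shows the subtraction a protocol input routine would now do
itself.  It is a stand-alone illustration, not code from the tree: the
helper name ip_payload_len() is invented, and it assumes ip_len is the
total datagram length already in host byte order and ip_hl is the header
length in 32-bit words.

```c
#include <stdint.h>

/*
 * Given the total length from the IP header (header included) and the
 * header length field in 32-bit words, return the payload length, or
 * -1 if the two values are inconsistent.
 */
static int
ip_payload_len(uint16_t ip_len, uint8_t ip_hl)
{
	int hlen = ip_hl << 2;	/* words to bytes */

	if (ip_hl < 5 || ip_len < hlen)
		return (-1);
	return (ip_len - hlen);
}
```

Code that used to add the header length back onto ip_len to recover the
total would, under the new convention, simply use ip_len as is.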

--
Andre



svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann
Author: andre
Date: Tue Oct 23 14:19:44 2012
New Revision: 241931
URL: http://svn.freebsd.org/changeset/base/241931

Log:
  Replace the ill-named ZERO_COPY_SOCKETS kernel option with two
  more appropriately named kernel options for the very distinct
  send and receive paths.
  
  "options SOCKET_SEND_COW" enables VM page copy-on-write based
  sending of data on an outbound socket.
  
  NB: The COW based send mechanism is not safe and may result
  in kernel crashes.
  
  "options SOCKET_RECV_PFLIP" enables VM kernel/userspace page
  flipping for special disposable pages attached as external
  storage to mbufs.
  
  Only the naming of the kernel options is changed and their
  corresponding #ifdef sections are adjusted.  No functionality
  is added or removed.
  
  Discussed with:   alc (mechanism and limitations of send side COW)

Modified:
  head/sys/conf/NOTES
  head/sys/conf/options
  head/sys/kern/subr_uio.c
  head/sys/kern/uipc_socket.c

Modified: head/sys/conf/NOTES
==
--- head/sys/conf/NOTES Tue Oct 23 12:39:17 2012        (r241930)
+++ head/sys/conf/NOTES Tue Oct 23 14:19:44 2012        (r241931)
@@ -964,12 +964,20 @@ options   TCP_SIGNATURE   #include support
 # a smooth scheduling of the traffic.
 options        DUMMYNET
 
-# Zero copy sockets support.  This enables "zero copy" for sending and
-# receiving data via a socket.  The send side works for any type of NIC,
-# the receive side only works for NICs that support MTUs greater than the
-# page size of your architecture and that support header splitting.  See
-# zero_copy(9) for more details.
-options        ZERO_COPY_SOCKETS
+# "Zero copy" sockets support is split into the send and receive path
+# which operate very differently.
+# For the send path the VM page with the data is wired into the kernel
+# and marked as COW (copy-on-write).  If the application touches the
+# data while it is still in the send socket buffer the page is copied
+# and divorced from its kernel wiring (no longer zero copy).
+# The receive side requires explicit NIC driver support to create
+# disposable pages which are flipped from kernel to user-space VM.
+# See zero_copy(9) for more details.
+# XXX: The COW based send mechanism is not safe and may result in
+# kernel crashes.
+# XXX: None of the current NIC drivers support disposeable pages.
+options        SOCKET_SEND_COW
+options        SOCKET_RECV_PFLIP
 
 #
 # FILESYSTEM OPTIONS

Modified: head/sys/conf/options
==
--- head/sys/conf/options   Tue Oct 23 12:39:17 2012        (r241930)
+++ head/sys/conf/options   Tue Oct 23 14:19:44 2012        (r241931)
@@ -520,7 +520,8 @@ NGATM_CCATM opt_netgraph.h
 # DRM options
 DRM_DEBUG  opt_drm.h
 
-ZERO_COPY_SOCKETS  opt_zero.h
+SOCKET_SEND_COW    opt_zero.h
+SOCKET_RECV_PFLIP  opt_zero.h
 TI_SF_BUF_JUMBO    opt_ti.h
 TI_JUMBO_HDRSPLIT  opt_ti.h
 BCE_JUMBO_HDRSPLIT opt_bce.h

Modified: head/sys/kern/subr_uio.c
==
--- head/sys/kern/subr_uio.c    Tue Oct 23 12:39:17 2012        (r241930)
+++ head/sys/kern/subr_uio.c    Tue Oct 23 14:19:44 2012        (r241931)
@@ -57,7 +57,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
-#ifdef ZERO_COPY_SOCKETS
+#ifdef SOCKET_SEND_COW
 #include 
 #endif
 
@@ -66,7 +66,7 @@ SYSCTL_INT(_kern, KERN_IOV_MAX, iov_max,
 
 static int uiomove_faultflag(void *cp, int n, struct uio *uio, int nofault);
 
-#ifdef ZERO_COPY_SOCKETS
+#ifdef SOCKET_SEND_COW
 /* Declared in uipc_socket.c */
 extern int so_zero_copy_receive;
 
@@ -128,7 +128,7 @@ retry:
vm_map_lookup_done(map, entry);
return(KERN_SUCCESS);
 }
-#endif /* ZERO_COPY_SOCKETS */
+#endif /* SOCKET_SEND_COW */
 
 int
 copyin_nofault(const void *udaddr, void *kaddr, size_t len)
@@ -261,7 +261,7 @@ uiomove_frombuf(void *buf, int buflen, s
return (uiomove((char *)buf + offset, n, uio));
 }
 
-#ifdef ZERO_COPY_SOCKETS
+#ifdef SOCKET_RECV_PFLIP
 /*
  * Experimental support for zero-copy I/O
  */
@@ -356,7 +356,7 @@ uiomoveco(void *cp, int n, struct uio *u
}
return (0);
 }
-#endif /* ZERO_COPY_SOCKETS */
+#endif /* SOCKET_RECV_PFLIP */
 
 /*
  * Give next character to user as result of read.

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Tue Oct 23 12:39:17 2012        (r241930)
+++ head/sys/kern/uipc_socket.c Tue Oct 23 14:19:44 2012        (r241931)
@@ -219,17 +219,20 @@ static int numopensockets;
 SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
 &numopensockets, 0, "Number of open sockets");
 
-#ifdef ZERO_COPY_SOCKETS
-/* These aren't static because th

svn commit: r241932 - head/share/man/man9

2012-10-23 Thread Andre Oppermann
Author: andre
Date: Tue Oct 23 14:25:37 2012
New Revision: 241932
URL: http://svn.freebsd.org/changeset/base/241932

Log:
  Update zero_copy(9) man page to note the renamed kernel options
  and to warn about unsafeness of COW based sends.

Modified:
  head/share/man/man9/zero_copy.9

Modified: head/share/man/man9/zero_copy.9
==
--- head/share/man/man9/zero_copy.9 Tue Oct 23 14:19:44 2012        (r241931)
+++ head/share/man/man9/zero_copy.9 Tue Oct 23 14:25:37 2012        (r241932)
@@ -25,7 +25,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd December 5, 2004
+.Dd October 23, 2012
 .Dt ZERO_COPY 9
 .Os
 .Sh NAME
@@ -33,7 +33,8 @@
 .Nm zero_copy_sockets
 .Nd "zero copy sockets code"
 .Sh SYNOPSIS
-.Cd "options ZERO_COPY_SOCKETS"
+.Cd "options SOCKET_SEND_COW"
+.Cd "options SOCKET_RECV_PFLIP"
 .Sh DESCRIPTION
 The
 .Fx
@@ -155,6 +156,8 @@ variables respectively.
 .Xr sendfile 2 ,
 .Xr socket 2 ,
 .Xr ti 4
+.Sh BUGS
+The COW based send mechanism is not safe and may result in kernel crashes.
 .Sh HISTORY
 The zero copy sockets code first appeared in
 .Fx 5.0 ,


Re: svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann

On 23.10.2012 16:42, Gleb Smirnoff wrote:

On Tue, Oct 23, 2012 at 02:19:45PM +, Andre Oppermann wrote:
A> Author: andre
A> Date: Tue Oct 23 14:19:44 2012
A> New Revision: 241931
A> URL: http://svn.freebsd.org/changeset/base/241931
A>
A> Log:
A>   Replace the ill-named ZERO_COPY_SOCKETS kernel option with two
A>   more appropriately named kernel options for the very distinct
A>   send and receive paths.
A>
A>   "options SOCKET_SEND_COW" enables VM page copy-on-write based
A>   sending of data on an outbound socket.
A>
A>   NB: The COW based send mechanism is not safe and may result
A>   in kernel crashes.
A>
A>   "options SOCKET_RECV_PFLIP" enables VM kernel/userspace page
A>   flipping for special disposable pages attached as external
A>   storage to mbufs.
A>
A>   Only the naming of the kernel options is changed and their
A>   corresponding #ifdef sections are adjusted.  No functionality
A>   is added or removed.
A>
A>   Discussed with: alc (mechanism and limitations of send side COW)

Users may call this a pointless POLA violation. IMO, the old
kernel option that we had for years, more than a decade, should remain
and just imply two new kernel options.


There shouldn't be any users.  Zero copy send is broken and
responsible for random kernel crashes.  Zero copy receive isn't
supported by any modern driver.  Both are useless to dangerous.

The main problem with ZERO_COPY_SOCKETS was that it sounded great
and who wouldn't want to have zero copy sockets?  Unfortunately
it doesn't work that way.

According to alc@, even if zero copy send worked it wouldn't
be faster, because page-based COW setup is a very expensive
operation.  Eventually he wants page-based COW to go away.

For zero copy send we're trying to come up with a sendfile-like
approach where the page is simply wired into kernel space.  The
application then is not allowed to touch it until the socket
buffer has released it again.  The main issue here is how to
provide feedback to the application when it is safe for reuse.

For zero copy receive I've been contacted by np@ to find a way
to combine DDP into the socket buffer layer.  Trying to work
something out that isn't too horrible.  A generic approach would
hinge on page sized mbufs though.

--
Andre



Re: svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann

On 23.10.2012 17:11, David Chisnall wrote:

On 23 Oct 2012, at 16:05, Andre Oppermann wrote:


For zero copy send we're trying to come up with a sendfile-like
approach where the page is simply wired into kernel space.  The
application then is not allowed to touch it until the socket
buffer has released it again.  The main issue here is how to
provide feedback to the application when it is safe for reuse.


It's been a few years since I used it, but I thought that aio_write() already 
provided this.  The application may not modify the contents of the memory 
pointed to by aio_buf until after it has received notification that the write 
has finished.  This happens either via a signal directly, a signal polled by 
kqueue, or a call to aio_return().


Indeed, that's one of the ways being explored.  It requires the
explicit cooperation of the application.  I don't think there is
any way around that.
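
The aio_write() contract David describes can be sketched roughly as
below.  This is a userland illustration of the POSIX AIO calls only, not
the zero copy send design under discussion.  The helper name
aio_write_and_wait() is made up, and the sketch simply blocks in
aio_suspend() instead of using a signal or kqueue for notification.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: queue one asynchronous write and wait for it to
 * complete.  Returns the byte count reported by aio_return(), or -1
 * with errno set.  The caller must not modify buf until this returns;
 * that is the explicit cooperation the application has to provide.
 */
static ssize_t
aio_write_and_wait(int fd, const void *buf, size_t len, off_t off)
{
	struct aiocb cb;
	const struct aiocb *list[1];
	ssize_t done;
	int error;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = (volatile void *)buf;	/* buffer is owned by the request now */
	cb.aio_nbytes = len;
	cb.aio_offset = off;
	cb.aio_sigevent.sigev_notify = SIGEV_NONE;	/* we poll, no signal */

	if (aio_write(&cb) != 0)
		return (-1);

	/* Sleep until the request leaves the EINPROGRESS state. */
	list[0] = &cb;
	while (aio_error(&cb) == EINPROGRESS)
		(void)aio_suspend(list, 1, NULL);

	/* Only now is it safe for the application to reuse buf. */
	error = aio_error(&cb);
	done = aio_return(&cb);
	if (error != 0) {
		errno = error;
		return (-1);
	}
	return (done);
}
```

On older glibc versions this may need to be linked with -lrt.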

--
Andre



svn commit: r241955 - head

2012-10-23 Thread Andre Oppermann
Author: andre
Date: Tue Oct 23 16:33:43 2012
New Revision: 241955
URL: http://svn.freebsd.org/changeset/base/241955

Log:
  Note the removal of the ZERO_COPY_SOCKETS kernel option in r241931
  and provide a proper explanation.

Modified:
  head/UPDATING

Modified: head/UPDATING
==
--- head/UPDATING   Tue Oct 23 16:12:17 2012        (r241954)
+++ head/UPDATING   Tue Oct 23 16:33:43 2012        (r241955)
@@ -25,6 +25,17 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 10
"ln -s 'abort:false,junk:false' /etc/malloc.conf".)
 
 20121023:
+   The ZERO_COPY_SOCKET kernel option has been removed and
+   split into SOCKET_SEND_COW and SOCKET_RECV_PFLIP.
+   NB: SOCKET_SEND_COW uses the VM page based copy-on-write
+   mechanism which is not safe and may result in kernel crashes.
+   NB: The SOCKET_RECV_PFLIP mechanism is useless as no current
+   driver supports disposeable external page sized mbuf storage.
+   Proper replacements for both zero-copy mechanisms are under
+   consideration and will eventually lead to complete removal
+   of the two kernel options.
+
+20121023:
The IPv4 network stack has been converted to network byte
order. The following modules need to be recompiled together
with kernel: carp(4), divert(4), gif(4), siftr(4), gre(4),


Re: svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann

On 23.10.2012 17:21, Bryan Drewery wrote:

On 10/23/2012 10:05 AM, Andre Oppermann wrote:

There shouldn't be any users.  Zero copy send is broken and
responsible for random kernel crashes.  Zero copy receive isn't
supported by any modern driver.  Both are useless to dangerous.


I enabled this a few weeks ago, not knowing it was useless/dangerous.

Perhaps an entry in UPDATING to note that this has been renamed and that
it may not actually be useful?


Good idea.  Will do.


Also, zero_copy(9) needs updating, as it references ZERO_COPY_SOCKETS.


Already done in r241932.

--
Andre




Re: svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann

On 23.10.2012 18:05, Gleb Smirnoff wrote:

On Tue, Oct 23, 2012 at 05:05:48PM +0200, Andre Oppermann wrote:
A> There shouldn't be any users.  Zero copy send is broken and
A> responsible for random kernel crashes.  Zero copy receive isn't
A> supported by any modern driver.  Both are useless to dangerous.
A>
A> The main problem with ZERO_COPY_SOCKETS was that it sounded great
A> and who wouldn't want to have zero copy sockets?  Unfortunately
A> it doesn't work that way.

Okay, it appears that there are users, even on the current@ mailing
list, within a couple of hours of exposure.

Can we keep the old option for compatibility?


No.  They are not users.  They simply fell for the promise of
"zero copy" which it isn't.  It doesn't do what the "users"
believe it does.  It's useless for receive and dangerous for send.

I have updated NOTES and forwarded it to -current.

--
Andre



svn commit: r241971 - head/sys/conf

2012-10-23 Thread Andre Oppermann
Author: andre
Date: Tue Oct 23 23:13:44 2012
New Revision: 241971
URL: http://svn.freebsd.org/changeset/base/241971

Log:
  Change the dependency of kern/uipc_cow.c from zero_copy_sockets
  to socket_send_cow.  Missed in r241931.
  
  Submitted by: pluknet

Modified:
  head/sys/conf/files

Modified: head/sys/conf/files
==
--- head/sys/conf/files Tue Oct 23 22:58:25 2012        (r241970)
+++ head/sys/conf/files Tue Oct 23 23:13:44 2012        (r241971)
@@ -2691,7 +2691,7 @@ kern/tty_pts.c    standard
 kern/tty_tty.c standard
 kern/tty_ttydisc.c standard
 kern/uipc_accf.c   optional inet
-kern/uipc_cow.c    optional zero_copy_sockets
+kern/uipc_cow.c    optional socket_send_cow
 kern/uipc_debug.c  optional ddb
 kern/uipc_domain.c standard
 kern/uipc_mbuf.c   standard


Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 20:56, Jim Harris wrote:

On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd  wrote:

On 24 October 2012 11:36, Jim Harris  wrote:


   Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.


Ok, but..



 struct mtx  tdq_lock;   /* run queue lock. */
+   char        pad[64 - sizeof(struct mtx)];


.. don't we have an existing compile time macro for the cache line
size, which can be used here?


Yes, but I didn't use it for a couple of reasons:

1) struct tdq itself is currently using __aligned(64), so I wanted to
keep it consistent.
2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to
NetBurst-based processors having 128-byte cache sectors a while back.
I had planned to start a separate thread on arch@ about this today on
whether this was still appropriate.
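
For readers following along, the effect of the pad member can be
sketched in isolation.  This is a simplified stand-in, not the real
struct tdq or struct mtx: the names fake_mtx, tdq_sketch and LINE_SIZE
are invented for the example, and 64 is taken from the literal in the
patch.

```c
#include <stddef.h>
#include <stdint.h>

#define	LINE_SIZE	64	/* assumed line size, same literal as the patch */

/* Stand-in for struct mtx; the real one lives in sys/_mutex.h. */
struct fake_mtx {
	volatile uintptr_t	lock_word;
	const char		*name;
};

/* Simplified stand-in for struct tdq: contended lock, then read-mostly data. */
struct tdq_sketch {
	struct fake_mtx	lock;
	/* Fill the rest of the first line so nothing else shares it. */
	char		pad[LINE_SIZE - sizeof(struct fake_mtx)];
	volatile int	load;		/* read by remote CPU searches */
	volatile int	cpu_idle;
} __attribute__((aligned(LINE_SIZE)));

/* Compile-time checks that the layout is what the padding intends. */
_Static_assert(sizeof(struct fake_mtx) < LINE_SIZE,
    "lock must fit inside one cache line");
_Static_assert(offsetof(struct tdq_sketch, load) == LINE_SIZE,
    "load must start on the second cache line");

/* Byte offset of the first field after the padded lock. */
static size_t
tdq_load_offset(void)
{
	return (offsetof(struct tdq_sketch, load));
}
```

With this layout a CPU spinning on the lock only bounces the first
line, while readers of load touch the second.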


See also the discussion on svn-src-all regarding global struct mtx
alignment.

Thank you for proving my point. ;)

Let's go back and see how we can do this the sanest way.  These are
the options I see at the moment:

 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
the future possibly change to a different compiler dependent
align attribute
 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
automatically gets aligned in all cases, even when dynamically
allocated.

Personally I'm undecided between #2 and #3.  #1 is ugly.  In favor
of #3 is that there possibly isn't any case where you'd actually
want the mutex to share a cache line with anything else, even a data
structure.

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 21:49, Jim Harris wrote:

On Wed, Oct 24, 2012 at 12:16 PM, Andre Oppermann  wrote:





See also the discussion on svn-src-all regarding global struct mtx
alignment.

Thank you for proving my point. ;)

Let's go back and see how we can do this the sanest way.  These are
the options I see at the moment:

  1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
  2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
 the future possibly change to a different compiler dependent
 align attribute
  3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
 automatically gets aligned in all cases, even when dynamically
 allocated.

Personally I'm undecided between #2 and #3.  #1 is ugly.  In favor
of #3 is that there possibly isn't any case where you'd actually
want the mutex to share a cache line with anything else, even a data
structure.


I've run my same tests with #3 as you describe, and I did see further
noticeable improvement.  I had a difficult time though quantifying the
effect it would have on all of the different architectures.  Putting
it in ULE's tdq gained 60-70% of the overall benefit, and was well
contained.


I just experimented with different specifications of alignment
and couldn't get the globals aligned at all.  This seems to be
because of the linker not understanding or not getting passed
the alignment information when linking the kernel.


I agree that sprinkling all over the place isn't pretty.  But focused
investigations into specific locks (spin mutexes, default mutexes,
whatever) may find a few key additional ones that would benefit.  I
started down this path with the sleepq and turnstile locks, but none
of those specifically showed noticeable improvement (at least in the
tests I was running).  There's still some additional ones I want to
look at, but haven't had the time yet.


This runs the very great risk of optimizing for today's available
architectures and then needs rejiggling every five years.  Just as
you've noticed the issue with 128B alignment from the Netburst days.
We never know how the next micro-architecture will behave.  Micro
optimizing each individual invocation of common building blocks is
the wrong path to go.

I'd very much prefer the alignment *and* padding control to be done
in one place for all of them, either through a magic macro or compiler
__attribute__(whatever).

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 21:06, Attilio Rao wrote:

On Wed, Oct 24, 2012 at 8:00 PM, Jim Harris  wrote:

On Wed, Oct 24, 2012 at 11:43 AM, John Baldwin  wrote:

On Wednesday, October 24, 2012 2:36:41 pm Jim Harris wrote:

Author: jimharris
Date: Wed Oct 24 18:36:41 2012
New Revision: 242014
URL: http://svn.freebsd.org/changeset/base/242014

Log:
   Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.

   This enables CPU searches (which read tdq_load) to operate independently
   of any contention on the spinlock.  Some scheduler-intensive workloads
   running on an 8C single-socket SNB Xeon show considerable improvement with
   this change (2-3% perf improvement, 5-6% decrease in CPU util).

   Sponsored by:   Intel
   Reviewed by:jeff

Modified:
   head/sys/kern/sched_ule.c

Modified: head/sys/kern/sched_ule.c


==

--- head/sys/kern/sched_ule.c Wed Oct 24 18:33:44 2012        (r242013)
+++ head/sys/kern/sched_ule.c Wed Oct 24 18:36:41 2012        (r242014)
@@ -223,8 +223,13 @@ static int sched_idlespinthresh = -1;
   * locking in sched_pickcpu();
   */
  struct tdq {
- /* Ordered to improve efficiency of cpu_search() and switch(). */
+ /*
+  * Ordered to improve efficiency of cpu_search() and switch().
+  * tdq_lock is padded to avoid false sharing with tdq_load and
+  * tdq_cpu_idle.
+  */
   struct mtx  tdq_lock;   /* run queue lock. */
+ char        pad[64 - sizeof(struct mtx)];


Can this use 'tdq_lock __aligned(CACHE_LINE_SIZE)' instead?



No - that doesn't pad it.  I believe that only works if it's global,
i.e. not part of a data structure.


As I've already said in another thread, __aligned() doesn't work on
object declarations, so that won't pad it whether it is global
or part of a struct.
It is just implemented as __attribute__((aligned(X))):
http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html


Actually it seems gcc itself doesn't really care and it is up to the
linker to honor that.

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 21:30, Alexander Motin wrote:

On 24.10.2012 22:16, Andre Oppermann wrote:

On 24.10.2012 20:56, Jim Harris wrote:

On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd 
wrote:

On 24 October 2012 11:36, Jim Harris  wrote:


   Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.


Ok, but..



 struct mtx  tdq_lock;   /* run queue lock. */
+   char        pad[64 - sizeof(struct mtx)];


.. don't we have an existing compile time macro for the cache line
size, which can be used here?


Yes, but I didn't use it for a couple of reasons:

1) struct tdq itself is currently using __aligned(64), so I wanted to
keep it consistent.
2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to
NetBurst-based processors having 128-byte cache sectors a while back.
I had planned to start a separate thread on arch@ about this today on
whether this was still appropriate.


See also the discussion on svn-src-all regarding global struct mtx
alignment.

Thank you for proving my point. ;)

Let's go back and see how we can do this the sanest way.  These are
the options I see at the moment:

  1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
  2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
 the future possibly change to a different compiler dependent
 align attribute
  3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
 automatically gets aligned in all cases, even when dynamically
 allocated.

Personally I'm undecided between #2 and #3.  #1 is ugly.  In favor
of #3 is that there possibly isn't any case where you'd actually
want the mutex to share a cache line with anything else, even a data
structure.


I'm sorry, could you give me a hint on the theory? I think I can agree
that cache line sharing can be a problem in the case of spin locks: the
waiting thread will constantly try to access a page modified by another
CPU, which I guess will cause cache line writes to RAM. But why is it so
bad to share the lock with its respective data in the case of non-spin
locks? Won't the benefit of a free prefetch of the right data while
grabbing the lock compensate for the penalty of relatively rare
collisions?


Cliff Click describes it in detail:
 http://www.azulsystems.com/blog/cliff/2009-04-14-odds-ends

For a classic mutex it likely doesn't make much difference since the
cache line is exclusive anyway while the lock is held.  On LL/SC systems
there may be cache line dirtying on a failed locking attempt.

For spin mutexes it hurts badly as you noted.

Especially on RW mutexes it hurts because a read lock dirties the cache
line for all other CPUs.  Here the RW mutex should be on its own cache
line in all cases.
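
Why a read lock dirties the line can be seen in a toy reader count.
This is a deliberately minimal sketch with C11 atomics, assuming a lock
that tracks readers in a shared counter.  It has no writer side and is
not how rwlock(9) is implemented; the rw_sketch names are invented.

```c
#include <stdatomic.h>

/* Toy reader-side of a read/write lock: just a shared reader count. */
struct rw_sketch {
	atomic_uint	readers;
};

/*
 * Taking the lock for reading is still a store to shared memory, so
 * the cache line holding 'readers' is invalidated on every other CPU.
 */
static void
rw_sketch_rlock(struct rw_sketch *rw)
{
	atomic_fetch_add_explicit(&rw->readers, 1, memory_order_acquire);
}

/* Dropping the read lock is another store to the same line. */
static void
rw_sketch_runlock(struct rw_sketch *rw)
{
	atomic_fetch_sub_explicit(&rw->readers, 1, memory_order_release);
}

/* Current number of readers, for inspection only. */
static unsigned
rw_sketch_readers(struct rw_sketch *rw)
{
	return (atomic_load_explicit(&rw->readers, memory_order_relaxed));
}
```

Any data sharing that line gets bounced between CPUs even when every
thread is only reading.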

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 22:29, Attilio Rao wrote:

On Wed, Oct 24, 2012 at 9:25 PM, Andre Oppermann  wrote:

On 24.10.2012 21:06, Attilio Rao wrote:

As I've already said in another thread, __aligned() doesn't work on
object declarations, so that won't pad it whether it is global
or part of a struct.
It is just implemented as __attribute__((aligned(X))):
http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html



Actually it seems gcc itself doesn't really care and it is up to the
linker to honor that.


Yes but the concept being that if you use __aligned() properly (when
defining a struct) the object will be correctly sized, so you will get
padding automatically.


Yes.  With __aligned() the start of the element/structure should
begin on an address evenly divisible by the align value *and* it
should pad out any remaining space up to the next evenly divisible
address.

The problem we have is that it apparently doesn't work correctly
within gcc when creating structs nor within the linker when placing
such supposedly aligned structs in the .bss section (at least the
padding is missing).

It seems to come down to either a) fixing gcc+ld; or b) hacking
around it by magically padding the structs that require it.

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 22:55, Andre Oppermann wrote:

On 24.10.2012 22:29, Attilio Rao wrote:

On Wed, Oct 24, 2012 at 9:25 PM, Andre Oppermann  wrote:

On 24.10.2012 21:06, Attilio Rao wrote:

As I've already said in another thread, __aligned() doesn't work on
object declarations, so that won't pad it whether it is global
or part of a struct.
It is just implemented as __attribute__((aligned(X))):
http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html



Actually it seems gcc itself doesn't really care and it is up to the
linker to honor that.


Yes but the concept being that if you use __aligned() properly (when
defining a struct) the object will be correctly sized, so you will get
padding automatically.


Yes.  With __aligned() the start of the element/structure should
begin on an address evenly divisible by the align value *and* it
should pad out any remaining space up to the next evenly divisible
address.

The problem we have is that it apparently doesn't work correctly
within gcc when creating structs nor within the linker when placing
such supposedly aligned structs in the .bss section (at least the
padding is missing).


I spoke too soon.  Attilio is completely right in his assessment.

It does work when done on the struct definition:

struct mtx {
...
} __aligned(CACHE_LINE_SIZE);   /* works including .bss alignment & padding */

When creating a struct (in globals at least) it doesn't work:

struct mtx __aligned(CACHE_LINE_SIZE) foo_mtx;  /* doesn't work */


It seems to come down to either a) fixing gcc+ld; or b) hacking
around it by magically padding the structs that require it.


The question now becomes whether we can (should?) make the latter
case above work or find another workaround.
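
The difference between the two spellings can be checked in a small
standalone program.  This is a sketch with made-up names (plain_lock,
line_lock, LINE_SIZE) using the GCC/Clang attribute directly rather
than the kernel's __aligned() macro.  It only shows what the C type
system guarantees: sizeof belongs to the type, so an attribute on a
single object cannot add padding.

```c
#include <stddef.h>
#include <stdint.h>

#define	LINE_SIZE	64	/* assumed cache line size for the example */

/* Ordinary lock-like type with natural alignment. */
struct plain_lock {
	volatile uintptr_t	word;
};

/* Attribute on the type definition: alignment and sizeof both grow. */
struct line_lock {
	volatile uintptr_t	word;
} __attribute__((aligned(LINE_SIZE)));

/* Attribute on one object: the type, and so its sizeof, is unchanged. */
static struct plain_lock obj_aligned __attribute__((aligned(LINE_SIZE)));

/* Two type-aligned locks back to back, as the linker might place globals. */
static struct line_lock type_aligned[2];

/* Distance in bytes between two consecutive type-aligned locks. */
static size_t
line_lock_stride(void)
{
	return ((size_t)((char *)&type_aligned[1] - (char *)&type_aligned[0]));
}
```

Here sizeof(struct line_lock) is a full line, so neighbours cannot share
it, while sizeof(obj_aligned) stays at sizeof(struct plain_lock) and
whatever follows it may land in the same line.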

--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-25 Thread Andre Oppermann

On 25.10.2012 11:39, Andrey V. Elsukov wrote:

Author: ae
Date: Thu Oct 25 09:39:14 2012
New Revision: 242079
URL: http://svn.freebsd.org/changeset/base/242079

Log:
   Remove the IPFIREWALL_FORWARD kernel option and make it possible to
   turn on the related functionality at runtime via the sysctl variable
   net.pfil.forward. It is turned off by default.

   Sponsored by:Yandex LLC
   Discussed with:  net@
   MFC after:   2 weeks


I still don't agree with naming the sysctl net.pfil.forward.  This
type of forwarding is a property of IPv4 and IPv6 and thus should
be put there.  Pfil hooking can be on layer 2, 2-bridging, 3 and
who knows where else in the future.  Forwarding works only for IPv46.

You haven't even replied to my comment on net@.  Please change the
sysctl location and name to its appropriate place.

Also, an MFC after 2 weeks must ensure that compiling with
IPFIREWALL_FORWARD enables the sysctl at the same time, to keep
kernel configs within 9-stable working.

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-25 Thread Andre Oppermann

On 25.10.2012 05:49, Bruce Evans wrote:

On Wed, 24 Oct 2012, Attilio Rao wrote:


On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann  wrote:

...
Let's go back and see how we can do this the sanest way.  These are
the options I see at the moment:

 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place


This is wrong because it doesn't give padding.


Unless it is sprinkled in struct declarations.


 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
the future possibly change to a different compiler dependent
align attribute


What is this macro supposed to do? I don't understand that from your
description.


 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
automatically gets aligned in all cases, even when dynamically
allocated.


This works but I think it is overkill for structures including sleep
mutexes, which are the vast majority. So I certainly wouldn't be in
favor of such a patch.


This doesn't work either with fully dynamic (auto) allocations.  Stack
alignment is generally broken (limited, and pessimized for both space
and time) in gcc (it works better in clang).  On amd64, it is limited
by the default of -mpreferred-stack-boundary=4.  Since 2**4 is smaller
than the cache line size and stack alignments larger than it are broken
in gcc, __aligned(CACHE_LINE_SIZE) never works (except accidentally,
16/CACHE_LINE_SIZE of the time).  On i386, we reduce the space/time
pessimizations a little by overriding the default to
-mpreferred-stack-boundary=2.  2**2 is even smaller than the cache
line size.  (The pessimizations are for both space and time, since
time and code space is wasted for the code to keep the stack aligned,
and cache space and thus also time are wasted for padding.  Most
functions don't benefit from more than sizeof(register_t) alignment.)


I'm not aware of stack allocated mutexes anywhere in the kernel.
Even if there is a case it's very special and unique.

I've verified that __aligned(CACHE_LINE_SIZE) on the definition of
struct mtx itself (in sys/_mutex.h) correctly aligns and pads the
global .bss resident mutexes for 64B and 128B cache line sizes.


Dynamic allocations via malloc() get whatever alignment malloc() gives.
This is only required to be 4 or 8 or 16 or so (the maximum for a C
object declared in conforming C (no __align())), but malloc() usually
gives more.  If it gives CACHE_LINE_SIZE, that is wasteful for most
small allocations.


Stand-alone mutexes are normally not malloc'ed.  They're always
embedded into some larger structure they protect.
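
As a userland analogy only (this is C11 aligned_alloc(), not the
kernel's malloc(9), and says nothing about what the kernel allocator
guarantees), requesting cache-line alignment for such a larger
structure could look like the sketch below.  The helper name and
LINE_SIZE are assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define	LINE_SIZE	64	/* assumed cache line size for the example */

/*
 * Hypothetical helper: allocate 'size' bytes starting on a cache line
 * boundary.  The size is rounded up to a multiple of LINE_SIZE, which
 * C11 requires for aligned_alloc() and which also keeps the tail of
 * the object from sharing a line with the next allocation.
 * Returns NULL on failure; release the memory with free().
 */
static void *
alloc_line_aligned(size_t size)
{
	size_t rounded;

	if (size == 0 || size > SIZE_MAX - (LINE_SIZE - 1))
		return (NULL);
	rounded = (size + LINE_SIZE - 1) & ~(size_t)(LINE_SIZE - 1);
	return (aligned_alloc(LINE_SIZE, rounded));
}
```

The rounding is the space cost Bruce refers to: a 100 byte request
consumes 128 bytes here.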


__builtin_alloca() is broken in gcc-3.3.3, but works in gcc-4.2.1, at
least on i386.  In gcc-3.3.3, it assumes that the stack is the default
16-byte aligned even if -mpreferred-stack-boundary=2 is in CFLAGS to
say otherwise, and just subtracts from the stack pointer.  In gcc-4.2.1,
it does the necessary andl of the stack pointer, but only 16-byte
alignment.

It is another bug that there are no extensions of malloc() or alloca().
Since malloc() is in the library and may give CACHE_LINE_SIZE but
__builtin_alloca() is in the compiler and only gives 16, these functions
are not even as compatible as they should be.

I don't know of any mutexes allocated on the stack, but there are stack
frames with mcontexts in them that need special alignment so they cause
problems on i386.  They can't just be put on the stack due to the above
bugs. They are laboriously allocated using malloc().  Since they are
quite large, 1 mcontext barely fits on the kernel stack, so kib didn't
like my alloca() method for allocating them.


You lost me here.

--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-25 Thread Andre Oppermann

On 25.10.2012 18:25, Andrey V. Elsukov wrote:

On 25.10.2012 19:54, Andre Oppermann wrote:

I still don't agree with naming the sysctl net.pfil.forward.  This
type of forwarding is a property of IPv4 and IPv6 and thus should
be put there.  Pfil hooking can be on layer 2, 2-bridging, 3 and
who knows where else in the future.  Forwarding works only for IPv46.

You haven't even replied to my comment on net@.  Please change the
sysctl location and name to its appropriate place.


Hi Andre,

There were two replies related to this subject; you did not reply to
them, and I thought that you had come to agree.


I replied to your reply to mine.  Other than that I didn't find
anything else from you.


So, if not, what do you think about the name net.pfil.ipforward?


net.inet.ip.pfil_forward
net.inet6.ip6.pfil_forward

or something like that.

If you can show with your performance profiling that the sysctl
isn't even necessary, you could leave it completely away and have
pfil_forward enabled permanently.  That would be even better for
everybody.


Also, an MFC after 2 weeks must ensure that compiling with
IPFIREWALL_FORWARD enables the sysctl at the same time to keep kernel
configs within 9-stable working.


Yes, it will work like that.


Excellent.  Thank you.

--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-26 Thread Andre Oppermann

On 26.10.2012 13:26, Gleb Smirnoff wrote:

On Thu, Oct 25, 2012 at 10:29:51PM +0200, Andre Oppermann wrote:
A> On 25.10.2012 18:25, Andrey V. Elsukov wrote:
A> > On 25.10.2012 19:54, Andre Oppermann wrote:
A> >> I still don't agree with naming the sysctl net.pfil.forward.  This
A> >> type of forwarding is a property of IPv4 and IPv6 and thus should
A> >> be put there.  Pfil hooking can be on layer 2, 2-bridging, 3 and
A> >> who knows where else in the future.  Forwarding works only for IPv46.
A> >>
A> >> You haven't even replied to my comment on net@.  Please change the
A> >> sysctl location and name to its appropriate place.
A> >
A> > Hi Andre,
A> >
A> > There were two replies related to this subject; you did not reply to
A> > them, and I thought that you had come to agree.
A>
A> I replied to your reply to mine.  Other than that I didn't find
A> anything else from you.
A>
A> > So, if not, what do you think about the name net.pfil.ipforward?
A>
A> net.inet.ip.pfil_forward
A> net.inet6.ip6.pfil_forward
A>
A> or something like that.
A>
A> If you can show with your performance profiling that the sysctl
A> isn't even necessary, you could leave it completely away and have
A> pfil_forward enabled permanently.  That would be even better for
A> everybody.

I'd prefer to have the sysctl. Benchmarking will definitely show
no regression, because in default case packets are tagless. But if
packets would carry 1 or 2 tags each, which don't actually belong
to PACKET_TAG_IPFORWARD, then processing would be pessimized.


With M_FASTFWD_OURS I used an overlay of the protocol specific M_PROTO[1-5]
mbuf flags.  The same can be done with M_IPFORWARD.  The ipfw code then
will not only add the m_tag but also set M_IPFORWARD flag.  That way no
sysctl is required and the feature is always available.  The overlay
definition is in ip_var.h.

--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-26 Thread Andre Oppermann

On 26.10.2012 14:29, Andrey V. Elsukov wrote:

On 26.10.2012 15:43, Andre Oppermann wrote:

A> If you can show with your performance profiling that the sysctl
A> isn't even necessary, you could leave it completely away and have
A> pfil_forward enabled permanently.  That would be even better for
A> everybody.

I'd prefer to have the sysctl. Benchmarking will definitely show
no regression, because in default case packets are tagless. But if
packets would carry 1 or 2 tags each, which don't actually belong
to PACKET_TAG_IPFORWARD, then processing would be pessimized.


With M_FASTFWD_OURS I used an overlay of the protocol specific M_PROTO[1-5]
mbuf flags.  The same can be done with M_IPFORWARD.  The ipfw code then
will not only add the m_tag but also set M_IPFORWARD flag.  That way no
sysctl is required and the feature is always available.  The overlay
definition is in ip_var.h.


It seems we have only one bit in the m_flags that can be used, so maybe
we should leave it for things that may appear in the future?


That's what the M_PROTO flags are for:

#define M_IPFW_FORWARD  M_PROTO2        /* ip forwarding */

of course you have to do the same for ip6.

The M_PROTO[1-5] flags are only valid within a protocol layer.  For
example they get cleared in ip_output() before the packet is handed
to layer 2.
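
The overlay idea described above can be sketched in plain C. This is an illustration under stated assumptions: struct pkt is a hypothetical stand-in for struct mbuf, the flag values are placeholders for the real definitions in sys/mbuf.h, and only two of the M_PROTO bits are modelled.

```c
#include <stdint.h>

/* Placeholder values; the real definitions live in sys/mbuf.h. */
#define M_PROTO1	0x00000010u
#define M_PROTO2	0x00000020u
#define M_PROTO_MASK	(M_PROTO1 | M_PROTO2)

/* IP-layer alias overlaid on a protocol-specific flag bit. */
#define M_IP_NEXTHOP	M_PROTO2

struct pkt {			/* hypothetical, far smaller than struct mbuf */
	uint32_t flags;
};

/* Set by the firewall code together with the m_tag. */
static void
mark_nexthop(struct pkt *p)
{
	p->flags |= M_IP_NEXTHOP;
}

/* Cheap test in the forwarding path; no tag walk needed when clear. */
static int
has_nexthop(const struct pkt *p)
{
	return ((p->flags & M_IP_NEXTHOP) != 0);
}

/* Protocol-specific bits are dropped when the packet leaves the layer. */
static void
leave_ip_layer(struct pkt *p)
{
	p->flags &= ~M_PROTO_MASK;
}
```

The point of the design is that the common case (no flag set) costs a single bit test, so the tag lookup only happens for packets that actually carry a nexthop tag.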

--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-26 Thread Andre Oppermann

On 26.10.2012 15:24, Andre Oppermann wrote:

On 26.10.2012 14:29, Andrey V. Elsukov wrote:

On 26.10.2012 15:43, Andre Oppermann wrote:

A> If you can show with your performance profiling that the sysctl
A> isn't even necessary, you could leave it completely away and have
A> pfil_forward enabled permanently.  That would be even better for
A> everybody.

I'd prefer to have the sysctl. Benchmarking will definitely show
no regression, because in default case packets are tagless. But if
packets would carry 1 or 2 tags each, which don't actually belong
to PACKET_TAG_IPFORWARD, then processing would be pessimized.


With M_FASTFWD_OURS I used an overlay of the protocol specific M_PROTO[1-5]
mbuf flags.  The same can be done with M_IPFORWARD.  The ipfw code then
will not only add the m_tag but also set M_IPFORWARD flag.  That way no
sysctl is required and the feature is always available.  The overlay
definition is in ip_var.h.


It seems we have only one bit in the m_flags that can be used, so maybe
we should leave it for things that may appear in the future?


That's what the M_PROTO flags are for:

#define M_IPFW_FORWARD  M_PROTO2        /* ip forwarding */


Actually looking at it technically this isn't forwarding but specifying
a different nexthop.  Hence the #define and description should be more
like

#define M_IP_NEXTHOP    M_PROTO2        /* explicit ip nexthop */

Of course the userspace ipfw feature naming and usage doesn't change.
But within the kernel it's really nexthop manipulation within the
forwarding path.

--
Andre


of course you have to do the same for ip6.

The M_PROTO[1-5] flags are only valid within a protocol layer.  For
example they get cleared in ip_output() before the packet is handed
to layer 2.





svn commit: r242151 - in head/sys: vm xen/evtchn

2012-10-26 Thread Andre Oppermann
Author: andre
Date: Fri Oct 26 17:31:35 2012
New Revision: 242151
URL: http://svn.freebsd.org/changeset/base/242151

Log:
  Move the corresponding MTX_SYSINIT() next to their struct mtx declaration
  to make their relationship more obvious as done with the other such mutexes.

Modified:
  head/sys/vm/vm_glue.c
  head/sys/xen/evtchn/evtchn.c

Modified: head/sys/vm/vm_glue.c
==
--- head/sys/vm/vm_glue.c    Fri Oct 26 17:02:50 2012    (r242150)
+++ head/sys/vm/vm_glue.c    Fri Oct 26 17:31:35 2012    (r242151)
@@ -307,6 +307,8 @@ struct kstack_cache_entry *kstack_cache;
 static int kstack_cache_size = 128;
 static int kstacks;
 static struct mtx kstack_cache_mtx;
+MTX_SYSINIT(kstack_cache, &kstack_cache_mtx, "kstkch", MTX_DEF);
+
 SYSCTL_INT(_vm, OID_AUTO, kstack_cache_size, CTLFLAG_RW, &kstack_cache_size, 0,
 "");
 SYSCTL_INT(_vm, OID_AUTO, kstacks, CTLFLAG_RD, &kstacks, 0,
@@ -486,7 +488,6 @@ kstack_cache_init(void *nulll)
EVENTHANDLER_PRI_ANY);
 }
 
-MTX_SYSINIT(kstack_cache, &kstack_cache_mtx, "kstkch", MTX_DEF);
 SYSINIT(vm_kstacks, SI_SUB_KTHREAD_INIT, SI_ORDER_ANY, kstack_cache_init, NULL);
 
 #ifndef NO_SWAPPING

Modified: head/sys/xen/evtchn/evtchn.c
==
--- head/sys/xen/evtchn/evtchn.c    Fri Oct 26 17:02:50 2012    (r242150)
+++ head/sys/xen/evtchn/evtchn.c    Fri Oct 26 17:31:35 2012    (r242151)
@@ -44,7 +44,15 @@ static inline unsigned long __ffs(unsign
 return word;
 }
 
+/*
+ * irq_mapping_update_lock: in order to allow an interrupt to occur in a critical
+ * section, to set pcpu->ipending (etc...) properly, we
+ * must be able to get the icu lock, so it can't be
+ * under witness.
+ */
 static struct mtx irq_mapping_update_lock;
+MTX_SYSINIT(irq_mapping_update_lock, &irq_mapping_update_lock, "xp", MTX_SPIN);
+
 static struct xenpic *xp;
 struct xenpic_intsrc {
struct intsrc xp_intsrc;
@@ -1130,11 +1138,4 @@ evtchn_init(void *dummy __unused)
 }
 
 SYSINIT(evtchn_init, SI_SUB_INTR, SI_ORDER_MIDDLE, evtchn_init, NULL);
-/*
- * irq_mapping_update_lock: in order to allow an interrupt to occur in a critical
- * section, to set pcpu->ipending (etc...) properly, we
- * must be able to get the icu lock, so it can't be
- * under witness.
- */
 
-MTX_SYSINIT(irq_mapping_update_lock, &irq_mapping_update_lock, "xp", MTX_SPIN);


Re: svn commit: r242161 - in head/sys: net netinet netpfil/pf

2012-10-27 Thread Andre Oppermann

On 26.10.2012 23:06, Gleb Smirnoff wrote:

Author: glebius
Date: Fri Oct 26 21:06:33 2012
New Revision: 242161
URL: http://svn.freebsd.org/changeset/base/242161

Log:
   o Remove last argument to ip_fragment(), and obtain all needed information
 on checksums directly from mbuf flags. This simplifies code.
   o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
 hardware. Some drivers may not announce CSUM_IP in their if_hwassist,
 yet still try to do checksums if CSUM_IP is set on the mbuf. Example is em(4).


I'm not getting your description here?  Why work around a bug in a driver
in ip_fragment() when we can fix the bug in the driver?


   o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
 After this change CSUM_DELAY_IP vanishes from the stack.


Good. :)


   Submitted by: Sebastian Kuzminsky


--
Andre



svn commit: r242249 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:16:09 2012
New Revision: 242249
URL: http://svn.freebsd.org/changeset/base/242249

Log:
  Adjust the initial default CWND upon connection establishment to the
  new and increased values specified by RFC5681 Section 3.1.
  
  The even larger initial CWND per RFC3390, if enabled, is not affected.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c    Sun Oct 28 17:06:50 2012    (r242248)
+++ head/sys/netinet/tcp_input.c    Sun Oct 28 17:16:09 2012    (r242249)
@@ -351,8 +351,15 @@ cc_conn_init(struct tcpcb *tp)
if (V_tcp_do_rfc3390)
tp->snd_cwnd = min(4 * tp->t_maxseg,
max(2 * tp->t_maxseg, 4380));
-   else
-   tp->snd_cwnd = tp->t_maxseg;
+   else {
+   /* Per RFC5681 Section 3.1 */
+   if (tp->t_maxseg > 2190)
+   tp->snd_cwnd = 2 * tp->t_maxseg;
+   else if (tp->t_maxseg > 1095)
+   tp->snd_cwnd = 3 * tp->t_maxseg;
+   else
+   tp->snd_cwnd = 4 * tp->t_maxseg;
+   }
 
if (CC_ALGO(tp)->conn_init != NULL)
CC_ALGO(tp)->conn_init(tp->ccv);
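
The three cases added in the else branch above can be read as a small pure function. The sketch below restates them outside the kernel; initial_cwnd_rfc5681() is a hypothetical name used only for this illustration.

```c
/*
 * Initial congestion window in bytes for a given maximum segment
 * size, following the three cases in the diff (RFC 5681, Section 3.1):
 * larger segments get fewer of them.
 */
static unsigned long
initial_cwnd_rfc5681(unsigned long mss)
{
	if (mss > 2190)
		return (2 * mss);
	if (mss > 1095)
		return (3 * mss);
	return (4 * mss);
}
```

For a typical Ethernet MSS of 1460 bytes this gives three segments (4380 bytes) instead of the single segment used before the change.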


svn commit: r242250 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:25:08 2012
New Revision: 242250
URL: http://svn.freebsd.org/changeset/base/242250

Log:
  When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
  reduce the initial CWND to one segment.  This reduction got lost
  some time ago due to a change in initialization ordering.
  
  Additionally in tcp_timer_rexmt() avoid entering fast recovery when
  we're still in TCPS_SYN_SENT state.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_syncache.c
  head/sys/netinet/tcp_timer.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c    Sun Oct 28 17:16:09 2012    (r242249)
+++ head/sys/netinet/tcp_input.c    Sun Oct 28 17:25:08 2012    (r242250)
@@ -345,10 +345,16 @@ cc_conn_init(struct tcpcb *tp)
/*
 * Set the initial slow-start flight size.
 *
-* RFC3390 says only do this if SYN or SYN/ACK didn't got lost.
-* XXX: We currently check only in syncache_socket for that.
-*/
-   if (V_tcp_do_rfc3390)
+* RFC5681 Section 3.1 specifies the default conservative values.
+* RFC3390 specifies slightly more aggressive values.
+*
+* If a SYN or SYN/ACK was lost and retransmitted, we have to
+* reduce the initial CWND to one segment as congestion is likely
+* requiring us to be cautious.
+*/
+   if (tp->snd_cwnd == 1)
+   tp->snd_cwnd = tp->t_maxseg;/* SYN(-ACK) lost */
+   else if (V_tcp_do_rfc3390)
tp->snd_cwnd = min(4 * tp->t_maxseg,
max(2 * tp->t_maxseg, 4380));
else {

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c    Sun Oct 28 17:16:09 2012    (r242249)
+++ head/sys/netinet/tcp_syncache.c    Sun Oct 28 17:25:08 2012    (r242250)
@@ -852,11 +852,12 @@ syncache_socket(struct syncache *sc, str
tcp_mss(tp, sc->sc_peer_mss);
 
/*
-* If the SYN,ACK was retransmitted, reset cwnd to 1 segment.
+* If the SYN,ACK was retransmitted, indicate that CWND to be
+* limited to one segment in cc_conn_init().
 * NB: sc_rxmits counts all SYN,ACK transmits, not just retransmits.
 */
if (sc->sc_rxmits > 1)
-   tp->snd_cwnd = tp->t_maxseg;
+   tp->snd_cwnd = 1;
 
 #ifdef TCP_OFFLOAD
/*

Modified: head/sys/netinet/tcp_timer.c
==
--- head/sys/netinet/tcp_timer.c    Sun Oct 28 17:16:09 2012    (r242249)
+++ head/sys/netinet/tcp_timer.c    Sun Oct 28 17:25:08 2012    (r242250)
@@ -539,7 +539,13 @@ tcp_timer_rexmt(void * xtp)
}
INP_INFO_RUNLOCK(&V_tcbinfo);
headlocked = 0;
-   if (tp->t_rxtshift == 1) {
+   if (tp->t_state == TCPS_SYN_SENT) {
+   /*
+* If the SYN was retransmitted, indicate CWND to be
+* limited to 1 segment in cc_conn_init().
+*/
+   tp->snd_cwnd = 1;
+   } else if (tp->t_rxtshift == 1) {
/*
 * first retransmit; record ssthresh and cwnd so they can
 * be recovered if this turns out to be a "bad" retransmit.
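
The change above uses snd_cwnd == 1 as a marker that is set where the retransmit is noticed and turned into a real window later in cc_conn_init(). A minimal userspace sketch of that pattern follows; struct conn, CWND_SYN_LOST and both function names are hypothetical, and the default branch simply repeats the RFC 5681 cases from r242249.

```c
struct conn {			/* hypothetical, stands in for struct tcpcb */
	unsigned long snd_cwnd;	/* congestion window, bytes */
	unsigned long maxseg;	/* maximum segment size, bytes */
};

#define CWND_SYN_LOST	1UL	/* marker only, not a usable window */

/* Called where a SYN or SYN/ACK retransmit is detected. */
static void
note_syn_retransmit(struct conn *c)
{
	c->snd_cwnd = CWND_SYN_LOST;
}

/* Called once the connection is established and maxseg is final. */
static void
conn_init(struct conn *c)
{
	if (c->snd_cwnd == CWND_SYN_LOST)
		c->snd_cwnd = c->maxseg;	/* SYN(-ACK) lost */
	else if (c->maxseg > 2190)
		c->snd_cwnd = 2 * c->maxseg;
	else if (c->maxseg > 1095)
		c->snd_cwnd = 3 * c->maxseg;
	else
		c->snd_cwnd = 4 * c->maxseg;
}
```

Deferring the computation this way means the reduction no longer depends on the order in which the segment size and the window are initialized, which appears to be how the reduction got lost earlier.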


svn commit: r242251 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:30:28 2012
New Revision: 242251
URL: http://svn.freebsd.org/changeset/base/242251

Log:
  When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
  reduce the initial CWND to one segment.  This reduction got lost
  some time ago due to a change in initialization ordering.
  
  Additionally in tcp_timer_rexmt() avoid entering fast recovery when
  we're still in TCPS_SYN_SENT state.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c    Sun Oct 28 17:25:08 2012    (r242250)
+++ head/sys/netinet/tcp_output.c    Sun Oct 28 17:30:28 2012    (r242251)
@@ -551,10 +551,14 @@ after_sack_rexmit:
 * max size segments, or at least 50% of the maximum possible
 * window, then want to send a window update to peer.
 * Skip this if the connection is in T/TCP half-open state.
-* Don't send pure window updates when the peer has closed
-* the connection and won't ever send more data.
+*
+* Don't send an independent window update if a delayed
+* ACK is pending (it will get piggy-backed on it) or the
+* remote side already has done a half-close and won't send
+* more data.
 */
if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
+   !(tp->t_flags & TF_DELACK) &&
!TCPS_HAVERCVDFIN(tp->t_state)) {
/*
 * "adv" is the amount we can increase the window,


svn commit: r242252 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:40:35 2012
New Revision: 242252
URL: http://svn.freebsd.org/changeset/base/242252

Log:
  Prevent a flurry of forced window updates when an application is
  doing small reads on a (partially) filled receive socket buffer.
  
  Normally one would send a window update every time the available
  space in the socket buffer increases by two times MSS.  This leads
  to a flurry of window updates that do not provide any meaningful
  new information to the sender.  There still is available space in
  the window and the sender can continue sending data.  All window
  updates then get carried by the regular ACKs.  Only when the socket
  buffer was (almost) full and the window closed accordingly does a
  window update deliver new information and allow the sender to start
  sending more data again.
  
  Send window updates only every two MSS when the socket buffer
  has less than 1/8 space available, or the available space in the
  socket buffer increased by 1/4 its full capacity, or the socket
  buffer is very small.  The next regular data ACK will carry and
  report the exact window size again.
  
  Reported by:  sbruno
  Tested by:    darrenr
  Tested by:    Darren Baginski
  PR:           kern/116335
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c    Sun Oct 28 17:30:28 2012    (r242251)
+++ head/sys/netinet/tcp_output.c    Sun Oct 28 17:40:35 2012    (r242252)
@@ -545,23 +545,39 @@ after_sack_rexmit:
}
 
/*
-* Compare available window to amount of window
-* known to peer (as advertised window less
-* next expected input).  If the difference is at least two
-* max size segments, or at least 50% of the maximum possible
-* window, then want to send a window update to peer.
-* Skip this if the connection is in T/TCP half-open state.
+* Sending of standalone window updates.
+*
+* Window updates important when we close our window due to a full
+* socket buffer and are opening it again after the application
+* reads data from it.  Once the window has opened again and the
+* remote end starts to send again the ACK clock takes over and
+* provides the most current window information.
+*
+* We must avoid to the silly window syndrome whereas every read
+* from the receive buffer, no matter how small, causes a window
+* update to be sent.  We also should avoid sending a flurry of
+* window updates when the socket buffer had queued a lot of data
+* and the application is doing small reads.
+*
+* Prevent a flurry of pointless window updates by only sending
+* an update when we can increase the advertized window by more
+* than 1/4th of the socket buffer capacity.  When the buffer is
+* getting full or is very small be more aggressive and send an
+* update whenever we can increase by two mss sized segments.
+* In all other situations the ACK's to new incoming data will
+* carry further window increases.
 *
 * Don't send an independent window update if a delayed
 * ACK is pending (it will get piggy-backed on it) or the
 * remote side already has done a half-close and won't send
-* more data.
+* more data.  Skip this if the connection is in T/TCP
+* half-open state.
 */
if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
!(tp->t_flags & TF_DELACK) &&
!TCPS_HAVERCVDFIN(tp->t_state)) {
/*
-* "adv" is the amount we can increase the window,
+* "adv" is the amount we could increase the window,
 * taking into account that we are limited by
 * TCP_MAXWIN << tp->rcv_scale.
 */
@@ -581,9 +597,11 @@ after_sack_rexmit:
 */
if (oldwin >> tp->rcv_scale == (adv + oldwin) >> tp->rcv_scale)
goto dontupdate;
-   if (adv >= (long) (2 * tp->t_maxseg))
-   goto send;
-   if (2 * adv >= (long) so->so_rcv.sb_hiwat)
+
+   if (adv >= (long)(2 * tp->t_maxseg) &&
+   (adv >= (long)(so->so_rcv.sb_hiwat / 4) ||
+recwin <= (long)(so->so_rcv.sb_hiwat / 8) ||
+so->so_rcv.sb_hiwat <= 8 * tp->t_maxseg))
goto send;
}
 dontupdate:
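
The new send condition in the last hunk can be isolated as a predicate. The sketch below is a standalone restatement, with want_window_update() as a hypothetical name; it leaves out the surrounding guards (TF_NEEDSYN, TF_DELACK, received FIN) and the window-scale check that precede it in tcp_output().

```c
/*
 * adv:    how far the advertised window could be increased, in bytes
 * recwin: space currently available in the receive buffer
 * hiwat:  total capacity of the receive buffer
 * maxseg: maximum segment size
 * Returns nonzero if a standalone window update should be sent.
 */
static int
want_window_update(long adv, long recwin, long hiwat, long maxseg)
{
	if (adv < 2 * maxseg)
		return (0);		/* never update for less than 2 MSS */
	return (adv >= hiwat / 4 ||	/* big increase */
	    recwin <= hiwat / 8 ||	/* buffer nearly full */
	    hiwat <= 8 * maxseg);	/* very small buffer */
}
```

With a 64 KB buffer and a 1460-byte MSS, an increase of 3000 bytes while 40000 bytes are still free no longer triggers an update, whereas the old code would have sent one for any increase of at least 2920 bytes.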


svn commit: r242253 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:59:46 2012
New Revision: 242253
URL: http://svn.freebsd.org/changeset/base/242253

Log:
  Simplify implementation of net.inet.tcp.reass.maxsegments and
  net.inet.tcp.reass.cursegments.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_reass.c

Modified: head/sys/netinet/tcp_reass.c
==
--- head/sys/netinet/tcp_reass.c    Sun Oct 28 17:40:35 2012    (r242252)
+++ head/sys/netinet/tcp_reass.c    Sun Oct 28 17:59:46 2012    (r242253)
@@ -74,7 +74,6 @@ __FBSDID("$FreeBSD$");
 #include 
 #endif /* TCPDEBUG */
 
-static int tcp_reass_sysctl_maxseg(SYSCTL_HANDLER_ARGS);
 static int tcp_reass_sysctl_qsize(SYSCTL_HANDLER_ARGS);
 
 static SYSCTL_NODE(_net_inet_tcp, OID_AUTO, reass, CTLFLAG_RW, 0,
@@ -82,16 +81,12 @@ static SYSCTL_NODE(_net_inet_tcp, OID_AU
 
 static VNET_DEFINE(int, tcp_reass_maxseg) = 0;
 #define V_tcp_reass_maxseg      VNET(tcp_reass_maxseg)
-SYSCTL_VNET_PROC(_net_inet_tcp_reass, OID_AUTO, maxsegments,
-CTLTYPE_INT | CTLFLAG_RDTUN,
-&VNET_NAME(tcp_reass_maxseg), 0, &tcp_reass_sysctl_maxseg, "I",
+SYSCTL_VNET_INT(_net_inet_tcp_reass, OID_AUTO, maxsegments, CTLFLAG_RDTUN,
+&VNET_NAME(tcp_reass_maxseg), 0,
 "Global maximum number of TCP Segments in Reassembly Queue");
 
-static VNET_DEFINE(int, tcp_reass_qsize) = 0;
-#define V_tcp_reass_qsize       VNET(tcp_reass_qsize)
 SYSCTL_VNET_PROC(_net_inet_tcp_reass, OID_AUTO, cursegments,
-CTLTYPE_INT | CTLFLAG_RD,
-&VNET_NAME(tcp_reass_qsize), 0, &tcp_reass_sysctl_qsize, "I",
+(CTLTYPE_INT | CTLFLAG_RD), NULL, 0, &tcp_reass_sysctl_qsize, "I",
 "Global number of TCP Segments currently in Reassembly Queue");
 
 static VNET_DEFINE(int, tcp_reass_overflows) = 0;
@@ -109,8 +104,10 @@ static void
 tcp_reass_zone_change(void *tag)
 {
 
+   /* Set the zone limit and read back the effective value. */
V_tcp_reass_maxseg = nmbclusters / 16;
uma_zone_set_max(V_tcp_reass_zone, V_tcp_reass_maxseg);
+   V_tcp_reass_maxseg = uma_zone_get_max(V_tcp_reass_zone);
 }
 
 void
@@ -122,7 +119,9 @@ tcp_reass_init(void)
&V_tcp_reass_maxseg);
V_tcp_reass_zone = uma_zcreate("tcpreass", sizeof (struct tseg_qent),
NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
+   /* Set the zone limit and read back the effective value. */
uma_zone_set_max(V_tcp_reass_zone, V_tcp_reass_maxseg);
+   V_tcp_reass_maxseg = uma_zone_get_max(V_tcp_reass_zone);
EVENTHANDLER_REGISTER(nmbclusters_change,
tcp_reass_zone_change, NULL, EVENTHANDLER_PRI_ANY);
 }
@@ -156,17 +155,12 @@ tcp_reass_flush(struct tcpcb *tp)
 }
 
 static int
-tcp_reass_sysctl_maxseg(SYSCTL_HANDLER_ARGS)
-{
-   V_tcp_reass_maxseg = uma_zone_get_max(V_tcp_reass_zone);
-   return (sysctl_handle_int(oidp, arg1, arg2, req));
-}
-
-static int
 tcp_reass_sysctl_qsize(SYSCTL_HANDLER_ARGS)
 {
-   V_tcp_reass_qsize = uma_zone_get_cur(V_tcp_reass_zone);
-   return (sysctl_handle_int(oidp, arg1, arg2, req));
+   int qsize;
+
+   qsize = uma_zone_get_cur(V_tcp_reass_zone);
+   return (sysctl_handle_int(oidp, &qsize, sizeof(qsize), req));
 }
 
 int


svn commit: r242254 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 18:07:34 2012
New Revision: 242254
URL: http://svn.freebsd.org/changeset/base/242254

Log:
  Change the syncache count reporting the current number of entries
  from an unprotected u_int that reports garbage on SMP to a function
  based sysctl obtaining the current value from UMA.
  
  Also read back the actual cache_limit after page size rounding by UMA.
  
  PR:           kern/165879
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_syncache.c
  head/sys/netinet/tcp_syncache.h

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c    Sun Oct 28 17:59:46 2012    (r242253)
+++ head/sys/netinet/tcp_syncache.c    Sun Oct 28 18:07:34 2012    (r242254)
@@ -123,6 +123,7 @@ struct syncache *syncache_lookup(struct 
 static int  syncache_respond(struct syncache *);
 static struct   socket *syncache_socket(struct syncache *, struct socket *,
struct mbuf *m);
+static int  syncache_sysctl_count(SYSCTL_HANDLER_ARGS);
 static void syncache_timeout(struct syncache *sc, struct syncache_head *sch,
int docallout);
 static void syncache_timer(void *);
@@ -158,8 +159,8 @@ SYSCTL_VNET_UINT(_net_inet_tcp_syncache,
 &VNET_NAME(tcp_syncache.cache_limit), 0,
 "Overall entry limit for syncache");
 
-SYSCTL_VNET_UINT(_net_inet_tcp_syncache, OID_AUTO, count, CTLFLAG_RD,
-&VNET_NAME(tcp_syncache.cache_count), 0,
+SYSCTL_VNET_PROC(_net_inet_tcp_syncache, OID_AUTO, count, (CTLTYPE_UINT|CTLFLAG_RD),
+NULL, 0, &syncache_sysctl_count, "IU",
 "Current number of entries in syncache");
 
 SYSCTL_VNET_UINT(_net_inet_tcp_syncache, OID_AUTO, hashsize, CTLFLAG_RDTUN,
@@ -225,7 +226,6 @@ syncache_init(void)
 {
int i;
 
-   V_tcp_syncache.cache_count = 0;
V_tcp_syncache.hashsize = TCP_SYNCACHE_HASHSIZE;
V_tcp_syncache.bucket_limit = TCP_SYNCACHE_BUCKETLIMIT;
V_tcp_syncache.rexmt_limit = SYNCACHE_MAXREXMTS;
@@ -269,6 +269,7 @@ syncache_init(void)
V_tcp_syncache.zone = uma_zcreate("syncache", sizeof(struct syncache),
NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
uma_zone_set_max(V_tcp_syncache.zone, V_tcp_syncache.cache_limit);
+   V_tcp_syncache.cache_limit = uma_zone_get_max(V_tcp_syncache.zone);
 }
 
 #ifdef VIMAGE
@@ -296,8 +297,8 @@ syncache_destroy(void)
mtx_destroy(&sch->sch_mtx);
}
 
-   KASSERT(V_tcp_syncache.cache_count == 0, ("%s: cache_count %d not 0",
-   __func__, V_tcp_syncache.cache_count));
+   KASSERT(uma_zone_get_cur(V_tcp_syncache.zone) == 0,
+   ("%s: cache_count not 0", __func__));
 
/* Free the allocated global resources. */
uma_zdestroy(V_tcp_syncache.zone);
@@ -305,6 +306,15 @@ syncache_destroy(void)
 }
 #endif
 
+static int
+syncache_sysctl_count(SYSCTL_HANDLER_ARGS)
+{
+   int count;
+
+   count = uma_zone_get_cur(V_tcp_syncache.zone);
+   return (sysctl_handle_int(oidp, &count, sizeof(count), req));
+}
+
 /*
  * Inserts a syncache entry into the specified bucket row.
  * Locks and unlocks the syncache_head autonomously.
@@ -347,7 +357,6 @@ syncache_insert(struct syncache *sc, str
 
SCH_UNLOCK(sch);
 
-   V_tcp_syncache.cache_count++;
TCPSTAT_INC(tcps_sc_added);
 }
 
@@ -373,7 +382,6 @@ syncache_drop(struct syncache *sc, struc
 #endif
 
syncache_free(sc);
-   V_tcp_syncache.cache_count--;
 }
 
 /*
@@ -958,7 +966,6 @@ syncache_expand(struct in_conninfo *inc,
tod->tod_syncache_removed(tod, sc->sc_todctx);
}
 #endif
-   V_tcp_syncache.cache_count--;
SCH_UNLOCK(sch);
}
 

Modified: head/sys/netinet/tcp_syncache.h
==
--- head/sys/netinet/tcp_syncache.h    Sun Oct 28 17:59:46 2012    (r242253)
+++ head/sys/netinet/tcp_syncache.h    Sun Oct 28 18:07:34 2012    (r242254)
@@ -112,7 +112,6 @@ struct tcp_syncache {
u_int   hashsize;
u_int   hashmask;
u_int   bucket_limit;
-   u_int   cache_count;/* XXX: unprotected */
u_int   cache_limit;
u_int   rexmt_limit;
u_int   hash_secret;
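
Both this commit and r242253 set a zone limit and then read the effective value back, because the allocator may round the request. The sketch below mimics that with a toy zone; everything in it is hypothetical (struct zone, zone_set_max(), zone_get_max(), the 4096-byte page), and the round-up-to-a-full-page rule is an assumption made for illustration, not a description of UMA internals.

```c
#include <stddef.h>

#define PAGE_BYTES	4096u	/* assumed page size */

struct zone {			/* hypothetical stand-in for a UMA zone */
	size_t item_size;	/* bytes per item */
	unsigned int max_items;	/* effective limit after rounding */
};

/* Round the requested limit up to a whole number of pages of items. */
static void
zone_set_max(struct zone *z, unsigned int nitems)
{
	unsigned int per_page = (unsigned int)(PAGE_BYTES / z->item_size);

	z->max_items = (nitems + per_page - 1) / per_page * per_page;
}

/* Report the limit that is actually enforced. */
static unsigned int
zone_get_max(const struct zone *z)
{
	return (z->max_items);
}
```

Reading the value back once at initialization lets the sysctl export a plain integer, which is why the dedicated handler for maxsegments could be dropped in r242253.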


svn commit: r242255 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 18:33:52 2012
New Revision: 242255
URL: http://svn.freebsd.org/changeset/base/242255

Log:
  Allow arbitrary MSS sizes and don't mind about the cluster size anymore.
  We've got more cluster sizes for quite some time now and the originally
  imposed limits and the previously codified thoughts on efficiency gains
  are no longer true.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c    Sun Oct 28 18:07:34 2012    (r242254)
+++ head/sys/netinet/tcp_input.c    Sun Oct 28 18:33:52 2012    (r242255)
@@ -3322,10 +3322,8 @@ tcp_xmit_timer(struct tcpcb *tp, int rtt
 /*
  * Determine a reasonable value for maxseg size.
  * If the route is known, check route for mtu.
- * If none, use an mss that can be handled on the outgoing
- * interface without forcing IP to fragment; if bigger than
- * an mbuf cluster (MCLBYTES), round down to nearest multiple of MCLBYTES
- * to utilize large mbufs.  If no route is found, route has no mtu,
+ * If none, use an mss that can be handled on the outgoing interface
+ * without forcing IP to fragment.  If no route is found, route has no mtu,
  * or the destination isn't local, use a default, hopefully conservative
  * size (usually 512 or the default IP max size, but no more than the mtu
  * of the interface), as we can't discover anything about intervening
@@ -3506,13 +3504,6 @@ tcp_mss_update(struct tcpcb *tp, int off
 (tp->t_flags & TF_RCVD_TSTMP) == TF_RCVD_TSTMP))
mss -= TCPOLEN_TSTAMP_APPA;
 
-#if (MCLBYTES & (MCLBYTES - 1)) == 0
-   if (mss > MCLBYTES)
-   mss &= ~(MCLBYTES-1);
-#else
-   if (mss > MCLBYTES)
-   mss = mss / MCLBYTES * MCLBYTES;
-#endif
tp->t_maxseg = mss;
 }
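
To see what the removed block did, the power-of-two branch can be run in isolation. The sketch below assumes MCLBYTES is 2048 (a common value, but configurable) and uses the hypothetical name mss_round_old().

```c
#define MCLBYTES	2048	/* assumed cluster size for this sketch */

/*
 * The rounding removed by this commit: an MSS larger than one mbuf
 * cluster was truncated down to a multiple of the cluster size.
 */
static int
mss_round_old(int mss)
{
	if (mss > MCLBYTES)
		mss &= ~(MCLBYTES - 1);
	return (mss);
}
```

Under that assumption a jumbo-frame MSS of 8960 bytes was cut to 8192, giving up 768 bytes of payload per segment, while a standard 1460-byte MSS was unaffected.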
 


svn commit: r242256 - head/sys/kern

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 18:38:51 2012
New Revision: 242256
URL: http://svn.freebsd.org/changeset/base/242256

Log:
  Improve m_cat() by being able to also merge contents from M_EXT
  mbuf's by doing proper testing with M_WRITABLE().
  
  In m_collapse() replace an incomplete manual check for M_RDONLY
  with the M_WRITABLE() macro that also tests for shared buffers
  and other cases that make a particular mbuf immutable.
  
  MFC after:    2 weeks

Modified:
  head/sys/kern/uipc_mbuf.c

Modified: head/sys/kern/uipc_mbuf.c
==
--- head/sys/kern/uipc_mbuf.c    Sun Oct 28 18:33:52 2012    (r242255)
+++ head/sys/kern/uipc_mbuf.c    Sun Oct 28 18:38:51 2012    (r242256)
@@ -911,8 +911,8 @@ m_cat(struct mbuf *m, struct mbuf *n)
while (m->m_next)
m = m->m_next;
while (n) {
-   if (m->m_flags & M_EXT ||
-   m->m_data + m->m_len + n->m_len >= &m->m_dat[MLEN]) {
+   if (!M_WRITABLE(m) ||
+   M_TRAILINGSPACE(m) < n->m_len) {
/* just join the two chains */
m->m_next = n;
return;
@@ -1584,7 +1584,7 @@ again:
n = m->m_next;
if (n == NULL)
break;
-   if ((m->m_flags & M_RDONLY) == 0 &&
+   if (M_WRITABLE(m) &&
n->m_len < M_TRAILINGSPACE(m)) {
bcopy(mtod(n, void *), mtod(m, char *) + m->m_len,
n->m_len);
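
The copy-or-link decision that m_cat() now makes can be shown with a toy buffer type. This is a simplified sketch, not the kernel code: struct buf, BUF_CAP, trailing_space() and buf_cat() are hypothetical, the writable field stands in for what M_WRITABLE() tests, and unlike m_cat() the function handles a single buffer and frees nothing.

```c
#include <stddef.h>
#include <string.h>

#define BUF_CAP	64

struct buf {			/* hypothetical, far simpler than struct mbuf */
	char data[BUF_CAP];
	size_t len;		/* bytes in use */
	int writable;		/* 0 if storage is shared or read-only */
	struct buf *next;
};

/* Unused bytes at the end of the buffer. */
static size_t
trailing_space(const struct buf *b)
{
	return (BUF_CAP - b->len);
}

/*
 * Append n to m.  Copy the bytes when m may be written and has room,
 * otherwise just link the two buffers.  Returns 1 if data was copied.
 */
static int
buf_cat(struct buf *m, struct buf *n)
{
	if (!m->writable || trailing_space(m) < n->len) {
		m->next = n;	/* just join the two chains */
		return (0);
	}
	memcpy(m->data + m->len, n->data, n->len);
	m->len += n->len;
	return (1);
}
```

Testing writability instead of the storage type is what allows external storage to be merged into as well, as long as nobody else holds a reference to it.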
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

