Re: Much improved sendfile(2) kernel implementation

2006-09-23 Thread David Malone
On Fri, Sep 22, 2006 at 11:48:23PM +0100, Robert Watson wrote:
> The impact of TSO is clearly dramatic, especially when combined with the 
> patch, but I'm a bit concerned by the drop in performance in the patched 
> non-TSO case.  For network cards which will always have TSO enabled, this 
> isn't an issue, but do we see a similar effect for drivers without TSO?  
> What can we put this drop down to?

We probably also need to make sure that any performance increase
in TSO isn't due to us getting TCP congestion control wrong. I think
in Linux they had problems when they first introduced TSO because
TCP was advancing the congestion window by a TSO-sized chunk instead
of a wire packet. OTOH, I think Andre and Drew's tests are low-latency,
so congestion control isn't likely to be playing a big role, so the
improvements are unlikely to be due to this.

David.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Much improved sendfile(2) kernel implementation

2006-09-23 Thread Andre Oppermann

Robert Watson wrote:
> On Sat, 23 Sep 2006, Andre Oppermann wrote:
>
>> Without patch:
>>  87380 393216 393216    10.00      2163.08   100.00   19.35    3.787   1.466
>> Without patch + TSO:
>>  87380 393216 393216    10.00      4367.18   71.54    42.07    1.342   1.578
>> With patch:
>>  87380 393216 393216    10.01      1882.73   86.15    18.43    3.749   1.604
>> With patch + TSO:
>>  87380 393216 393216    10.00      6961.08   47.69    60.11    0.561   1.415
>
> The impact of TSO is clearly dramatic, especially when combined with the
> patch, but I'm a bit concerned by the drop in performance in the patched
> non-TSO case.  For network cards which will always have TSO enabled,
> this isn't an issue, but do we see a similar effect for drivers without
> TSO?  What can we put this drop down to?


If you look at my GigE numbers there is no drop for the new-sendfile w/o
TSO case.  In this 10Gig test the drop is really an artifact of the whole
setup and of the way netperf makes use of the sendfile call.  Internally
new-sendfile waits until 50% of the socket buffer is free before bulk
filling it again.  This value can be modified by setting a low watermark on
the send socket buffer.  Netperf does buffer-sized sendfile invocations
and this is very timing critical with 10G, which gives this picture:
call sendfile(380K) -> fill socket buffer -> wait -> fill rest -> return ->
call sendfile(380K) ...  Not to mention all the additional work tcp_output()
has to do w/o TSO.  Especially with large buffers it has to loop over the
mbuf chain for each packet to find out where to start copying.  Besides,
there is no point in having a non-TSO capable interface above 1-2Gbit.
Not even Linux can keep up there.
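
The per-packet walk described above can be pictured with a toy model.
This is a minimal sketch, not FreeBSD code: `struct buf`, `hops_to_offset`
and `hops_per_packet_walk` are hypothetical stand-ins for an mbuf chain and
for the search that finds the byte offset where the next packet's payload
starts.  It only counts chain links visited, and it assumes every packet
restarts the search at the head of the chain, as the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: only the fields this sketch needs. */
struct buf {
    size_t len;          /* payload bytes held by this buffer */
    struct buf *next;    /* next buffer in the chain, or NULL */
};

/* Count the chain links followed to reach byte offset `off` from the head. */
static size_t hops_to_offset(const struct buf *head, size_t off)
{
    size_t hops = 0;

    while (head != NULL && off >= head->len) {
        off -= head->len;
        head = head->next;
        hops++;
    }
    return hops;
}

/*
 * Total links followed when `total` bytes are cut into `mss`-sized packets
 * and each packet searches for its start offset from the head of the chain.
 */
static size_t hops_per_packet_walk(const struct buf *head, size_t total,
    size_t mss)
{
    size_t hops = 0;

    assert(mss > 0);
    for (size_t off = 0; off < total; off += mss)
        hops += hops_to_offset(head, off);
    return hops;
}
```

With a chain of four 2048-byte buffers and 1024-byte packets, the eight
packets cost 0+0+1+1+2+2+3+3 = 12 link traversals in this model, and the
total grows roughly with the square of the chain length.  Handing the whole
range down in a single TSO request would need only one such search.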

--
Andre



Re: Much improved sendfile(2) kernel implementation

2006-09-23 Thread Andre Oppermann

David Malone wrote:
> On Fri, Sep 22, 2006 at 11:48:23PM +0100, Robert Watson wrote:
>> The impact of TSO is clearly dramatic, especially when combined with the
>> patch, but I'm a bit concerned by the drop in performance in the patched
>> non-TSO case.  For network cards which will always have TSO enabled, this
>> isn't an issue, but do we see a similar effect for drivers without TSO?
>> What can we put this drop down to?
>
> We probably also need to make sure that any performance increase
> in TSO isn't due to us getting TCP congestion control wrong. I think
> in Linux they had problems when they first introduced TSO because
> TCP was advancing the congestion window by a TSO-sized chunk instead
> of a wire packet. OTOH, I think Andre and Drew's tests are low-latency,
> so congestion control isn't likely to be playing a big role, so the
> improvements are unlikely to be due to this.

The congestion window is increased based on the ACKs received.  TSO
is only done on the send side and only up to the current congestion
window.  I have been careful not to get any changes in congestion
control behavior with TSO.  (Which does not mean that there may not be
other bugs lurking in our congestion control.)
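
The two properties stated above can be sketched as follows.  This is a
simplified, Reno-style illustration under stated assumptions, not the
FreeBSD implementation: `struct cc_state`, `on_ack` and `tso_chunk` are
hypothetical names, the window is counted in bytes, and loss handling is
left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified sender state; all sizes are in bytes. */
struct cc_state {
    uint32_t cwnd;       /* congestion window */
    uint32_t ssthresh;   /* slow-start threshold */
    uint32_t mss;        /* size of one wire segment */
};

/*
 * Window growth is driven by ACK arrival only: one MSS per ACK in slow
 * start, about MSS*MSS/cwnd per ACK in congestion avoidance.
 */
static void on_ack(struct cc_state *cc)
{
    uint32_t incr = cc->mss;

    assert(cc->cwnd > 0);
    if (cc->cwnd > cc->ssthresh)
        incr = incr * incr / cc->cwnd;
    if (incr == 0)
        incr = 1;
    cc->cwnd += incr;
}

/*
 * Size of the next chunk handed down for segmentation offload: never more
 * than the room left in the congestion window, the data queued, or the
 * largest chunk the (assumed) hardware limit `tso_max` allows.
 */
static uint32_t tso_chunk(const struct cc_state *cc, uint32_t in_flight,
    uint32_t queued, uint32_t tso_max)
{
    uint32_t room = cc->cwnd > in_flight ? cc->cwnd - in_flight : 0;
    uint32_t len = queued < room ? queued : room;

    return len < tso_max ? len : tso_max;
}
```

In this sketch the send path only reads `cwnd`; nothing in `tso_chunk`
modifies it, so the size of the offloaded chunk cannot change how fast the
window grows.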

--
Andre


Re: Much improved sendfile(2) kernel implementation

2006-09-23 Thread David Malone
> The congestion window is increased based on the ACKs received.  TSO
> is only done on the send side and only up to the current congestion
> window.  I have been careful not to get any changes in congestion
> control behavior with TSO.  (Which does not mean that there may not be
> other bugs lurking in our congestion control.)

I think the reason this happened in Linux was because the congestion
window is counted in segments, which were now TSO sized. You'd send
1 TSO sized segment, get back (say) 10 ACKs because of segmentation
and increase the window size by 10*TSO_SEG_SIZE/cwnd instead of
10*REAL_MSS/cwnd. We're unlikely to have exactly the same bug,
because we count cwnd in bytes, but it doesn't rule out having
other unexpected/subtle interactions (like higher variance of RTT
estimation - I guess all packets in a TSO segment are now sent with
the same timestamp?).
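
The arithmetic above can be worked through with made-up numbers.  A minimal
sketch, assuming a window counted in segments as described for Linux;
`ca_growth_bytes` is a hypothetical helper, and the sizes in the usage note
(1460-byte wire MSS, 14600-byte TSO segment, window of 20 segments) are
illustrative values, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes added to the window in congestion avoidance after `acks` ACKs when
 * the window is counted in segments: each ACK adds 1/cwnd of a segment,
 * so the byte growth scales with whatever "one segment" is taken to be.
 */
static uint32_t ca_growth_bytes(uint32_t acks, uint32_t seg_bytes,
    uint32_t cwnd_segs)
{
    assert(cwnd_segs > 0);
    return acks * seg_bytes / cwnd_segs;
}
```

With 10 ACKs and a window of 20 segments, `ca_growth_bytes(10, 1460, 20)`
gives 730 bytes, the intended 10*REAL_MSS/cwnd.  Passing the TSO segment
size instead, `ca_growth_bytes(10, 14600, 20)`, gives 7300 bytes: the window
opens ten times faster than it should.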

David.


Re: bin/41647: ifconfig(8) doesn't accept lladdr along with inet address family

2006-09-23 Thread Bruce M Simpson
Synopsis: ifconfig(8) doesn't accept lladdr along with inet address family

State-Changed-From-To: analyzed->suspended
State-Changed-By: bms
State-Changed-When: Sat Sep 23 15:00:03 UTC 2006
State-Changed-Why: 
Not a serious problem. These limitations can be worked around e.g. by
using /etc/start_if. scripts to set the ethernet addresses.


Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 15:00:03 UTC 2006
Responsible-Changed-Why: 


http://www.freebsd.org/cgi/query-pr.cgi?pr=41647


Re: kern/56233: IPsec tunnel (ESP) over IPv6: MTU computation is wrong

2006-09-23 Thread Bruce M Simpson
Synopsis: IPsec tunnel (ESP) over IPv6: MTU computation is wrong

Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 16:28:40 UTC 2006
Responsible-Changed-Why: 
I must focus on more specific areas.

http://www.freebsd.org/cgi/query-pr.cgi?pr=56233


Re: kern/65616: IPSEC can't detunnel GRE packets after real ESP encryption

2006-09-23 Thread Bruce M Simpson
Synopsis: IPSEC can't detunnel GRE packets after real ESP encryption

Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 16:29:17 UTC 2006
Responsible-Changed-Why: 
I must focus on more specific areas.

http://www.freebsd.org/cgi/query-pr.cgi?pr=65616


Re: kern/38554: changing interface ipaddress doesn't seem to work

2006-09-23 Thread Bruce M Simpson
Synopsis: changing interface ipaddress doesn't seem to work

Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 17:36:57 UTC 2006
Responsible-Changed-Why: 
Back to the world for you, but not after actually doing some work on it...

http://www.freebsd.org/cgi/query-pr.cgi?pr=38554


Re: kern/39937: ipstealth issue

2006-09-23 Thread Bruce M Simpson
Synopsis: ipstealth issue

State-Changed-From-To: analyzed->suspended
State-Changed-By: bms
State-Changed-When: Sat Sep 23 17:38:49 UTC 2006
State-Changed-Why: 
Back to the free pool for you.


Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 17:38:49 UTC 2006
Responsible-Changed-Why: 
Back to the free pool for you.

http://www.freebsd.org/cgi/query-pr.cgi?pr=39937


Re: kern/38554: changing interface ipaddress doesn't seem to work

2006-09-23 Thread Bruce M Simpson
The following reply was made to PR kern/38554; it has been noted by GNATS.

From: Bruce M Simpson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc:  
Subject: Re: kern/38554: changing interface ipaddress doesn't seem to work
Date: Sat, 23 Sep 2006 18:35:50 +0100

 Before I suspend my work on this PR, here's a diff I pulled from trying
 to port the changes to today's CURRENT.
 The patch doesn't work, but I haven't tested it exhaustively. I need to
 focus on other things.
 
 Attached patch: archie-locia-20060923.diff
 
  //depot/user/bms/nethead/sys/netinet/in.c#1 - /home/bms/fp4/nethead/sys/netinet/in.c 
 --- /tmp/tmp.23928.0   Sat Sep 23 18:32:59 2006
 +++ /home/bms/fp4/nethead/sys/netinet/in.c Sat Sep 23 17:37:13 2006
 @@ -459,6 +459,11 @@
                   * a routing process they will come back.
                   */
                  in_ifadown(&ia->ia_ifa, 1);
 +                /*
 +                 * Mark the interface address as no longer valid.
 +                 * Sockets that are bound to it should notice.
 +                 */
 +                ia->ia_ifa.ifa_flags |= RTF_REJECT;
                  EVENTHANDLER_INVOKE(ifaddr_event, ifp);
                  error = 0;
                  break;
  //depot/user/bms/nethead/sys/netinet/in_pcb.c#1 - /home/bms/fp4/nethead/sys/netinet/in_pcb.c 
 --- /tmp/tmp.23928.1   Sat Sep 23 18:32:59 2006
 +++ /home/bms/fp4/nethead/sys/netinet/in_pcb.c Sat Sep 23 18:02:08 2006
 @@ -238,14 +238,17 @@
          anonport = inp->inp_lport == 0 && (nam == NULL ||
              ((struct sockaddr_in *)nam)->sin_port == 0);
          error = in_pcbbind_setup(inp, nam, &inp->inp_laddr.s_addr,
 -            &inp->inp_lport, cred);
 +            &inp->inp_lport, &inp->inp_locia, cred);
          if (error)
                  return (error);
          if (in_pcbinshash(inp) != 0) {
                  inp->inp_laddr.s_addr = INADDR_ANY;
                  inp->inp_lport = 0;
 +                inp->inp_locia = NULL;
                  return (EAGAIN);
          }
 +        if (inp->inp_locia != NULL)
 +                IFAREF(&inp->inp_locia->ia_ifa);
          if (anonport)
                  inp->inp_flags |= INP_ANONPORT;
          return (0);
 @@ -262,12 +265,13 @@
    */
   int
   in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp,
 -    u_short *lportp, struct ucred *cred)
 +    u_short *lportp, struct in_ifaddr **iap, struct ucred *cred)
   {
          struct socket *so = inp->inp_socket;
          unsigned short *lastport;
          struct sockaddr_in *sin;
          struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
 +        struct in_ifaddr *ia = NULL;
          struct in_addr laddr;
          u_short lport = 0;
          int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
 @@ -319,7 +323,8 @@
                  } else if (sin->sin_addr.s_addr != INADDR_ANY) {
                          sin->sin_port = 0;  /* yech... */
                          bzero(&sin->sin_zero, sizeof(sin->sin_zero));
 -                        if (ifa_ifwithaddr((struct sockaddr *)sin) == 0)
 +                        if ((ia = (struct in_ifaddr *)ifa_ifwithaddr(
 +                            (struct sockaddr *)sin)) == 0)
                                  return (EADDRNOTAVAIL);
                  }
                  laddr = sin->sin_addr;
 @@ -478,6 +483,8 @@
                  return (EINVAL);
          *laddrp = laddr.s_addr;
          *lportp = lport;
 +        if (iap != NULL)
 +                *iap = ia;
          return (0);
   }
   
 @@ -490,6 +497,7 @@
   int
   in_pcbconnect(struct inpcb *inp, struct sockaddr *nam, struct ucred *cred)
   {
 +        struct in_ifaddr *locia;
          u_short lport, fport;
          in_addr_t laddr, faddr;
          int anonport, error;
 @@ -501,7 +509,7 @@
          laddr = inp->inp_laddr.s_addr;
          anonport = (lport == 0);
          error = in_pcbconnect_setup(inp, nam, &laddr, &lport, &faddr, &fport,
 -            NULL, cred);
 +            NULL, &locia, cred);
          if (error)
                  return (error);
   
 @@ -519,6 +527,9 @@
          /* Commit the remaining changes. */
          inp->inp_lport = lport;
          inp->inp_laddr.s_addr = laddr;
 +        inp->inp_locia = locia;
 +        if (inp->inp_locia != NULL)
 +                IFAREF(&inp->inp_locia->ia_ifa);
          inp->inp_faddr.s_addr = faddr;
          inp->inp_fport = fport;
          in_pcbrehash(inp);
 @@ -536,7 +547,9 @@
    * On entry, *laddrp and *lportp should contain the current local
    * address and port for the PCB; t

De-orbitting tcpslice

2006-09-23 Thread Bruce M Simpson
We have tcpslice maintained in ports. We have ancient tcpslice in base 
system. We have PRs about it.


I'd like to nuke it in HEAD.

How does everyone else feel about that before I go off and do it?

BMS


Re: De-orbitting tcpslice

2006-09-23 Thread Sam Leffler
Bruce M Simpson wrote:
> We have tcpslice maintained in ports. We have ancient tcpslice in base
> system. We have PRs about it.
> 
> I'd like to nuke it in HEAD.
> 
> How does everyone else feel about that before I go off and do it?

do it


Re: De-orbitting tcpslice

2006-09-23 Thread Bruce A. Mah
If memory serves me right, Bruce M Simpson wrote:
> We have tcpslice maintained in ports. We have ancient tcpslice in base 
> system. We have PRs about it.
> 
> I'd like to nuke it in HEAD.
> 
> How does everyone else feel about that before I go off and do it?

+1

Bruce.


