Re: Much improved sendfile(2) kernel implementation
On Fri, Sep 22, 2006 at 11:48:23PM +0100, Robert Watson wrote:
> The impact of TSO is clearly dramatic, especially when combined with the
> patch, but I'm a bit concerned by the drop in performance in the patched
> non-TSO case. For network cards which will always have TSO enabled, this
> isn't an issue, but do we see a similar effect for drivers without TSO?
> What can we put this drop down to?

We probably also need to make sure that any performance increase in TSO
isn't due to us getting TCP congestion control wrong. I think in Linux they
had problems when they first introduced TSO because TCP was advancing the
congestion window by a TSO-sized chunk instead of a wire packet.

OTOH, I think Andre and Drew's tests are low-latency, so congestion control
isn't likely to be playing a big role, so the improvements are unlikely to
be due to this.

	David.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Much improved sendfile(2) kernel implementation
Robert Watson wrote:
> On Sat, 23 Sep 2006, Andre Oppermann wrote:
>> Without patch:
>>  87380 393216 393216    10.00    2163.08   100.00   19.35   3.787   1.466
>>
>> Without patch + TSO:
>>  87380 393216 393216    10.00    4367.18    71.54   42.07   1.342   1.578
>>
>> With patch:
>>  87380 393216 393216    10.01    1882.73    86.15   18.43   3.749   1.604
>>
>> With patch + TSO:
>>  87380 393216 393216    10.00    6961.08    47.69   60.11   0.561   1.415
>
> The impact of TSO is clearly dramatic, especially when combined with the
> patch, but I'm a bit concerned by the drop in performance in the patched
> non-TSO case. For network cards which will always have TSO enabled, this
> isn't an issue, but do we see a similar effect for drivers without TSO?
> What can we put this drop down to?

If you look at my GigE numbers there is no drop for the new-sendfile w/o
TSO case. In this 10Gig test the drop is really an artifact of the whole
setup and the way netperf makes use of the sendfile call. Internally
new-sendfile waits until 50% of the socket buffer is free before bulk
filling it again. This value can be modified by setting a low watermark on
the send socket buffer. Netperf does buffer-sized sendfile invocations and
this is very timing critical with 10G. Which gives this picture:

 call sendfile(380K) -> fill socket buffer -> wait -> fill rest -> return
 -> call sendfile(380K) ...

Not to mention all the additional work tcp_output() has to do w/o TSO.
Especially with large buffers it has to loop over the mbuf chain for each
packet to find out where to start copying.

And besides, there is no point in having a non-TSO capable interface at
above 1-2Gbit. Not even Linux can keep up there.

-- 
Andre
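A minimal sketch of how an application might set the send low watermark
mentioned above. It assumes the knob meant is the standard SO_SNDLOWAT
socket option; the message does not name the option, and the helper name
set_send_lowat is hypothetical. Some systems do not allow this option to be
changed, in which case the call simply fails.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask the kernel to consider the send buffer writable once at least
 * lowat_bytes of space is free.
 * Assumption: SO_SNDLOWAT is the low watermark referred to in the mail.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
set_send_lowat(int fd, int lowat_bytes)
{
	return (setsockopt(fd, SOL_SOCKET, SO_SNDLOWAT,
	    &lowat_bytes, sizeof(lowat_bytes)));
}
```

A benchmark would call this once on the connected socket before its
sendfile loop, and should check the return value rather than assume the new
watermark took effect.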
Re: Much improved sendfile(2) kernel implementation
David Malone wrote:
> On Fri, Sep 22, 2006 at 11:48:23PM +0100, Robert Watson wrote:
>> The impact of TSO is clearly dramatic, especially when combined with the
>> patch, but I'm a bit concerned by the drop in performance in the patched
>> non-TSO case. For network cards which will always have TSO enabled, this
>> isn't an issue, but do we see a similar effect for drivers without TSO?
>> What can we put this drop down to?
>
> We probably also need to make sure that any performance increase in TSO
> isn't due to us getting TCP congestion control wrong. I think in Linux they
> had problems when they first introduced TSO because TCP was advancing the
> congestion window by a TSO-sized chunk instead of a wire packet.
>
> OTOH, I think Andre and Drew's tests are low-latency, so congestion control
> isn't likely to be playing a big role, so the improvements are unlikely to
> be due to this.

The congestion window is increased based on the ACKs received. TSO is only
done on the send side and only up to the current congestion window. I have
been careful not to get any changes in congestion control behavior with
TSO. (Which does not mean that there are no other bugs lurking in our
congestion control.)

-- 
Andre
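The point that a TSO burst never exceeds the current congestion window can
be sketched as a simple clamp. This is an illustration only, not the code
in tcp_output(): the function and parameter names are hypothetical, and
including the peer's receive window in the clamp is an assumption based on
ordinary TCP behavior rather than something stated in the mail.

```c
#include <stdint.h>

/* Smaller of two unsigned values. */
static uint32_t
min_u32(uint32_t a, uint32_t b)
{
	return (a < b ? a : b);
}

/*
 * Bytes that may be handed to the NIC as one TSO burst.
 * All names are illustrative; none are FreeBSD kernel fields.
 *   cwnd       congestion window, in bytes
 *   rwnd       peer's advertised receive window, in bytes (assumption)
 *   in_flight  bytes sent but not yet acknowledged
 *   queued     unsent bytes waiting in the socket buffer
 *   tso_max    largest burst the NIC accepts
 */
uint32_t
tso_burst_len(uint32_t cwnd, uint32_t rwnd, uint32_t in_flight,
    uint32_t queued, uint32_t tso_max)
{
	uint32_t win = min_u32(cwnd, rwnd);

	if (in_flight >= win)
		return (0);	/* window already full, send nothing */
	return (min_u32(min_u32(win - in_flight, queued), tso_max));
}
```

With a congestion window of 14480 bytes and nothing in flight, the burst is
capped at 14480 bytes even if the NIC could take far more, so TSO changes
how the data is segmented but not how much is allowed out.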
Re: Much improved sendfile(2) kernel implementation
> The congestion window is increased based on the ACKs received. TSO
> is only done on the send side and only up to the current congestion
> window. I have been careful not to get any changes in congestion
> control behavior with TSO. (Which does not mean that there are no
> other bugs lurking in our congestion control.)

I think the reason this happened in Linux was that the congestion window
is counted in segments, which were now TSO-sized. You'd send 1 TSO-sized
segment, get back (say) 10 ACKs because of segmentation, and increase the
window size by 10*TSO_SEG_SIZE/cwnd instead of 10*REAL_MSS/cwnd.

We're unlikely to have exactly the same bug, because we count cwnd in
bytes, but that doesn't rule out other unexpected/subtle interactions
(like higher variance of RTT estimation - I guess all packets in a TSO
segment are now sent with the same timestamp?).

	David.
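The arithmetic above can be sketched in a few lines. This is a
back-of-envelope model, not code from either kernel: the function names are
hypothetical, the window is held fixed across the ACKs as in the formula in
the mail, and the byte-counted variant uses the usual congestion-avoidance
increment of MSS*MSS/cwnd per ACK.

```c
#include <assert.h>

/*
 * Window counted in segments: each ACK adds 1/cwnd of a segment, and a
 * segment is taken to be seg_bytes long.  Passing a TSO-sized seg_bytes
 * reproduces the inflated growth described above.
 */
double
cwnd_growth_segcounted(unsigned acks, double seg_bytes, double cwnd_segs)
{
	assert(cwnd_segs > 0.0);
	return (acks * seg_bytes / cwnd_segs);
}

/*
 * Window counted in bytes: each ACK adds mss*mss/cwnd bytes, so the
 * size of the TSO chunk never enters the calculation.
 */
double
cwnd_growth_bytecounted(unsigned acks, double mss, double cwnd_bytes)
{
	assert(cwnd_bytes > 0.0);
	return (acks * mss * mss / cwnd_bytes);
}
```

Taking assumed values of a 1448-byte MSS, a 14480-byte TSO chunk and a
window of 20 segments, 10 ACKs grow the window by 724 bytes when the real
MSS is used and by 7240 bytes when the TSO chunk size is used. The
byte-counted form with a 28960-byte window gives the same 724 bytes as the
correct segment-counted case.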
Re: bin/41647: ifconfig(8) doesn't accept lladdr along with inet address family
Synopsis: ifconfig(8) doesn't accept lladdr along with inet address family

State-Changed-From-To: analyzed->suspended
State-Changed-By: bms
State-Changed-When: Sat Sep 23 15:00:03 UTC 2006
State-Changed-Why:
Not a serious problem. These limitations can be worked around e.g. by
using /etc/start_if. scripts to set the Ethernet addresses.

Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 15:00:03 UTC 2006
Responsible-Changed-Why:

http://www.freebsd.org/cgi/query-pr.cgi?pr=41647
Re: kern/56233: IPsec tunnel (ESP) over IPv6: MTU computation is wrong
Synopsis: IPsec tunnel (ESP) over IPv6: MTU computation is wrong

Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 16:28:40 UTC 2006
Responsible-Changed-Why:
I must focus on more specific areas.

http://www.freebsd.org/cgi/query-pr.cgi?pr=56233
Re: kern/65616: IPSEC can't detunnel GRE packets after real ESP encryption
Synopsis: IPSEC can't detunnel GRE packets after real ESP encryption

Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 16:29:17 UTC 2006
Responsible-Changed-Why:
I must focus on more specific areas.

http://www.freebsd.org/cgi/query-pr.cgi?pr=65616
Re: kern/38554: changing interface ipaddress doesn't seem to work
Synopsis: changing interface ipaddress doesn't seem to work

Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 17:36:57 UTC 2006
Responsible-Changed-Why:
Back to the world for you, but not before actually doing some work on it...

http://www.freebsd.org/cgi/query-pr.cgi?pr=38554
Re: kern/39937: ipstealth issue
Synopsis: ipstealth issue

State-Changed-From-To: analyzed->suspended
State-Changed-By: bms
State-Changed-When: Sat Sep 23 17:38:49 UTC 2006
State-Changed-Why:
Back to the free pool for you.

Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Sat Sep 23 17:38:49 UTC 2006
Responsible-Changed-Why:
Back to the free pool for you.

http://www.freebsd.org/cgi/query-pr.cgi?pr=39937
Re: kern/38554: changing interface ipaddress doesn't seem to work
The following reply was made to PR kern/38554; it has been noted by GNATS.

From: Bruce M Simpson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc:
Subject: Re: kern/38554: changing interface ipaddress doesn't seem to work
Date: Sat, 23 Sep 2006 18:35:50 +0100

Before I suspend my work on this PR, here's a diff I pulled from trying
to port the changes to today's CURRENT. The patch doesn't work, but I
haven't tested it exhaustively. I need to focus on other things.

archie-locia-20060923.diff:

//depot/user/bms/nethead/sys/netinet/in.c#1 - /home/bms/fp4/nethead/sys/netinet/in.c
--- /tmp/tmp.23928.0	Sat Sep 23 18:32:59 2006
+++ /home/bms/fp4/nethead/sys/netinet/in.c	Sat Sep 23 17:37:13 2006
@@ -459,6 +459,11 @@
 		 * a routing process they will come back.
 		 */
 		in_ifadown(&ia->ia_ifa, 1);
+		/*
+		 * Mark the interface address as no longer valid.
+		 * Sockets that are bound to it should notice.
+		 */
+		ia->ia_ifa.ifa_flags |= RTF_REJECT;
 		EVENTHANDLER_INVOKE(ifaddr_event, ifp);
 		error = 0;
 		break;
//depot/user/bms/nethead/sys/netinet/in_pcb.c#1 - /home/bms/fp4/nethead/sys/netinet/in_pcb.c
--- /tmp/tmp.23928.1	Sat Sep 23 18:32:59 2006
+++ /home/bms/fp4/nethead/sys/netinet/in_pcb.c	Sat Sep 23 18:02:08 2006
@@ -238,14 +238,17 @@
 	anonport = inp->inp_lport == 0 && (nam == NULL ||
 	    ((struct sockaddr_in *)nam)->sin_port == 0);
 	error = in_pcbbind_setup(inp, nam, &inp->inp_laddr.s_addr,
-	    &inp->inp_lport, cred);
+	    &inp->inp_lport, &inp->inp_locia, cred);
 	if (error)
 		return (error);
 	if (in_pcbinshash(inp) != 0) {
 		inp->inp_laddr.s_addr = INADDR_ANY;
 		inp->inp_lport = 0;
+		inp->inp_locia = NULL;
 		return (EAGAIN);
 	}
+	if (inp->inp_locia != NULL)
+		IFAREF(&inp->inp_locia->ia_ifa);
 	if (anonport)
 		inp->inp_flags |= INP_ANONPORT;
 	return (0);
@@ -262,12 +265,13 @@
  */
 int
 in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp,
-    u_short *lportp, struct ucred *cred)
+    u_short *lportp, struct in_ifaddr **iap, struct ucred *cred)
 {
 	struct socket *so = inp->inp_socket;
 	unsigned short *lastport;
 	struct sockaddr_in *sin;
 	struct inpcbinfo *pcbinfo = inp->inp_pcbinfo;
+	struct in_ifaddr *ia = NULL;
 	struct in_addr laddr;
 	u_short lport = 0;
 	int wild = 0, reuseport = (so->so_options & SO_REUSEPORT);
@@ -319,7 +323,8 @@
 		} else if (sin->sin_addr.s_addr != INADDR_ANY) {
 			sin->sin_port = 0;		/* yech... */
 			bzero(&sin->sin_zero, sizeof(sin->sin_zero));
-			if (ifa_ifwithaddr((struct sockaddr *)sin) == 0)
+			if ((ia = (struct in_ifaddr *)ifa_ifwithaddr(
+			    (struct sockaddr *)sin)) == 0)
 				return (EADDRNOTAVAIL);
 		}
 		laddr = sin->sin_addr;
@@ -478,6 +483,8 @@
 		return (EINVAL);
 	*laddrp = laddr.s_addr;
 	*lportp = lport;
+	if (iap != NULL)
+		*iap = ia;
 	return (0);
 }
 
@@ -490,6 +497,7 @@
 int
 in_pcbconnect(struct inpcb *inp, struct sockaddr *nam, struct ucred *cred)
 {
+	struct in_ifaddr *locia;
 	u_short lport, fport;
 	in_addr_t laddr, faddr;
 	int anonport, error;
@@ -501,7 +509,7 @@
 	laddr = inp->inp_laddr.s_addr;
 	anonport = (lport == 0);
 	error = in_pcbconnect_setup(inp, nam, &laddr, &lport, &faddr, &fport,
-	    NULL, cred);
+	    NULL, &locia, cred);
 	if (error)
 		return (error);
 
@@ -519,6 +527,9 @@
 	/* Commit the remaining changes. */
 	inp->inp_lport = lport;
 	inp->inp_laddr.s_addr = laddr;
+	inp->inp_locia = locia;
+	if (inp->inp_locia != NULL)
+		IFAREF(&inp->inp_locia->ia_ifa);
 	inp->inp_faddr.s_addr = faddr;
 	inp->inp_fport = fport;
 	in_pcbrehash(inp);
@@ -536,7 +547,9 @@
  * On entry, *laddrp and *lportp should contain the current local
  * address and port for the PCB; t
De-orbiting tcpslice
We have tcpslice maintained in ports. We have ancient tcpslice in base
system. We have PRs about it.

I'd like to nuke it in HEAD.

How does everyone else feel about that before I go off and do it?

BMS
Re: De-orbiting tcpslice
Bruce M Simpson wrote:
> We have tcpslice maintained in ports. We have ancient tcpslice in base
> system. We have PRs about it.
>
> I'd like to nuke it in HEAD.
>
> How does everyone else feel about that before I go off and do it?

do it
Re: De-orbiting tcpslice
If memory serves me right, Bruce M Simpson wrote:
> We have tcpslice maintained in ports. We have ancient tcpslice in base
> system. We have PRs about it.
>
> I'd like to nuke it in HEAD.
>
> How does everyone else feel about that before I go off and do it?

+1

Bruce.