From: Bill Fink <[EMAIL PROTECTED]>
Date: Fri, 14 Sep 2007 03:20:55 -0400
> TSO disabled performance is always better than equivalent TSO enabled
> performance. With TSO enabled, the optimum performance is indeed at
> a TX/RX interrupt coalescing value of 75 usec. With TSO disabled,
> performance
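The two knobs being compared above are the ones normally flipped with
"ethtool -K ethX tso on|off" and "ethtool -C ethX rx-usecs N tx-usecs N".
As a rough illustration only (not code from the thread), the same settings
can also be applied through the standard SIOCETHTOOL ioctl; the interface
name and the 75 usec value below are just the ones quoted above:

/* Sketch: programmatic equivalent of "ethtool -K eth0 tso off" and
 * "ethtool -C eth0 rx-usecs 75 tx-usecs 75".  Error handling trimmed.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <unistd.h>

static int ethtool_call(int fd, const char *dev, void *cmd)
{
        struct ifreq ifr;

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);
        ifr.ifr_data = cmd;
        return ioctl(fd, SIOCETHTOOL, &ifr);
}

int main(void)
{
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct ethtool_value tso = { .cmd = ETHTOOL_STSO, .data = 0 };
        struct ethtool_coalesce coal = { .cmd = ETHTOOL_GCOALESCE };

        ethtool_call(fd, "eth0", &tso);         /* TSO off */

        if (ethtool_call(fd, "eth0", &coal) == 0) {
                coal.cmd = ETHTOOL_SCOALESCE;
                coal.rx_coalesce_usecs = 75;    /* the optimum quoted above */
                coal.tx_coalesce_usecs = 75;
                ethtool_call(fd, "eth0", &coal);
        }

        close(fd);
        return 0;
}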
On Mon, 27 Aug 2007, jamal wrote:
> On Sun, 2007-26-08 at 19:04 -0700, David Miller wrote:
>
> > The transfer is much better behaved if we ACK every two full sized
> > frames we copy into the receiver, and therefore don't stretch ACK, but
> > at the cost of cpu utilization.
>
The rx coalescing in theory should help by accumulating more ACKs
Hi Dave,
David Miller <[EMAIL PROTECTED]> wrote on 08/29/2007 10:21:50 AM:
> From: Krishna Kumar2 <[EMAIL PROTECTED]>
> Date: Wed, 29 Aug 2007 08:53:30 +0530
>
> > I am scp'ng from 192.168.1.1 to 192.168.1.2 and captured at the send
> > side.
>
> Bad choice of test, this is cpu limited since the scp has to encrypt and
> MAC hash all the data it sends.
> From: Krishna Kumar2 <[EMAIL PROTECTED]>
> Date: Wed, 29 Aug 2007 10:43:23 +0530
>
> > The reason was to run parallel copies, not for buffer limitations.
>
> Oh, I see.
>
> I'll note in passing that current lmbench-3 has some
> parallelization features you could play with, you might want
> to check it out.
From: Krishna Kumar2 <[EMAIL PROTECTED]>
Date: Wed, 29 Aug 2007 10:43:23 +0530
> The reason was to run parallel copies, not for buffer limitations.
Oh, I see.
I'll note in passing that current lmbench-3 has some parallelization
features you could play with, you might want to check it out.
[EMAIL PROTECTED] wrote on 08/29/2007 10:21:50 AM:
> From: Krishna Kumar2 <[EMAIL PROTECTED]>
> Date: Wed, 29 Aug 2007 08:53:30 +0530
>
> > I am scp'ng from 192.168.1.1 to 192.168.1.2 and captured at the send
> > side.
>
> Bad choice of test, this is cpu limited since the scp
> has to encrypt and MAC hash all the data it sends.
From: Krishna Kumar2 <[EMAIL PROTECTED]>
Date: Wed, 29 Aug 2007 08:53:30 +0530
> I am scp'ng from 192.168.1.1 to 192.168.1.2 and captured at the send
> side.
Bad choice of test, this is cpu limited since the scp
has to encrypt and MAC hash all the data it sends.
Use something like straight ftp o
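The kind of test being suggested here is a plain bulk TCP transfer, so the
sender spends no cycles on encryption or MAC hashing.  Tools such as netperf
do this properly; the sketch below is only an illustration of the idea, with
the destination address taken from the capture and a placeholder port:

/* Bare-bones bulk TCP sender, illustrative only. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        static char buf[65536];
        struct sockaddr_in dst;
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        long i;

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(5001);             /* placeholder port */
        inet_pton(AF_INET, "192.168.1.2", &dst.sin_addr);

        if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
                return 1;
        for (i = 0; i < 100000; i++)            /* ~6.5 GB of zero-filled payload */
                if (write(fd, buf, sizeof(buf)) < 0)
                        break;
        close(fd);
        return 0;
}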
Hi Dave,
I am scp'ng from 192.168.1.1 to 192.168.1.2 and captured at the send
side.
192.168.1.1.37201 > 192.168.1.2.ssh: P 837178092:837178596(504) ack 1976304527 win 79
192.168.1.1.37201 > 192.168.1.2.ssh: . 837178596:837181492(2896) ack 1976304527 win 79
192.168.1.1.37201 > 192.168.1.2.ssh: .
On Sun, 2007-26-08 at 19:04 -0700, David Miller wrote:
> The transfer is much better behaved if we ACK every two full sized
> frames we copy into the receiver, and therefore don't stretch ACK, but
> at the cost of cpu utilization.
The rx coalescing in theory should help by accumulating more ACKs
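The "ACK every two full sized frames" behaviour referred to above is the
standard receiver-side delayed-ACK rule (RFC 1122/2581: acknowledge at least
every second full-sized segment, or when the delayed-ACK timer fires); a
stretch ACK is what shows up on the wire when one ACK ends up covering more
than two segments.  A simplified sketch of that decision, not actual kernel
code:

/* Simplified receiver-side ACK decision, for illustration only. */
static int should_ack_now(unsigned int unacked_bytes, unsigned int mss,
                          int delack_timer_expired)
{
        if (unacked_bytes >= 2 * mss)
                return 1;                       /* ACK every two full-sized frames */
        return delack_timer_expired;            /* otherwise wait for the timer */
}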
David Miller wrote:
From: John Heffner <[EMAIL PROTECTED]>
Date: Sun, 26 Aug 2007 21:32:26 -0400
There are a few interesting things here. For one, the bursts caused by
TSO seem to be causing the receiver to do stretch acks. This may have a
negative impact on flow performance, but it's hard
From: John Heffner <[EMAIL PROTECTED]>
Date: Sun, 26 Aug 2007 21:32:26 -0400
> There are a few interesting things here. For one, the bursts caused by
> TSO seem to be causing the receiver to do stretch acks. This may have a
> negative impact on flow performance, but it's hard to say for sure h
Bill Fink wrote:
Here's the beforeafter delta of the receiver's "netstat -s"
statistics for the TSO enabled case:
Ip:
3659898 total packets received
3659898 incoming packets delivered
80050 requests sent out
Tcp:
2 passive connection openings
3659897 segments received
800
On Fri, 24 Aug 2007, John Heffner wrote:
> Bill Fink wrote:
> > Here you can see there is a major difference in the TX CPU utilization
> > (99 % with TSO disabled versus only 39 % with TSO enabled), although
> > the TSO disabled case was able to squeeze out a little extra performance
> > from its extra CPU utilization.
On Sat, 25 Aug 2007, Herbert Xu wrote:
> On Fri, Aug 24, 2007 at 02:25:03PM -0700, David Miller wrote:
> >
> > My hunch is that even if in the non-TSO case the TX packets were all
> > back to back in the cards TX ring, TSO still spits them out faster on
> > the wire.
>
> If this is the case then
Bill Fink wrote:
Here you can see there is a major difference in the TX CPU utilization
(99 % with TSO disabled versus only 39 % with TSO enabled), although
the TSO disabled case was able to squeeze out a little extra performance
from its extra CPU utilization. Interestingly, with TSO enabled, t
On Fri, Aug 24, 2007 at 02:25:03PM -0700, David Miller wrote:
>
> My hunch is that even if in the non-TSO case the TX packets were all
> back to back in the cards TX ring, TSO still spits them out faster on
> the wire.
If this is the case then we should see an improvement by
disabling TSO and enab
From: jamal <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 08:14:16 -0400
> Seems the receive side of the sender is also consuming a lot more cpu
> i suspect because receiver is generating a lot more ACKs with TSO.
I've seen this behavior before on a low cpu powered receiver and the
issue is that bat
Bill Fink wrote:
On Thu, 23 Aug 2007, Rick Jones wrote:
jamal wrote:
[TSO already passed - iirc, it has been
demonstrated to really not add much to throughput (can't improve much
over closeness to wire speed) but improve CPU utilization].
In the one gig space sure, but in the 10 Gig space, TSO on/off does make a
difference for throughput.
On Fri, 24 Aug 2007, jamal wrote:
> On Thu, 2007-23-08 at 23:18 -0400, Bill Fink wrote:
>
> [..]
> > Here you can see there is a major difference in the TX CPU utilization
> > (99 % with TSO disabled versus only 39 % with TSO enabled), although
> > the TSO disabled case was able to squeeze out a little extra performance
> > from its extra CPU utilization.
A current hot topic of research is reducing the number of ACKs to make TCP
work better over asymmetric links like 3G.
Oy. People running Solaris and HP-UX have been "researching" ACK reductions
since 1997 if not earlier.
rick jones
On Thu, 2007-23-08 at 20:34 -0700, Stephen Hemminger wrote:
> A current hot topic of research is reducing the number of ACKs to make TCP
> work better over asymmetric links like 3G.
One other good reason to reduce ACKs to battery powered (3G) terminals
is that it reduces the power consumption, i.e. you
On Thu, 2007-23-08 at 23:18 -0400, Bill Fink wrote:
[..]
> Here you can see there is a major difference in the TX CPU utilization
> (99 % with TSO disabled versus only 39 % with TSO enabled), although
> the TSO disabled case was able to squeeze out a little extra performance
> from its extra CPU utilization.
On Thu, 23 Aug 2007 18:38:22 -0400
jamal <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-23-08 at 15:30 -0700, David Miller wrote:
> > From: jamal <[EMAIL PROTECTED]>
> > Date: Thu, 23 Aug 2007 18:04:10 -0400
> >
> > > Possibly a bug - but you really should turn off TSO if you are doing
> > > huge interactive transactions (which is fair because there is a clear
> > > demarcation).
On Thu, 23 Aug 2007, Rick Jones wrote:
> jamal wrote:
> > [TSO already passed - iirc, it has been
> > demonstrated to really not add much to throughput (can't improve much
> > over closeness to wire speed) but improve CPU utilization].
>
> In the one gig space sure, but in the 10 Gig space, TSO on/off does make a
> difference for throughput.
On Thu, 2007-23-08 at 15:35 -0700, Rick Jones wrote:
> jamal wrote:
> > [TSO already passed - iirc, it has been
> > demonstrated to really not add much to throughput (can't improve much
> > over closeness to wire speed) but improve CPU utilization].
>
> In the one gig space sure, but in the 10 Gig space, TSO on/off does make a
> difference for throughput.
jamal wrote:
[TSO already passed - iirc, it has been
demonstrated to really not add much to throughput (can't improve much
over closeness to wire speed) but improve CPU utilization].
In the one gig space sure, but in the 10 Gig space, TSO on/off does make a
difference for throughput.
rick jones
On Thu, 2007-23-08 at 15:30 -0700, David Miller wrote:
> From: jamal <[EMAIL PROTECTED]>
> Date: Thu, 23 Aug 2007 18:04:10 -0400
>
> > Possibly a bug - but you really should turn off TSO if you are doing
> > huge interactive transactions (which is fair because there is a clear
> > demarcation).
>
From: jamal <[EMAIL PROTECTED]>
Date: Thu, 23 Aug 2007 18:04:10 -0400
> Possibly a bug - but you really should turn off TSO if you are doing
> huge interactive transactions (which is fair because there is a clear
> demarcation).
I don't see how this can matter.
TSO only ever does anything if you
On Thu, 2007-23-08 at 18:04 -0400, jamal wrote:
> The litmus test is the same as any change that is supposed to improve
> net performance - it has to demonstrate it is not intrusive and that it
> improves (consistently) performance. The standard metrics are
> {throughput, cpu-utilization, latency}
On Wed, 2007-22-08 at 13:21 -0700, David Miller wrote:
> From: Rick Jones <[EMAIL PROTECTED]>
> Date: Wed, 22 Aug 2007 10:09:37 -0700
>
> > Should it be any more or less worrisome than small packet
> > performance (eg the TCP_RR stuff I posted recently) being rather
> > worse with TSO enabled than with it disabled?
David Miller <[EMAIL PROTECTED]> wrote on 08/22/2007 02:44:40 PM:
> From: Krishna Kumar2 <[EMAIL PROTECTED]>
> Date: Wed, 22 Aug 2007 12:33:04 +0530
>
> > Does turning off batching solve that problem? What I mean by that is:
> > batching can be disabled if a TSO device is worse for some cases.
>
>
From: Rick Jones <[EMAIL PROTECTED]>
Date: Wed, 22 Aug 2007 10:09:37 -0700
> Should it be any more or less worrisome than small packet
> performance (eg the TCP_RR stuff I posted recently) being rather
> worse with TSO enabled than with it disabled?
That, like any such thing shown by the batching
David Miller wrote:
I think the jury is still out, but seeing TSO perform even slightly
worse with the batching changes in place would be very worrisome.
This applies to both throughput and cpu utilization.
Should it be any more or less worrisome than small packet performance (eg the
TCP_RR stuff I posted recently) being rather worse with TSO enabled than with
it disabled?
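For context, a TCP_RR test measures small request/response transactions per
second rather than bulk throughput, which is why TSO behaviour on tiny
packets matters here.  A sketch of the client side of such a ping-pong loop
(illustrative only, not netperf itself; address and port are placeholders):

/* 1-byte request/response ("RR") loop, client side. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
        struct sockaddr_in dst;
        struct timeval t0, t1;
        char byte = 0;
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1, i, n = 100000;

        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(5001);
        inet_pton(AF_INET, "192.168.1.2", &dst.sin_addr);
        if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
                return 1;

        gettimeofday(&t0, NULL);
        for (i = 0; i < n; i++) {               /* send 1 byte, wait for 1 byte back */
                if (write(fd, &byte, 1) != 1 || read(fd, &byte, 1) != 1)
                        break;
        }
        gettimeofday(&t1, NULL);
        printf("%.0f transactions/sec\n",
               i / ((t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6));
        close(fd);
        return 0;
}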
From: Krishna Kumar2 <[EMAIL PROTECTED]>
Date: Wed, 22 Aug 2007 12:33:04 +0530
> Does turning off batching solve that problem? What I mean by that is:
> batching can be disabled if a TSO device is worse for some cases.
This new batching stuff isn't going to be enabled or disabled
on a per-device basis.
Hi Dave,
David Miller <[EMAIL PROTECTED]> wrote on 08/22/2007 09:52:29 AM:
> From: Krishna Kumar2 <[EMAIL PROTECTED]>
> Date: Wed, 22 Aug 2007 09:41:52 +0530
>
> > > Because TSO does batching already, so it's a very good
> > > "tit for tat" comparison of the new batching scheme
> > > vs. an existing one.
From: Krishna Kumar2 <[EMAIL PROTECTED]>
Date: Wed, 22 Aug 2007 09:41:52 +0530
> David Miller <[EMAIL PROTECTED]> wrote on 08/22/2007 12:21:43 AM:
>
> > From: jamal <[EMAIL PROTECTED]>
> > Date: Tue, 21 Aug 2007 08:30:22 -0400
> >
> > > On Tue, 2007-21-08 at 00:18 -0700, David Miller wrote:
> > >
David Miller <[EMAIL PROTECTED]> wrote on 08/22/2007 12:21:43 AM:
> From: jamal <[EMAIL PROTECTED]>
> Date: Tue, 21 Aug 2007 08:30:22 -0400
>
> > On Tue, 2007-21-08 at 00:18 -0700, David Miller wrote:
> >
> > > Using 16K buffer size really isn't going to keep the pipe full enough
> > > for TSO.
>
From: jamal <[EMAIL PROTECTED]>
Date: Tue, 21 Aug 2007 17:09:12 -0400
> Examples, a busy ssh or irc server and you could go as far as
> looking at the most pre-dominant app on the wild west, http (average
> page size from a few years back was in the range of 10-20K and can
> be simulated with good
On Tue, 2007-21-08 at 11:51 -0700, David Miller wrote:
> Because TSO does batching already, so it's a very good
> "tit for tat" comparison of the new batching scheme
> vs. an existing one.
Fair enough - I may have read too much into your email then;->
For bulk type of apps (where TSO will make a
From: jamal <[EMAIL PROTECTED]>
Date: Tue, 21 Aug 2007 08:30:22 -0400
> On Tue, 2007-21-08 at 00:18 -0700, David Miller wrote:
>
> > Using 16K buffer size really isn't going to keep the pipe full enough
> > for TSO.
>
> Why the comparison with TSO (or GSO for that matter)?
Because TSO does batching already, so it's a very good "tit for tat"
comparison of the new batching scheme vs. an existing one.
On Tue, 2007-21-08 at 00:18 -0700, David Miller wrote:
> Using 16K buffer size really isn't going to keep the pipe full enough
> for TSO.
Why the comparison with TSO (or GSO for that matter)?
Seems to me that is only valid/fair if you have a single flow.
Batching is multi-flow focused (or i sho
From: Krishna Kumar2 <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 11:36:03 +0530
> > I ran 3 iterations of 45 sec tests (total 1 hour 16 min, but I will
> > run a longer one tonight). The results are (results in KB/s, and %):
>
> I ran an 8.5-hour run with no batching + another 8.5-hour run with
>
Hi Dave,
> I ran 3 iterations of 45 sec tests (total 1 hour 16 min, but I will
> run a longer one tonight). The results are (results in KB/s, and %):
I ran an 8.5-hour run with no batching + another 8.5-hour run with
batching (Buffer sizes: "32 128 512 4096 16384", Threads: "1 8 32",
Each test
Forgot to mention one thing:
> This fix reduced
> retransmissions from 180,000 to 55,000 or so. When I changed the IPoIB
> driver to use iterative sends of each skb instead of creating multiple
> Work Requests, that number went down to 15].
This also reduced TCP No Delay performance from huge perce
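For readers unfamiliar with the terminology: "creating multiple Work
Requests" means chaining several send WRs and posting them with a single
verbs call, while "iterative sends" posts one WR per call.  The sketch below
shows the two styles using the userspace verbs API purely as an
illustration; it is not the IPoIB driver code, and setup of the qp and the
wrs[] array is omitted:

#include <infiniband/verbs.h>

static int post_chained(struct ibv_qp *qp, struct ibv_send_wr *wrs, int n)
{
        struct ibv_send_wr *bad = NULL;
        int i;

        for (i = 0; i < n - 1; i++)
                wrs[i].next = &wrs[i + 1];      /* one post call, many WRs */
        wrs[n - 1].next = NULL;
        return ibv_post_send(qp, &wrs[0], &bad);
}

static int post_iterative(struct ibv_qp *qp, struct ibv_send_wr *wrs, int n)
{
        struct ibv_send_wr *bad = NULL;
        int i, err = 0;

        for (i = 0; i < n && !err; i++) {       /* one post call per WR */
                wrs[i].next = NULL;
                err = ibv_post_send(qp, &wrs[i], &bad);
        }
        return err;
}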
Hi Dave,
David Miller <[EMAIL PROTECTED]> wrote on 08/08/2007 04:19:00 PM:
> From: Krishna Kumar <[EMAIL PROTECTED]>
> Date: Wed, 08 Aug 2007 15:01:14 +0530
>
> > RESULTS: The performance improvement for TCP No Delay is in the range of -8%
> > to 320% (with -8% being the sole negative), with many individual tests
> > giving 50% or more improvement
David Miller <[EMAIL PROTECTED]> wrote on 08/09/2007 09:57:27 AM:
>
> > Patrick had suggested calling dev_hard_start_xmit() instead of
> > conditionally calling the new API and to remove the new API
> > entirely. The driver determines whether batching is required or
> > not depending on (skb==NULL)
Hi Dave,
David Miller <[EMAIL PROTECTED]> wrote on 08/09/2007 03:31:37 AM:
> > What do you generally think of the patch/implementation ? :)
>
> We have two driver implementation paths on receive and now
> we'll have two on send, and that's not a good trend.
Correct.
> In an ideal world all the
From: Krishna Kumar2 <[EMAIL PROTECTED]>
Date: Thu, 9 Aug 2007 09:49:57 +0530
> Patrick had suggested calling dev_hard_start_xmit() instead of
> conditionally calling the new API and to remove the new API
> entirely. The driver determines whether batching is required or
> not depending on (skb==NULL)
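A rough sketch of the convention being described, i.e. a driver xmit routine
that treats skb == NULL as "drain the batch the core has queued".  The queue
name and the xyz_* helpers are invented for the example; this is not the
actual batching patch:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static struct sk_buff_head xyz_batch_list;      /* filled by the core (name illustrative) */

static int xyz_xmit_one(struct net_device *dev, struct sk_buff *skb)
{
        /* program one TX descriptor / Work Request for this skb */
        return NETDEV_TX_OK;
}

static int xyz_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        if (skb)
                return xyz_xmit_one(dev, skb);  /* ordinary single-skb call */

        /* skb == NULL: batching mode, drain everything queued so far */
        while ((skb = __skb_dequeue(&xyz_batch_list)) != NULL) {
                if (xyz_xmit_one(dev, skb) != NETDEV_TX_OK) {
                        __skb_queue_head(&xyz_batch_list, skb);
                        return NETDEV_TX_BUSY;
                }
        }
        return NETDEV_TX_OK;
}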
Herbert Xu <[EMAIL PROTECTED]> wrote on 08/08/2007 07:12:47 PM:
> On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
> >
> > Not because I think it obviates your work, but rather because I'm
> > curious, could you test a TSO-in-hardware driver converted to
> > batching and see how TSO alone compares to batching for a pure
> > TCP workload?
Hello Herbert,
> > Not because I think it obviates your work, but rather because I'm
> > curious, could you test a TSO-in-hardware driver converted to
> > batching and see how TSO alone compares to batching for a pure
> > TCP workload?
>
> You could even lower the bar by disabling TSO and enablin
On Wed, 2007-08-08 at 15:22 -0700, David Miller wrote:
> The driver path, however, does not exist on an island and what
> we care about is the final result with the changes running
> inside the full system.
>
> So, to be honest, besides for initial internal development
> feedback, the isolated te
On Wed, 2007-08-08 at 21:55 +0100, Stephen Hemminger wrote:
> > pktgen shows a clear win if you test the driver path - which is what you
> > should test because that's where the batching changes are.
> > Using TCP or UDP adds other variables[1] that need to be isolated first
> > in order to quanti
From: jamal <[EMAIL PROTECTED]>
Date: Wed, 08 Aug 2007 11:14:35 -0400
> pktgen shows a clear win if you test the driver path - which is what
> you should test because that's where the batching changes are.
The driver path, however, does not exist on an island and what
we care about is the final result with the changes running inside the
full system.
From: Krishna Kumar2 <[EMAIL PROTECTED]>
Date: Wed, 8 Aug 2007 16:39:47 +0530
> What do you generally think of the patch/implementation ? :)
We have two driver implementation paths on receive and now
we'll have two on send, and that's not a good trend.
In an ideal world all the drivers would be
On Wed, 08 Aug 2007 11:14:35 -0400
jamal <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-08-08 at 21:42 +0800, Herbert Xu wrote:
> > On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
> > >
> > > Not because I think it obviates your work, but rather because I'm
> > > curious, could you test
On Wed, 2007-08-08 at 21:42 +0800, Herbert Xu wrote:
> On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
> >
> > Not because I think it obviates your work, but rather because I'm
> > curious, could you test a TSO-in-hardware driver converted to
> > batching and see how TSO alone compares to batching for a pure
> > TCP workload?
On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
>
> Not because I think it obviates your work, but rather because I'm
> curious, could you test a TSO-in-hardware driver converted to
> batching and see how TSO alone compares to batching for a pure
> TCP workload?
You could even lower
David Miller <[EMAIL PROTECTED]> wrote on 08/08/2007 04:19:00 PM:
> From: Krishna Kumar <[EMAIL PROTECTED]>
> Date: Wed, 08 Aug 2007 15:01:14 +0530
>
> > RESULTS: The performance improvement for TCP No Delay is in the range of -8%
> > to 320% (with -8% being the sole negative), with many individual tests
> > giving 50% or more improvement
From: Krishna Kumar <[EMAIL PROTECTED]>
Date: Wed, 08 Aug 2007 15:01:14 +0530
> RESULTS: The performance improvement for TCP No Delay is in the range of -8%
> to 320% (with -8% being the sole negative), with many individual tests
> giving 50% or more improvement (I think it is to do wi
This set of patches implements the batching API, and adds support for this
API in IPoIB.
List of changes from original submission:
-
1. [Patrick] Suggestion to remove tx_queue_len check for enabling batching.
2. [Patrick] Move queue purging to dev_deactivate