Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-13 Thread Neil Horman
On Wed, Nov 13, 2013 at 01:32:50PM -, David Laight wrote: > > > I'm not sure, what's the typical capacity for the branch predictor's > > > ability to remember code paths? > ... > > > > For such simple single-target branches it goes near or over a thousand for > > recent Intel and AMD microarchit

Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-13 Thread Ingo Molnar
* David Laight wrote: > > > I'm not sure, what's the typical capacity for the branch predictor's > > > ability to remember code paths? > ... > > > > For such simple single-target branches it goes near or over a thousand > > for recent Intel and AMD microarchitectures. Thousands for really > >

RE: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-13 Thread David Laight
> > I'm not sure, what's the typical capacity for the branch predictor's > > ability to remember code paths? ... > > For such simple single-target branches it goes near or over a thousand for > recent Intel and AMD microarchitectures. Thousands for really recent CPUs. IIRC the x86 can also correctl

Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-13 Thread Ingo Molnar
* Neil Horman wrote: > On Wed, Nov 13, 2013 at 10:09:51AM -, David Laight wrote: > > > Sure, I modified the code so that we only prefetched 2 cache lines ahead, > > > but > > > only if the overall length of the input buffer is more than 2 cache lines. > > > Below are the results (all counts

Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-13 Thread Neil Horman
On Wed, Nov 13, 2013 at 10:09:51AM -, David Laight wrote: > > Sure, I modified the code so that we only prefetched 2 cache lines ahead, > > but > > only if the overall length of the input buffer is more than 2 cache lines. > > Below are the results (all counts are the average of 100 iterat

RE: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-13 Thread David Laight
> Sure, I modified the code so that we only prefetched 2 cache lines ahead, but > only if the overall length of the input buffer is more than 2 cache lines. > Below are the results (all counts are the average of 100 iterations of the > csum operation, as previous tests were, I just omitted that
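The modification quoted above (prefetching only two cache lines ahead, and only when the buffer is longer than two cache lines) can be sketched in userspace C roughly as follows. This is a hypothetical illustration, not the code Neil actually tested: the function name `csum_prefetch_ahead`, the 64-byte `CACHE_LINE`, and the plain RFC 1071 summation over big-endian 16-bit words are all assumptions, and `__builtin_prefetch` is the GCC/Clang builtin rather than the kernel's `prefetch()`.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE     64  /* assumed cache line size in bytes */
#define PREFETCH_AHEAD 2   /* how many lines ahead to prefetch */

/* Hypothetical sketch: 16-bit ones' complement sum (RFC 1071 style,
 * big-endian words, folded but not inverted) that prefetches
 * PREFETCH_AHEAD lines ahead, but only for buffers longer than
 * PREFETCH_AHEAD lines. */
static uint16_t csum_prefetch_ahead(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    const size_t dist = (size_t)PREFETCH_AHEAD * CACHE_LINE;
    const int do_prefetch = len > dist;
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        /* At each line boundary (relative to buf), request the line
         * `dist` bytes ahead, as long as it is still inside the buffer. */
        if (do_prefetch && i % CACHE_LINE == 0 && i + dist < len)
            __builtin_prefetch(p + i + dist, 0, 3);
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    }
    if (len & 1)                       /* pad an odd trailing byte with zero */
        sum += (uint32_t)p[len - 1] << 8;

    while (sum >> 16)                  /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

For buffers of two lines or less the prefetch path is skipped entirely, which is the point of the length check: short inputs such as packet headers are not charged for prefetches they cannot benefit from.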

Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-12 Thread Neil Horman
On Tue, Nov 12, 2013 at 12:38:01PM -0800, Joe Perches wrote: > On Tue, 2013-11-12 at 14:50 -0500, Neil Horman wrote: > > On Tue, Nov 12, 2013 at 09:33:35AM -0800, Joe Perches wrote: > > > On Tue, 2013-11-12 at 12:12 -0500, Neil Horman wrote: > [] > > > > So, the numbers are correct now that I retur

Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-12 Thread Joe Perches
On Tue, 2013-11-12 at 14:50 -0500, Neil Horman wrote: > On Tue, Nov 12, 2013 at 09:33:35AM -0800, Joe Perches wrote: > > On Tue, 2013-11-12 at 12:12 -0500, Neil Horman wrote: [] > > > So, the numbers are correct now that I returned my hardware to its > > > previous > > > interrupt affinity state,

Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-12 Thread Neil Horman
On Tue, Nov 12, 2013 at 09:33:35AM -0800, Joe Perches wrote: > On Tue, 2013-11-12 at 12:12 -0500, Neil Horman wrote: > > On Mon, Nov 11, 2013 at 05:42:22PM -0800, Joe Perches wrote: > > > Hi again Neil. > > > > > > Forwarding on to netdev with a concern as to how often > > > do_csum is used via cs

Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-12 Thread Joe Perches
On Tue, 2013-11-12 at 12:12 -0500, Neil Horman wrote: > On Mon, Nov 11, 2013 at 05:42:22PM -0800, Joe Perches wrote: > > Hi again Neil. > > > > Forwarding on to netdev with a concern as to how often > > do_csum is used via csum_partial for very short headers > > and what impact any prefetch would

Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-12 Thread Neil Horman
On Mon, Nov 11, 2013 at 05:42:22PM -0800, Joe Perches wrote: > Hi again Neil. > > Forwarding on to netdev with a concern as to how often > do_csum is used via csum_partial for very short headers > and what impact any prefetch would have there. > > Also, what changed in your test environment? > >

Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-12 Thread Neil Horman
Forwarded Message > From: Neil Horman > To: Joe Perches > Cc: Dave Jones , linux-kernel@vger.kernel.org, > sebastien.du...@bull.net, Thomas Gleixner , Ingo > Molnar , H. Peter Anvin , > x...@kernel.org > Subject: Re: [PATCH v2 2/2] x86: add prefetching to do_csum > >

[Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

2013-11-11 Thread Joe Perches
Cc: Dave Jones , linux-kernel@vger.kernel.org, sebastien.du...@bull.net, Thomas Gleixner , Ingo Molnar , H. Peter Anvin , x...@kernel.org Subject: Re: [PATCH v2 2/2] x86: add prefetching to do_csum On Fri, Nov 08, 2013 at 12:29:07PM -0800, Joe Perches wrote: > On Fri, 2013-11-08 at 15:14 -0

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-11 Thread Ingo Molnar
* Neil Horman wrote: > Ingo, does that seem reasonable to you? FYI, in the past few days I've been busy due to the merge window, but everything I've seen so far in this portion of the thread gave me warm fuzzy feelings, so I definitely like the direction. (More once I get around to looking a

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-11 Thread Neil Horman
On Fri, Nov 08, 2013 at 12:29:07PM -0800, Joe Perches wrote: > On Fri, 2013-11-08 at 15:14 -0500, Neil Horman wrote: > > On Fri, Nov 08, 2013 at 11:33:13AM -0800, Joe Perches wrote: > > > On Fri, 2013-11-08 at 14:01 -0500, Neil Horman wrote: > > > > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Per

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread Joe Perches
On Fri, 2013-11-08 at 15:14 -0500, Neil Horman wrote: > On Fri, Nov 08, 2013 at 11:33:13AM -0800, Joe Perches wrote: > > On Fri, 2013-11-08 at 14:01 -0500, Neil Horman wrote: > > > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote: > > > > On Wed, 2013-11-06 at 10:54 -0500, Neil Horman wr

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread Neil Horman
On Fri, Nov 08, 2013 at 11:33:13AM -0800, Joe Perches wrote: > On Fri, 2013-11-08 at 14:01 -0500, Neil Horman wrote: > > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote: > > > On Wed, 2013-11-06 at 10:54 -0500, Neil Horman wrote: > > > > On Wed, Nov 06, 2013 at 10:34:29AM -0500, Dave Jo

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread Neil Horman
On Fri, Nov 08, 2013 at 11:17:39AM -0800, Joe Perches wrote: > On Fri, 2013-11-08 at 14:07 -0500, Neil Horman wrote: > > On Fri, Nov 08, 2013 at 08:51:07AM -0800, Joe Perches wrote: > > > On Fri, 2013-11-08 at 11:25 -0500, Neil Horman wrote: > > > > On Wed, Nov 06, 2013 at 12:07:38PM -0800, Joe Per

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread Joe Perches
On Fri, 2013-11-08 at 14:01 -0500, Neil Horman wrote: > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote: > > On Wed, 2013-11-06 at 10:54 -0500, Neil Horman wrote: > > > On Wed, Nov 06, 2013 at 10:34:29AM -0500, Dave Jones wrote: > > > > On Wed, Nov 06, 2013 at 10:23:19AM -0500, Neil Hor

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread H. Peter Anvin
On 11/08/2013 11:07 AM, Neil Horman wrote: > On Fri, Nov 08, 2013 at 08:51:07AM -0800, Joe Perches wrote: >> On Fri, 2013-11-08 at 11:25 -0500, Neil Horman wrote: >>> On Wed, Nov 06, 2013 at 12:07:38PM -0800, Joe Perches wrote: On Wed, 2013-11-06 at 15:02 -0500, Neil Horman wrote: > On Wed

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread Joe Perches
On Fri, 2013-11-08 at 14:07 -0500, Neil Horman wrote: > On Fri, Nov 08, 2013 at 08:51:07AM -0800, Joe Perches wrote: > > On Fri, 2013-11-08 at 11:25 -0500, Neil Horman wrote: > > > On Wed, Nov 06, 2013 at 12:07:38PM -0800, Joe Perches wrote: > > > > On Wed, 2013-11-06 at 15:02 -0500, Neil Horman wr

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread Neil Horman
On Fri, Nov 08, 2013 at 08:51:07AM -0800, Joe Perches wrote: > On Fri, 2013-11-08 at 11:25 -0500, Neil Horman wrote: > > On Wed, Nov 06, 2013 at 12:07:38PM -0800, Joe Perches wrote: > > > On Wed, 2013-11-06 at 15:02 -0500, Neil Horman wrote: > > > > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Per

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread Neil Horman
On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote: > On Wed, 2013-11-06 at 10:54 -0500, Neil Horman wrote: > > On Wed, Nov 06, 2013 at 10:34:29AM -0500, Dave Jones wrote: > > > On Wed, Nov 06, 2013 at 10:23:19AM -0500, Neil Horman wrote: > > > > do_csum was identified via perf recently a

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread Joe Perches
On Fri, 2013-11-08 at 11:25 -0500, Neil Horman wrote: > On Wed, Nov 06, 2013 at 12:07:38PM -0800, Joe Perches wrote: > > On Wed, 2013-11-06 at 15:02 -0500, Neil Horman wrote: > > > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote: > > [] > > > > __always_inline instead of inline > > > >

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-08 Thread Neil Horman
On Wed, Nov 06, 2013 at 12:07:38PM -0800, Joe Perches wrote: > On Wed, 2013-11-06 at 15:02 -0500, Neil Horman wrote: > > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote: > [] > > > __always_inline instead of inline > > > static __always_inline void prefetch_lines(const void *addr, size_

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-07 Thread Neil Horman
On Wed, Nov 06, 2013 at 12:19:52PM -0800, Andi Kleen wrote: > Neil Horman writes: > > > do_csum was identified via perf recently as a hot spot when doing > > receive on ip over infiniband workloads. After a lot of testing and > > ideas, we found the best optimization available to us currently is

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Andi Kleen
Neil Horman writes: > do_csum was identified via perf recently as a hot spot when doing > receive on ip over infiniband workloads. After a lot of testing and > ideas, we found the best optimization available to us currently is to > prefetch the entire data buffer prior to doing the checksum On w

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Neil Horman
On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote: > On Wed, 2013-11-06 at 10:54 -0500, Neil Horman wrote: > > On Wed, Nov 06, 2013 at 10:34:29AM -0500, Dave Jones wrote: > > > On Wed, Nov 06, 2013 at 10:23:19AM -0500, Neil Horman wrote: > > > > do_csum was identified via perf recently a

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Joe Perches
On Wed, 2013-11-06 at 15:02 -0500, Neil Horman wrote: > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote: [] > > __always_inline instead of inline > > static __always_inline void prefetch_lines(const void *addr, size_t len) > > { > > const void *end = addr + len; > > ... > > > > buf
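Joe's suggested `prefetch_lines()` helper is cut off in this excerpt right after the `end` computation, so its real body is not shown. A plausible userspace sketch, assuming a 64-byte line and using `__attribute__((always_inline))` in place of the kernel's `__always_inline` macro, might look like this. Unlike the quoted `void` version, this hypothetical variant returns the number of prefetches issued so its behaviour can be checked, and it rounds `addr` down to a line boundary so that the last partial line of an unaligned buffer is not skipped.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed; the kernel would use its own line-size constant */

/* Hypothetical body: issue one read prefetch per cache line that
 * overlaps [addr, addr + len). Returns how many prefetches were issued. */
static inline __attribute__((always_inline))
size_t prefetch_lines(const void *addr, size_t len)
{
    uintptr_t line = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t n = 0;

    if (len == 0)
        return 0;
    /* Walk line-aligned addresses so every touched line gets a hint. */
    for (; line < end; line += CACHE_LINE, n++)
        __builtin_prefetch((const void *)line, 0, 3);
    return n;
}
```

Forcing inlining matters here because the helper sits on a hot path; an out-of-line call for a few prefetch hints could cost more than it saves on short buffers.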

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Neil Horman
On Wed, Nov 06, 2013 at 10:23:10AM -0800, Eric Dumazet wrote: > On Wed, 2013-11-06 at 10:54 -0500, Neil Horman wrote: > > > My guess was that the whole comment was made in reference to the fact that > > checksum offload negated all these advantages. That's not so true anymore, > > since > > infin

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Eric Dumazet
On Wed, 2013-11-06 at 10:54 -0500, Neil Horman wrote: > My guess was that the whole comment was made in reference to the fact that > checksum offload negated all these advantages. That's not so true anymore, > since > infiniband needs csum in software for ipoib. > > I'll fix this up and send a v

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Neil Horman
On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote: > On Wed, 2013-11-06 at 10:54 -0500, Neil Horman wrote: > > On Wed, Nov 06, 2013 at 10:34:29AM -0500, Dave Jones wrote: > > > On Wed, Nov 06, 2013 at 10:23:19AM -0500, Neil Horman wrote: > > > > do_csum was identified via perf recently a

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Joe Perches
On Wed, 2013-11-06 at 10:54 -0500, Neil Horman wrote: > On Wed, Nov 06, 2013 at 10:34:29AM -0500, Dave Jones wrote: > > On Wed, Nov 06, 2013 at 10:23:19AM -0500, Neil Horman wrote: > > > do_csum was identified via perf recently as a hot spot when doing > > > receive on ip over infiniband workload

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Neil Horman
On Wed, Nov 06, 2013 at 10:34:29AM -0500, Dave Jones wrote: > On Wed, Nov 06, 2013 at 10:23:19AM -0500, Neil Horman wrote: > > do_csum was identified via perf recently as a hot spot when doing > > receive on ip over infiniband workloads. After a lot of testing and > > ideas, we found the best op

Re: [PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Dave Jones
On Wed, Nov 06, 2013 at 10:23:19AM -0500, Neil Horman wrote: > do_csum was identified via perf recently as a hot spot when doing > receive on ip over infiniband workloads. After a lot of testing and > ideas, we found the best optimization available to us currently is to > prefetch the entire da

[PATCH v2 2/2] x86: add prefetching to do_csum

2013-11-06 Thread Neil Horman
do_csum was identified via perf recently as a hot spot when doing receive on ip over infiniband workloads. After a lot of testing and ideas, we found the best optimization available to us currently is to prefetch the entire data buffer prior to doing the checksum.

Signed-off-by: Neil Horman CC: se
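The approach described in the patch, prefetching the whole buffer first and then summing it, can be illustrated with a standalone sketch. This is not the x86 `do_csum` from the patch: `csum_prefetch_all` is a hypothetical name, the 64-byte line size is assumed, and the summation is a plain RFC 1071 ones' complement sum over big-endian 16-bit words (folded, not inverted), whereas the kernel routine is an optimized native-endian implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Hypothetical sketch of "prefetch the entire buffer, then checksum it". */
static uint16_t csum_prefetch_all(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum = 0;
    size_t i;

    /* Pass 1: one read prefetch per cache line, stepping from buf. */
    for (i = 0; i < len; i += CACHE_LINE)
        __builtin_prefetch(p + i, 0, 3);

    /* Pass 2: add big-endian 16-bit words into a wide accumulator. */
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (len & 1)                       /* pad an odd trailing byte with zero */
        sum += (uint32_t)p[len - 1] << 8;

    while (sum >> 16)                  /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

With the RFC 1071 example bytes `00 01 f2 03 f4 f5 f6 f7` this sketch returns 0xddf2. The trade-off debated later in the thread is visible here: pass 1 costs a loop over the whole buffer even when the data is already cached or the buffer is only a short header.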