RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-24 Thread Eric Dumazet
On Mon, 2014-03-24 at 06:17 -0700, Eric Dumazet wrote: > On Mon, 2014-03-24 at 10:30 +, David Laight wrote: > > ip_fast_csum() either needs an explicit "m" constraint for the actual > > buffer (and target) bytes, or the stronger "memory" constraint. > > The 'volatile' is then not needed. I am

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-24 Thread Eric Dumazet
On Mon, 2014-03-24 at 10:30 +, David Laight wrote: > From: Eric Dumazet > > On Fri, 2014-03-21 at 14:52 -0400, David Miller wrote: > > > From: Eric Dumazet > > > Date: Fri, 21 Mar 2014 05:50:50 -0700 > > > > > > > It looks like a barrier() would be more appropriate. > > > > > > barrier() == __

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-24 Thread David Laight
From: Eric Dumazet > On Fri, 2014-03-21 at 14:52 -0400, David Miller wrote: > > From: Eric Dumazet > > Date: Fri, 21 Mar 2014 05:50:50 -0700 > > > > > It looks like a barrier() would be more appropriate. > > > > barrier() == __asm__ __volatile__(:::"memory") > > Indeed, but now you mention it, ip

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-23 Thread Eric Dumazet
On Fri, 2014-03-21 at 14:52 -0400, David Miller wrote: > From: Eric Dumazet > Date: Fri, 21 Mar 2014 05:50:50 -0700 > > > It looks like a barrier() would be more appropriate. > > barrier() == __asm__ __volatile__(:::"memory") Indeed, but now you mention it, ip_fast_csum() do not uses volatile k

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread David Miller
From: Eric Dumazet Date: Fri, 21 Mar 2014 05:50:50 -0700 > It looks like a barrier() would be more appropriate. barrier() == __asm__ __volatile__(:::"memory") -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More maj

RE: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread David Laight
From: Eric Dumazet > On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: > > Eric Dumazet writes: > > > > > > I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that > > > is insane... > > > > > > Couldn't it just be the cache miss? > > Or the fact that we mix 16 bit stores and 3

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 06:47 -0700, Eric Dumazet wrote: > Another idea would be to move the ip_fast_csum() call at the end of > inet_gro_complete() > > I'll try this : > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index 8c54870db792..0ca8f350a532 100644 > --- a/net/ipv4/af_inet.c > +

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 06:32 -0700, Eric Dumazet wrote: > On Fri, 2014-03-21 at 05:50 -0700, Eric Dumazet wrote: > > > Or the fact that we mix 16 bit stores and 32bit loads ? > > > > iph->tot_len = newlen; > > iph->check = 0; > > iph->check = ip_fast_csum(iph, 5); > > Yep definitely. Using 16 bit

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Fri, 2014-03-21 at 05:50 -0700, Eric Dumazet wrote: > Or the fact that we mix 16 bit stores and 32bit loads ? > > iph->tot_len = newlen; > iph->check = 0; > iph->check = ip_fast_csum(iph, 5); Yep definitely. Using 16 bit loads in ip_fast_csum() totally removes the stall. I no longer see inet_

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Andi Kleen
On Fri, Mar 21, 2014 at 05:50:50AM -0700, Eric Dumazet wrote: > On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: > > Eric Dumazet writes: > > > > > > I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that > > > is insane... > > > > > > Couldn't it just be the cache miss? >

Re: [RFC] csum experts, csum_replace2() is too expensive

2014-03-21 Thread Eric Dumazet
On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote: > Eric Dumazet writes: > > > > I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that > > is insane... > > > Couldn't it just be the cache miss? Or the fact that we mix 16 bit stores and 32bit loads ? iph->tot_len = newlen;