On Wed, Nov 06, 2013 at 10:23:19AM -0500, Neil Horman wrote:
 > do_csum was identified via perf recently as a hot spot when doing
 > receive on IP-over-InfiniBand workloads.  After a lot of testing and
 > ideas, we found that the best optimization currently available to us is
 > to prefetch the entire data buffer prior to doing the checksum.
 > 
 > diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
 > index 9845371..9f2d3ee 100644
 > --- a/arch/x86/lib/csum-partial_64.c
 > +++ b/arch/x86/lib/csum-partial_64.c
 > @@ -29,8 +29,15 @@ static inline unsigned short from32to16(unsigned a)
 >   * Things tried and found to not make it faster:
 >   * Manual Prefetching
 >   * Unrolling to an 128 bytes inner loop.
 > - * Using interleaving with more registers to break the carry chains.
 
Did you perhaps mean to remove the "Manual Prefetching" line instead?
(Curious: what was tried before that made it not worthwhile?)
 
        Dave
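
For reference, a minimal user-space sketch of the approach described in the
quoted changelog: issue prefetch hints across the whole buffer first, then
run the summing loop.  The function name, the cache-line size, and the
byte-wise accumulation below are illustrative assumptions only; this is not
the kernel's csum_partial implementation.

	#include <stddef.h>
	#include <stdint.h>

	#define CACHE_LINE 64		/* assumed x86_64 cache line size */

	static uint32_t csum_prefetch_sketch(const unsigned char *buf, size_t len)
	{
		uint32_t sum = 0;
		size_t i;

		/* Prefetch the entire data buffer before the checksum loop touches it. */
		for (i = 0; i < len; i += CACHE_LINE)
			__builtin_prefetch(buf + i, 0, 3);	/* read, high temporal locality */

		/* Simple byte-wise accumulation standing in for the real folding sum. */
		for (i = 0; i < len; i++)
			sum += buf[i];

		return sum;
	}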
 