On Thu, May 24, 2018 at 11:22:27AM +0000, Christophe Leroy wrote: > Improve __csum_partial by interleaving loads and adds. > > On a 8xx, it brings neither improvement nor degradation. > On a 83xx, it brings a 25% improvement.
Thanks! Looks fine to me. > Signed-off-by: Christophe Leroy <christophe.le...@c-s.fr> Reviewed-by: Segher Boessenkool <seg...@kernel.crashing.org> > --- > arch/powerpc/lib/checksum_32.S | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S > index d2238ea82209..aa224069f93a 100644 > --- a/arch/powerpc/lib/checksum_32.S > +++ b/arch/powerpc/lib/checksum_32.S > @@ -47,16 +47,25 @@ _GLOBAL(__csum_partial) > bdnz 2b > 21: srwi. r6,r4,4 /* # blocks of 4 words to do */ > beq 3f > + lwz r0,4(r3) > mtctr r6 > -22: lwz r0,4(r3) > lwz r6,8(r3) > + adde r5,r5,r0 > lwz r7,12(r3) > + adde r5,r5,r6 > lwzu r8,16(r3) > + adde r5,r5,r7 > + bdz 23f > +22: lwz r0,4(r3) > + adde r5,r5,r8 > + lwz r6,8(r3) > adde r5,r5,r0 > + lwz r7,12(r3) > adde r5,r5,r6 > + lwzu r8,16(r3) > adde r5,r5,r7 > - adde r5,r5,r8 > bdnz 22b > +23: adde r5,r5,r8 > 3: andi. r0,r4,2 > beq+ 4f > lhz r0,4(r3) > -- > 2.13.3