Re: [PATCH 1/3] powerpc: Optimise 64bit csum_partial

2010-08-03 Thread Segher Boessenkool
> Hi Segher,
>
>> Not really. Do you know how many 16/32-bit words you can add before a
>> 64-bit register can overflow? :-)
>
> That's a very good point. I thought about using 32bit adds when writing
> the copy and checksum routine, but came to the conclusion that it wouldn't
> go any faster t

Re: [PATCH 1/3] powerpc: Optimise 64bit csum_partial

2010-08-03 Thread Anton Blanchard
Hi Segher,
> Not really. Do you know how many 16/32-bit words you can add before a
> 64-bit register can overflow? :-)
That's a very good point. I thought about using 32bit adds when writing the copy and checksum routine, but came to the conclusion that it wouldn't go any faster than one using a

Re: [PATCH 1/3] powerpc: Optimise 64bit csum_partial

2010-08-03 Thread Segher Boessenkool
> On both POWER6 and POWER7 this should be as fast as we can go since we are limited by the latency of the adde instructions.
Not really. Do you know how many 16/32-bit words you can add before a 64-bit register can overflow? :-) If you ever have to call this with more than 16GB of data to sum, t
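The overflow argument above can be sketched in C. Every 32-bit word is below 2^32, so a 64-bit accumulator can absorb 2^32 of them (2^32 * 4 bytes = 16GB) with plain adds before it can wrap, and the carries only need folding back in once at the end. This is an illustration under stated assumptions, not code from the patch: the names fold64 and csum32_deferred are hypothetical, the length is assumed to be a multiple of 4, and the result is in native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fold a 64-bit accumulator down to 16 bits
 * with end-around carry (ones' complement addition). */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* at most 32 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* at most 17 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* at most 16 bits */
    return (uint16_t)sum;
}

/* Hypothetical name. Sums the buffer as 32-bit words using plain adds,
 * with no carry handling inside the loop. Safe against overflow for up
 * to 2^32 words, i.e. 16GB of input. */
static uint16_t csum32_deferred(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum = 0;

    assert(len % 4 == 0);  /* assumption: whole 32-bit words only */

    for (size_t i = 0; i < len; i += 4) {
        uint32_t w;
        memcpy(&w, p + i, sizeof w);  /* load without alignment assumptions */
        sum += w;                     /* plain add, carries are deferred */
    }
    return fold64(sum);
}
```

Because no add in the loop depends on a carry flag, the adds could in principle be split across several independent accumulators; whether that beats an adde chain on a given CPU is a separate question that this sketch does not answer.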

[PATCH 1/3] powerpc: Optimise 64bit csum_partial

2010-08-02 Thread Anton Blanchard
The main loop of csum_partial runs very slowly on recent POWER CPUs. After some analysis on both POWER6 and POWER7 I came up with the routine below. First we get the source aligned to a double word, ignoring any odd alignment to keep things simple. Then we do 64 bytes at a time, with an entry and exit
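As a rough C sketch of the shape described above (64 bytes per iteration, summed as doublewords with the carry fed back in), assuming a length that is a multiple of 8: the names add_carry64, load64 and csum64_blocks are hypothetical, the alignment step is replaced by memcpy loads, and the real routine is PowerPC assembly that chains adde instructions through the hardware carry bit rather than comparing in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add with end-around carry, emulating in C what an adde chain
 * gets from the hardware carry bit. */
static uint64_t add_carry64(uint64_t sum, uint64_t w)
{
    sum += w;
    return sum + (sum < w);  /* the add wrapped iff sum < w; feed the carry back */
}

/* Load one doubleword; memcpy avoids any alignment assumption. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Hypothetical name. Returns the unfolded 64-bit ones' complement sum
 * in native byte order. */
static uint64_t csum64_blocks(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum = 0;

    assert(len % 8 == 0);  /* assumption: whole doublewords only */

    /* Main loop: 64 bytes (eight doublewords) per iteration. */
    while (len >= 64) {
        for (int i = 0; i < 8; i++)
            sum = add_carry64(sum, load64(p + 8 * i));
        p += 64;
        len -= 64;
    }

    /* Remaining whole doublewords. */
    while (len >= 8) {
        sum = add_carry64(sum, load64(p));
        p += 8;
        len -= 8;
    }
    return sum;
}
```

Each add here depends on the carry from the previous one, which is the serial dependency the thread refers to when it talks about the latency of the adde instructions.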