Basile STARYNKEVITCH wrote:
> At last, at the recent (july 2008) GCC summit, someone (sorry I forgot who, probably someone from SuSE)
> proposed in a BOFS to have architecture and machine specific hand-tuned (or even hand-written assembly) low
> level libraries for such basic things as memset etc..

That's exactly what I meant. The most important memory, string and math functions should use hand-tuned assembly with CPU dispatching for the latest instruction sets. My experiments show that the speed of unaligned memcpy on Intel processors can be improved by a factor of 3 to 10 (http://www.agner.org/optimize/optimizing_cpp.pdf, page 12).
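
To make "CPU dispatching" concrete, here is a minimal sketch in C. It is not the code from the manual linked above. All names (copy_fn, copy_generic, copy_sse2, select_copy, dispatched_copy) are made up for illustration, and the "tuned" routine is only a placeholder that calls the library memcpy where real hand-written assembly would go. It also assumes a GCC or Clang recent enough to provide __builtin_cpu_supports on x86; on other compilers or targets the sketch falls back to the generic loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is an illustration, not a library API. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback: a plain byte loop that works on any CPU. */
void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Placeholder for a hand-tuned SSE2 routine. A real version would be
   written in assembly or intrinsics; here it simply defers to memcpy. */
void *copy_sse2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick the best implementation the running CPU supports. */
copy_fn select_copy(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Assumption: the compiler provides these x86 builtins. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return copy_sse2;
#endif
    return copy_generic;
}

/* Public entry point: resolve once on first use, then call through the
   cached pointer. Not thread-safe as written; a real library would
   resolve at load time instead. */
void *dispatched_copy(void *dst, const void *src, size_t n)
{
    static copy_fn impl;
    if (!impl)
        impl = select_copy();
    return impl(dst, src, n);
}
```

The point of the design is that the feature check runs once, so every later call costs only an indirect jump. Adding a version for a newer instruction set means adding one more branch in select_copy, with the most capable instruction set tested first.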

There will be more hand-tuning work to do when the 256-bit YMM registers become available in a few years, and more speed to gain.
