> At last, at the recent (July 2008) GCC summit, someone (sorry I forgot
> who, probably someone from SuSE) proposed in a BOFS to have architecture
> and machine specific hand-tuned (or even hand-written assembly) low
> level libraries for such basic things as memset etc.
That's exactly what I meant. The most important memory, string and math
functions should use hand-tuned assembly with CPU dispatching for the
latest instruction sets. My experiments show that the speed can be
improved by a factor of 3 to 10 for unaligned memcpy on Intel processors
(page 12).
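
To make the CPU dispatching idea concrete, here is a minimal sketch in C. It is not the code from my experiments: the names my_memcpy, memcpy_sse2 and memcpy_generic are hypothetical, the "SSE2" version is only a stand-in that forwards to the library memcpy where hand-tuned assembly would go, and the CPU check assumes a GCC-compatible compiler that provides __builtin_cpu_supports on x86. The first call goes through a dispatcher that picks a version once and stores it in a function pointer, so later calls cost only one indirect jump.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback: plain byte-by-byte copy. */
static void *memcpy_generic(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
    return dst;
}

/* Stand-in for a hand-tuned SSE2 version (hypothetical). A real one would
   be written in assembly or intrinsics; here it just forwards to memcpy. */
static void *memcpy_sse2(void *dst, const void *src, size_t n) {
    return memcpy(dst, src, n);
}

/* Pick the best version for the CPU we are running on. */
static memcpy_fn select_memcpy(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Assumption: GCC-style CPU feature builtins are available. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) return memcpy_sse2;
#endif
    return memcpy_generic;
}

static void *memcpy_dispatch(void *dst, const void *src, size_t n);

/* Initially points at the dispatcher; overwritten on the first call. */
static memcpy_fn memcpy_ptr = memcpy_dispatch;

static void *memcpy_dispatch(void *dst, const void *src, size_t n) {
    memcpy_ptr = select_memcpy();
    assert(memcpy_ptr != memcpy_dispatch); /* guard against self-recursion */
    return memcpy_ptr(dst, src, n);
}

/* Public entry point: one indirect call after the first use. */
void *my_memcpy(void *dst, const void *src, size_t n) {
    return memcpy_ptr(dst, src, n);
}
```

A caller simply uses my_memcpy(dst, src, n) like the standard function. More branches (for newer instruction sets) would go into select_memcpy, ordered from newest to oldest. Note that the unsynchronized pointer update is only safe here because every thread would store the same value; a library would want to think about that more carefully.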
There will be more hand-tuning work to do when the 256-bit YMM registers
become available in a few years, and more to gain in speed.