On Fri, Jul 25, 2008 at 9:08 AM, Agner Fog <[EMAIL PROTECTED]> wrote:
> Raksit Ashok wrote:
>>There is a more optimized version for 64-bit:
>>http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s
>>I think this looks similar to your implementation, Agner.
>
> Yes it is similar to my code.

3164 line source file which implements memcpy().
You got to be kidding.
How much of L1 icache it blows away in the process?
I bet it performs wonderfully on microbenchmarks though.

   2991                 .balign 16               # sadistic alignment strikes 
again
   2992 L(bkPxQx):      .int L(bkP0Q0)-L(bkPxQx) # why use two bytes when
we can use four?

Seriously. What possible reason there can be to align
a randomly accessed data table to 16 bytes?
4 bytes I understand, but 16?
--
vda

Reply via email to