On Thursday 31 July 2008 11:36, Dave Korn wrote:
> Agner Fog wrote on 31 July 2008 07:14:
>
> > Denys Vlasenko wrote:
> >> I tend to doubt that odd-byte aligned large memcpys are anywhere
> >> near typical. malloc and mmap both return well-aligned buffers
> >> (say, 8 byte aligned). Static and on-stack objects are also
> >> at least word-aligned 99% of the time.
> >>
> >> memcpy can just use "relatively simple" code for copies in which
> >> either src or dst is not word aligned. This cuts possibilities down from
> >> 16 to 4 (or even 2?).
> >>
> > The XMM code is still more than 3 times faster than rep movsl when data
> > are aligned by 4 or 8, but not by 16.
> > Even if odd addresses are rare, they must be supported, but we can put
> > the most common cases first.
>
> In the real world, unaligned memcpys are anything but rare. Everything's
> networked these days, remember? Stuff gets misaligned real quick when you
> start adding and removing various network layer headers and trailers to
> unpredictably-sized packets.

Headers are usually at least rounded to a 16-bit multiple.
Trailers do not matter to alignment.

This happens in kernel space. In kernel space, there are many
differences.

* memcpys are rarely bigger than a page - this negates any gains
  from non-temporal MOVs
* memcpy cannot use XMM registers (otherwise lazy FPU saving
  doesn't work)
* kernel developers would never accept a multi-kilobyte memcpy
  implementation anyway

Not to mention that network packets are ~1500 bytes long,
even jumbo packets are only 9k tops.
--
vda
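
P.S. A minimal sketch of the dispatch suggested in the quoted text
above: take a word-at-a-time path only when both src and dst are word
aligned, and fall back to a plain byte loop otherwise. The names
sketch_memcpy and both_word_aligned are made up for illustration. This
is not the kernel's or any libc's memcpy, it is untuned, and it uses no
XMM registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true when both pointers are aligned to the
 * size of an unsigned long (the "word" assumed in this sketch). */
static int both_word_aligned(const void *a, const void *b)
{
    uintptr_t bits = (uintptr_t)a | (uintptr_t)b;
    return (bits & (sizeof(unsigned long) - 1)) == 0;
}

/* Hypothetical name; illustrates the two-way dispatch only. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (both_word_aligned(d, s)) {
        /* Common case first: copy one word per iteration.
         * The fixed-size memcpy calls avoid aliasing problems;
         * compilers typically turn each into one load or store. */
        while (n >= sizeof(unsigned long)) {
            unsigned long w;
            memcpy(&w, s, sizeof w);
            memcpy(d, &w, sizeof w);
            s += sizeof w;
            d += sizeof w;
            n -= sizeof w;
        }
    }

    /* "Relatively simple" path: misaligned copies, and the tail
     * left over after the word loop, go byte by byte. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Called as sketch_memcpy(dst, src, len), it returns dst like the
standard memcpy. The rare misaligned case is correct but slow, which
is the trade-off argued for above.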