On Wed, Jul 23, 2014 at 10:04 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote:
>
>> So the length is fine, and the disassembly shows that it is fixed (16
>> 32-bit words - why the heck does it use "movsl" rather than "movsq",
>> whatever).
>
> Which is exactly right btw, he's got CONFIG_NR_CPUS=512 and 8*4*16=512.
That's not my point. Why the f*ck does it use "movsl", when "movsq" should
work as well or better? Then it should use a count of 8, because 8*8*8 is
also 512 bits.

Of course, with the enhanced string instructions, it's quite possible that
"movsb" with a count of 64 (64*8) is the best option.

Anyway, my gcc version creates a series of 8 "movq" pairs instead, which
will beat all other cases, at the cost of a much bigger code footprint.

               Linus