------- Comment #4 from hjl dot tools at gmail dot com  2008-10-08 20:55 -------
(In reply to comment #3)
> Newer patch
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00350.html
>
With this patch, I got

	.globl	foo
	.type	foo, @function
foo:
	xorl	%eax, %eax
	.p2align 4,,10
	.p2align 3
.L2:
	pabsw	src(%rax), %xmm0
	movdqa	%xmm0, resdst(%rax)
	addq	$16, %rax
	cmpq	$160, %rax
	jne	.L2
	rep ret

The load is now folded into pabsw as a memory operand. The extra load insn and the unaligned move are gone.


-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37774