Hello!
f7: 0f 7f 5c 24 f0 movq %mm3,-0x10(%rsp)
fc: 0f 7f 54 24 f8 movq %mm2,-0x8(%rsp)
101: 48 8b 5c 24 f8 mov -0x8(%rsp),%rbx
106: 48 89 5c 38 40 mov %rbx,0x40(%rax,%rdi,1)
10b: 48 8b 5c 24 f0 mov -0x10(%rsp),%rbx
110: 48 89 5c 38 48 mov %rbx,0x48(%rax,%rdi,1)
As you can see, in the intrinsic version gcc moves the mmx register to the
stack, reloads from the stack and writes to the destination. Why?
I don't know whether earlier gcc 4.2 versions produced such stupid code.
Compiling as 32-bit shows similar stupidity, though there gcc reloads into
an mmx register...
This is a variant of "Strange code for MMX register moves" [1] or its
dupe "mmx and movd/movq on x86_64" [2]. Since touching a %mm register
switches the x87 register stack to MMX mode, we penalize mmx moves severely
in order to prevent gcc from ever allocating %mm for DImode moves, unless
really necessary.
OTOH, in your particular case:
(insn 42 40 43 3 mmx.c:20 (set (mem:V2SI (plus:DI (reg:DI 65 [ ivtmp.43 ])
(const_int -64 [0xffffffffffffffc0])) [0 S8 A64])
(subreg:V2SI (reg:V4HI 90) 0)) 866 {*movv2si_internal_rex64}
(expr_list:REG_DEAD (reg:V4HI 90)
(nil)))
The subreg in this pattern confuses the register allocator (RA) into creating:
(insn 68 36 38 3
/usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.3.0/include/mmintrin.h:389
(set (mem/c:V4HI (plus:DI (reg/f:DI 7 sp)
(const_int -8 [0xfffffffffffffff8])) [2 S8 A8])
(reg:V4HI 32 mm3)) 865 {*movv4hi_internal_rex64} (nil))
(insn 70 69 42 3 mmx.c:20 (set (reg:V2SI 3 bx)
(mem/c:V2SI (plus:DI (reg/f:DI 7 sp)
(const_int -8 [0xfffffffffffffff8])) [2 S8 A8])) 866
{*movv2si_internal_rex64} (nil))
(insn:HI 42 70 71 3 mmx.c:20 (set (mem:V2SI (plus:DI (reg:DI 5 di
[orig:65 ivtmp.43 ] [65])
(const_int -64 [0xffffffffffffffc0])) [0 S8 A64])
(reg:V2SI 3 bx)) 866 {*movv2si_internal_rex64} (nil))
(For 32-bit targets, %mm is allocated, since no integer register can hold
a 64-bit value.)
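To make the V4HI/V2SI mismatch concrete, here is a minimal sketch of the kind
of source that yields such a subreg. It is not the original mmx.c, which is
not shown in this thread. It assumes GCC's generic vector extensions instead of
the mmintrin.h intrinsics, and the typedefs v4hi and v2si as well as the
helper add_and_store are hypothetical names chosen for illustration. The
arithmetic is done on four 16-bit lanes (V4HI), and the result is then
reinterpreted as two 32-bit lanes (V2SI) for the store; that same-size cast
is what shows up in RTL as (subreg:V2SI (reg:V4HI ...) 0). On a current
compiler this will most likely not use %mm registers at all, so it only
illustrates the type punning, not the exact code generation above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two 64-bit vector views, analogous to the V4HI and V2SI machine modes.
   These typedef names are illustrative, not taken from mmintrin.h. */
typedef int16_t v4hi __attribute__((vector_size(8)));
typedef int32_t v2si __attribute__((vector_size(8)));

/* Hypothetical helper: add four 16-bit lanes, then store the result
   through the V2SI view of the same 64 bits. */
static void add_and_store(int32_t *dst, const int16_t *a, const int16_t *b)
{
    v4hi x, y;

    assert(dst && a && b);

    /* memcpy avoids alignment and aliasing assumptions on the inputs. */
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);

    v4hi sum = x + y;       /* lane-wise V4HI addition */
    v2si out = (v2si)sum;   /* same bits, different mode: the subreg */

    memcpy(dst, &out, sizeof out);
}
```

The cast changes no bits, so ideally it should cost nothing; the extra stack
round trip in the dump above comes from how the two modes are treated during
register allocation, not from the conversion itself.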
I'll see what can be done here to tie V4HI and V2SI together.
[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22076
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34256
Uros.