------- Comment #4 from christophe at saout dot de 2008-08-13 22:15 -------

Ok, I'm completely out of my depth here, but after staring at RTL dumps and gcc code for about two hours straight, things slowly start to make a little sense...
(Please tell me to shut up and forget about it if I am totally wrong about this.)

In my opinion the issue comes from here:

(define_insn "*vec_concatv2di_rex"
  [(set (match_operand:V2DI 0 "register_operand"     "=Y2,Yi,!Y2,Y2,x,x,x")
        (vec_concat:V2DI
          (match_operand:DI 1 "nonimmediate_operand" " m,r ,*y ,0 ,0,0,m")
          (match_operand:DI 2 "vector_move_operand"  " C,C ,C ,Y2,x,m,0")))]
  "TARGET_64BIT"
  "@
   movq\t{%1, %0|%0, %1}
   movq\t{%1, %0|%0, %1}
   movq2dq\t{%1, %0|%0, %1}
   punpcklqdq\t{%2, %0|%0, %2}
   movlhps\t{%2, %0|%0, %2}
   movhps\t{%2, %0|%0, %2}
   movlps\t{%1, %0|%0, %1}"
  [(set_attr "type" "ssemov,ssemov,ssemov,sselog,ssemov,ssemov,ssemov")
   (set_attr "mode" "TI,TI,TI,TI,V4SF,V2SF,V2SF")])

As far as I understand it, looking at the instruction and operand matches, this pattern is used to combine two 64-bit values into one 128-bit xmm register. The relevant part of the RTL dump seems to be:

(insn 401 108 109 (set (reg:DI 22 xmm1)
        (reg/f:DI 0 ax [124])) 89 {*movdi_1_rex64} (expr_list:REG_DEAD (reg/f:DI 0 ax [124])
        (nil)))

(insn:HI 109 401 402 (set (reg:V2DI 22 xmm1)
        (vec_concat:V2DI (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                    (const_int 56 [0x38])) [52 D.11729+0 S8 A8])
            (reg:DI 22 xmm1))) 1301 {*vec_concatv2di_rex} (nil))

The first instruction moves %rax into %xmm1 (%rax holds the 64 bits that should end up in the upper half of %xmm1). %rax just had 8 added to it in the previous step ("add $0x8,%rax" after %rax was loaded from rptr). The second instruction is then supposed to take %xmm1 and a memory operand (the "rptr" value without 8 added) and combine the two into %xmm1.

As far as I understand the RTL template syntax, this definition handles seven cases at once, with the last one matching here: operands 0 to 2 have the constraints "x", "m" and "0", which I guess mean "xmm register", "memory" and "same as operand 0". So this is supposed to combine the memory operand and %xmm1, i.e.
taking the memory operand as the lower half and the contents of %xmm1 as the upper half, and putting the result back in %xmm1. That is exactly what should happen here.

For the second-to-last alternative (with the roles flipped, i.e. the memory operand landing in the upper half), this yields a "movhps", which is fine. But "movlps" is not: it overwrites the lower half of %xmm1, which is exactly where operand 2 lives, and nothing moves that value into the upper half first. That is the problem I see.

My assumption is that the seventh alternative, the one that emits "movlps", is badly formulated. gcc seems to interpret the "0" constraint as meaning it's fine for the contents of operand 2 to sit in the lower half of the register, whereas it seems to me the constraint should at least tell gcc that the contents of operand 2 must already be in the upper half. However, I have no idea how to express that. I could try removing the seventh alternative altogether to force gcc to find another solution (?). Is it worth testing that (i.e. building gcc and experimenting)?

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37101
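To make the half-swapping argument concrete, here is a toy model in Python (not GCC code) of what the RTL asks for versus what the emitted instructions do. The Xmm class, the helper names and the pointer values are hypothetical. The only facts it encodes are that vec_concat puts operand 1 in the low half and operand 2 in the high half, that a DImode value moved into an xmm register with movq sits in the low half with the high half zeroed, and that movlps/movhps replace only the low/high 64 bits.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass(frozen=True)
class Xmm:
    """Hypothetical toy model of a 128-bit xmm register as two 64-bit halves."""
    lo: int
    hi: int


def vec_concat(op1: int, op2: int) -> Xmm:
    """(vec_concat:V2DI op1 op2): op1 becomes the low half, op2 the high half."""
    return Xmm(op1 & MASK64, op2 & MASK64)


def movq_to_xmm(value: int) -> Xmm:
    """movq %r64, %xmm: value lands in the low half, the high half is zeroed."""
    return Xmm(value & MASK64, 0)


def movlps(reg: Xmm, mem: int) -> Xmm:
    """movlps m64, %xmm: overwrite the low half, keep the high half."""
    return Xmm(mem & MASK64, reg.hi)


def movhps(reg: Xmm, mem: int) -> Xmm:
    """movhps m64, %xmm: overwrite the high half, keep the low half."""
    return Xmm(reg.lo, mem & MASK64)


# Hypothetical values: rptr as loaded from the stack slot, rptr + 8 in %rax.
rptr = 0x1000
rax = rptr + 8

# What insn 109 asks for: lo = rptr (memory operand), hi = rptr + 8.
wanted = vec_concat(rptr, rax)

# Alternative 7 ("x", "m", "0"): operand 2 is tied to operand 0, so after
# insn 401 the DImode value sits in the LOW half of %xmm1; movlps then
# overwrites that half and rptr + 8 is lost.
alt7 = movlps(movq_to_xmm(rax), rptr)

# Alternative 6 ("x", "0", "m"): operand 1 is tied to operand 0 and already
# in the low half, so movhps fills the high half and the layout is correct.
alt6 = movhps(movq_to_xmm(rptr), rax)

print("wanted:        ", wanted)
print("alt 7 (movlps):", alt7)
print("alt 6 (movhps):", alt6)
```

The asymmetry is the whole point: a tied "0" operand of DImode always occupies the low half, so movhps (which writes the other half) composes correctly with it, while movlps writes the very half the tied operand is in.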