
christophe at saout dot de Wed, 13 Aug 2008 15:17:10 -0700


------- Comment #4 from christophe at saout dot de  2008-08-13 22:15 -------
OK, I'm completely out of my depth here, but after staring at RTL dumps and gcc
code for about two hours straight, things are slowly starting to make a little
sense...


(please tell me to shut up and forget about it if I am totally wrong about
this)

In my opinion the issue comes from here:

(define_insn "*vec_concatv2di_rex"
  [(set (match_operand:V2DI 0 "register_operand"     "=Y2,Yi,!Y2,Y2,x,x,x")
        (vec_concat:V2DI
          (match_operand:DI 1 "nonimmediate_operand" "  m,r ,*y ,0 ,0,0,m")
          (match_operand:DI 2 "vector_move_operand"  "  C,C ,C  ,Y2,x,m,0")))]
  "TARGET_64BIT"
  "@
   movq\t{%1, %0|%0, %1}
   movq\t{%1, %0|%0, %1}
   movq2dq\t{%1, %0|%0, %1}
   punpcklqdq\t{%2, %0|%0, %2}
   movlhps\t{%2, %0|%0, %2}
   movhps\t{%2, %0|%0, %2}
   movlps\t{%1, %0|%0, %1}"
  [(set_attr "type" "ssemov,ssemov,ssemov,sselog,ssemov,ssemov,ssemov")
   (set_attr "mode" "TI,TI,TI,TI,V4SF,V2SF,V2SF")])

As far as I understand it from the instruction and operand matches, this rule
is used to combine two 64-bit values into one 128-bit xmm register.
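To pin down the operand order, here is a toy C model (an illustration only, not GCC code) of what (vec_concat:V2DI op1 op2) computes. The v2di type and the vec_concat_v2di helper are hypothetical names; the model simply treats an xmm register as two 64-bit lanes, with operand 1 becoming the lower half and operand 2 the upper half.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 128-bit xmm register holding a V2DI value.
   Hypothetical type for illustration, not taken from GCC. */
typedef struct {
    uint64_t lo; /* bits 63:0   */
    uint64_t hi; /* bits 127:64 */
} v2di;

/* (vec_concat:V2DI op1 op2): operand 1 fills the lower half,
   operand 2 fills the upper half. */
static v2di vec_concat_v2di(uint64_t op1, uint64_t op2)
{
    v2di r;
    r.lo = op1;
    r.hi = op2;
    return r;
}

/* Small self-check of the operand order. */
static void check_vec_concat_order(void)
{
    v2di r = vec_concat_v2di(0x1111, 0x2222);
    assert(r.lo == 0x1111);
    assert(r.hi == 0x2222);
}
```

For example, vec_concat_v2di(0x1111, 0x2222) gives lo == 0x1111 and hi == 0x2222.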

The relevant part in the RTL dump seems to be:

(insn 401 108 109 (set (reg:DI 22 xmm1)
        (reg/f:DI 0 ax [124])) 89 {*movdi_1_rex64} (expr_list:REG_DEAD
(reg/f:DI 0 ax [124])
        (nil)))

(insn:HI 109 401 402 (set (reg:V2DI 22 xmm1)
        (vec_concat:V2DI (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                    (const_int 56 [0x38])) [52 D.11729+0 S8 A8])
            (reg:DI 22 xmm1))) 1301 {*vec_concatv2di_rex} (nil))

The first instruction moves %rax into %xmm1. %rax holds the 64 bits that
should end up in the upper half of %xmm1; it just had 8 added to it in the
previous step ("add $0x8,%rax", after %rax was loaded from rptr).

The second instruction is then supposed to take %xmm1 and a memory operand
(the "rptr" value without 8 added) and combine the two into %xmm1.

As far as I understand the RTL template syntax, this definition handles seven
cases at once, with the last one matching here: operands 0 to 2 have the
constraints "x", "m" and "0", which I guess mean "xmm register", "memory" and
"same as operand 0".

So this is supposed to take the memory operand and %xmm1 and combine the two,
i.e. take the memory operand as the lower half and the contents of %xmm1 as
the upper half, and put the result in %xmm1 again.
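The intended data flow of insns 401 and 109 can be sketched in a toy C lane model. This is a sketch under stated assumptions: the rptr value is made up, the stack slot 56(%rsp) is modeled as a plain variable holding the spilled rptr, and insn 401 is assumed to be emitted as a movq from %rax, which writes the low 64 bits of the xmm register and zeroes the upper half. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy lane model of an xmm register (hypothetical, for illustration). */
typedef struct {
    uint64_t lo; /* bits 63:0   */
    uint64_t hi; /* bits 127:64 */
} v2di;

/* insn 401, assumed to be "movq %rax, %xmm1": the 64-bit value lands in
   the lower half and the upper half is zeroed. */
static v2di movq_from_gpr(uint64_t rax)
{
    v2di r = { rax, 0 };
    return r;
}

/* insn 109 as the RTL describes it:
   (vec_concat:V2DI (mem 56(%rsp)) (reg:DI xmm1)).
   The DI value of xmm1 is its lower half. */
static v2di insn109_intended(uint64_t mem_sp56, v2di xmm1)
{
    v2di r = { mem_sp56, xmm1.lo };
    return r;
}

/* Both insns for a given rptr; rax = rptr + 8 as in the dump. */
static v2di intended_sequence(uint64_t rptr)
{
    uint64_t spilled_rptr = rptr;    /* stack slot 56(%rsp) */
    uint64_t rax = rptr + 8;         /* "add $0x8,%rax" */
    v2di xmm1 = movq_from_gpr(rax);  /* insn 401 */
    return insn109_intended(spilled_rptr, xmm1); /* insn 109 */
}

static void check_intended_sequence(void)
{
    v2di r = intended_sequence(0x1000);
    assert(r.lo == 0x1000); /* rptr in the lower half */
    assert(r.hi == 0x1008); /* rptr + 8 in the upper half */
}
```

With a made-up rptr of 0x1000, the intended result is lo == 0x1000 and hi == 0x1008.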

This is exactly what is supposed to happen here. For the second-to-last case
(with the roles flipped, i.e. the memory operand landing in the upper half),
this yields a "movhps", which is ok.

But for "movlps" it is not: movlps only overwrites the lower half of %xmm1,
and nothing moves the previous lower half (operand 2) to the upper half, which
leads to the problem I see.
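The asymmetry between the two cases can be sketched in a toy C lane model: movhps replaces only the upper half and movlps replaces only the lower half, each leaving the other half untouched. The helpers below (movhps, movlps, alt6, alt7) are hypothetical names for illustration. They assume that an operand tied to operand 0 with the "0" constraint sits, as a DI value, in the lower half of the register, and that the upper half after insn 401 is zero because a movq from a general-purpose register zero-extends.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t lo; /* bits 63:0   */
    uint64_t hi; /* bits 127:64 */
} v2di;

/* movhps m64, xmm: load 64 bits into the upper half, keep the lower half. */
static v2di movhps(v2di xmm, uint64_t m64)
{
    xmm.hi = m64;
    return xmm;
}

/* movlps m64, xmm: load 64 bits into the lower half, keep the upper half. */
static v2di movlps(v2di xmm, uint64_t m64)
{
    xmm.lo = m64;
    return xmm;
}

/* Alternative 6 (op1 "0", op2 "m"): op1 is already in the lower half,
   movhps supplies op2 as the upper half: { op1, mem } == vec_concat(op1, mem). */
static v2di alt6(uint64_t op1, uint64_t mem_op2, uint64_t stale_hi)
{
    v2di xmm0 = { op1, stale_hi };
    return movhps(xmm0, mem_op2);
}

/* Alternative 7 (op1 "m", op2 "0"): op2 also sits in the lower half,
   which movlps then overwrites: { mem, stale_hi } instead of { mem, op2 }. */
static v2di alt7(uint64_t mem_op1, uint64_t op2, uint64_t stale_hi)
{
    v2di xmm0 = { op2, stale_hi };
    return movlps(xmm0, mem_op1);
}

static void check_alternatives(void)
{
    /* Alternative 6 is correct regardless of the stale upper half. */
    v2di r6 = alt6(0xAA, 0xBB, 0xDEAD);
    assert(r6.lo == 0xAA && r6.hi == 0xBB);

    /* Alternative 7 after the zero-extending movq: op2 (rptr + 8) is lost. */
    v2di r7 = alt7(0x1000, 0x1008, 0);
    assert(r7.lo == 0x1000);
    assert(r7.hi == 0); /* vec_concat would give 0x1008 here */
}
```

In this model the movhps case always matches vec_concat, while the movlps case returns whatever was previously in the upper half.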

My assumption would be that the seventh combination, the one that emits
"movlps", is badly formulated. gcc seems to interpret the "0" constraint to
mean that it's ok if the contents of operand 2 are in the lower half of the
input register, whereas the constraint should at least tell gcc that the
contents of operand 2 must already be in the upper half.
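The condition stated above can be written as a small check in a toy C lane model: "movlps mem, xmm" yields (vec_concat:V2DI mem op2) exactly when op2 is already in the upper half of the destination, whatever the lower half holds. The helper names are hypothetical, and this only models the register contents; it says nothing about how such a requirement could be expressed as a GCC constraint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t lo; /* bits 63:0   */
    uint64_t hi; /* bits 127:64 */
} v2di;

/* movlps m64, xmm: overwrite the lower half, keep the upper half. */
static v2di movlps(v2di xmm, uint64_t m64)
{
    xmm.lo = m64;
    return xmm;
}

/* Does "movlps mem, xmm" produce (vec_concat:V2DI mem op2), given the
   register contents before the instruction? */
static bool movlps_matches_vec_concat(v2di xmm_before, uint64_t mem, uint64_t op2)
{
    v2di r = movlps(xmm_before, mem);
    return r.lo == mem && r.hi == op2;
}

static void check_precondition(void)
{
    /* op2 already in the upper half: correct, whatever the lower half held. */
    v2di op2_high = { 0xDEAD, 0x1008 };
    assert(movlps_matches_vec_concat(op2_high, 0x1000, 0x1008));

    /* op2 in the lower half (how a tied DI operand is laid out): wrong. */
    v2di op2_low = { 0x1008, 0 };
    assert(!movlps_matches_vec_concat(op2_low, 0x1000, 0x1008));
}
```

Since movlps never touches bits 127:64, the check reduces to comparing the original upper half with op2.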

However, I have no idea how to tell gcc that. I could try removing that 7th
combination altogether to force gcc to find an alternative solution (?).

Is it worth testing that? (i.e. building gcc and experimenting...)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37101

[Bug tree-optimization/37101] [4.2/4.3 Regression] wrong code: tree vectorizer omits bogus movq/movlps construct
