------- Additional Comments From uros at kss-loka dot si  2005-08-26 07:50 
-------
The problem here is in the sse_concatv2sf pattern:

;; ??? In theory we can match memory for the MMX alternative, but allowing
;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
;; alternatives pretty much forces the MMX alternative to be chosen.
(define_insn "*sse_concatv2sf"
  [(set (match_operand:V2SF 0 "register_operand"     "=x,x,*y,*y")
        (vec_concat:V2SF
          (match_operand:SF 1 "nonimmediate_operand" " 0,m, 0, m")
          (match_operand:SF 2 "vector_move_operand"  " x,C,*y, C")))]

and in the "vector_move_operand" predicate used for operand 2, defined as:

;; Return 1 when OP is operand acceptable for standard SSE move.
(define_predicate "vector_move_operand"
  (ior (match_operand 0 "nonimmediate_operand")
       (match_operand 0 "const0_operand")))

Please note that the "vector_move_operand" predicate allows memory operands, 
but none of the constraint alternatives for operand 2 does. So the following 
insn confuses reload:

(insn:HI 63 62 64 3 (set (reg:V2SF 21 xmm0 [117])
        (vec_concat:V2SF (mem:SF (plus:SI (plus:SI (reg/f:SI 68 [ ivtmp.71 ])
                        (reg:SI 88 [ D.1795 ]))
                    (const_int -4 [0xfffffffc])) [2 S4 A32])
            (mem:SF (plus:SI (plus:SI (reg/f:SI 68 [ ivtmp.71 ])
                        (reg:SI 89 [ D.1800 ]))
                    (const_int -4 [0xfffffffc])) [2 S4 A32]))) 612 
{*sse_concatv2sf} (nil)

(BTW: the "sse2_loadld" pattern could have the same problem, as it has no "m" 
constraint either.)

The immediate fix would be to define another predicate, similar 
to "vector_move_operand":

;; Same as above, but excluding memory operands.
(define_predicate "vector_move_nomem_operand"
  (ior (match_operand 0 "register_operand")
       (match_operand 0 "const0_operand")))
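
For illustration, a sketch of how the pattern could look with the new 
predicate. This assumes that only the predicate of operand 2 is swapped and 
that the constraint strings stay exactly as quoted above; the rest of the 
pattern is not shown here, and an actual patch may differ:

;; Sketch only: operand 2 now uses the memory-free predicate, so it
;; agrees with its constraints (x, C, *y, C), none of which accepts memory.
(define_insn "*sse_concatv2sf"
  [(set (match_operand:V2SF 0 "register_operand"          "=x,x,*y,*y")
        (vec_concat:V2SF
          (match_operand:SF 1 "nonimmediate_operand"      " 0,m, 0, m")
          (match_operand:SF 2 "vector_move_nomem_operand" " x,C,*y, C")))]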

When operand 2 of the sse_concatv2sf pattern uses this new predicate, gcc is 
able to compile both testcases, and the following result is produced (for 
both -O1 and -O2):

ludcompd(): SSE2 code is used.
1.000000 4.000000 5.000000 3.000000 
-2.800000 0.800000 -1.600000 
-1.000000 -1.000000 
0
2 1 3 0
5.000 6.000 10.000 78.000
0.800 -2.800 -7.000 -55.400
0.600 0.571 -1.000 -15.143
0.200 -0.286 1.000 -12.286
ludcompf(): SSE2 code is used.
1
2 1 0 3
5.000 6.000 10.000 78.000
0.800 -2.800 -7.000 -55.400
0.200 -0.286 -1.000 -27.429
0.600 0.571 1.000 12.286

Unfortunately, the ludcompf() result (the second one) is wrong when -O1 or 
-O2 is used. It is correct without optimizations.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23570
