------- Additional Comments From uros at kss-loka dot si  2005-08-26 07:50 -------
The problem here is in the sse_concatv2sf pattern:

;; ??? In theory we can match memory for the MMX alternative, but allowing
;; nonimmediate_operand for operand 2 and *not* allowing memory for the SSE
;; alternatives pretty much forces the MMX alternative to be chosen.
(define_insn "*sse_concatv2sf"
  [(set (match_operand:V2SF 0 "register_operand"     "=x,x,*y,*y")
	(vec_concat:V2SF
	  (match_operand:SF 1 "nonimmediate_operand" " 0,m, 0, m")
	  (match_operand:SF 2 "vector_move_operand"  " x,C,*y, C")))]

and in the "vector_move_operand" predicate used for operand 2, defined as:

;; Return 1 when OP is operand acceptable for standard SSE move.
(define_predicate "vector_move_operand"
  (ior (match_operand 0 "nonimmediate_operand")
       (match_operand 0 "const0_operand")))

Please note that the "vector_move_operand" predicate accepts memory operands,
but none of the constraint alternatives for operand 2 (x, C, *y, C) does. So
the following insn, where both inputs are memory, confuses reload:

(insn:HI 63 62 64 3 (set (reg:V2SF 21 xmm0 [117])
        (vec_concat:V2SF (mem:SF (plus:SI (plus:SI (reg/f:SI 68 [ ivtmp.71 ])
                        (reg:SI 88 [ D.1795 ]))
                    (const_int -4 [0xfffffffc])) [2 S4 A32])
            (mem:SF (plus:SI (plus:SI (reg/f:SI 68 [ ivtmp.71 ])
                        (reg:SI 89 [ D.1800 ]))
                    (const_int -4 [0xfffffffc])) [2 S4 A32]))) 612 {*sse_concatv2sf} (nil)

(BTW: the "sse2_loadld" pattern could have the same problem, since it has no
"m" constraint either.)

The immediate fix would be to define another predicate, similar to
"vector_move_operand":

;; Same as above, but excluding memory operands.
(define_predicate "vector_move_nomem_operand"
  (ior (match_operand 0 "register_operand")
       (match_operand 0 "const0_operand")))

When operand 2 of the sse_concatv2sf pattern uses this new predicate, gcc is
able to compile both testcases, and the following result is produced (for both
-O1 and -O2):

ludcompd(): SSE2 code is used.
1.000000 4.000000 5.000000 3.000000 -2.800000 0.800000 -1.600000 -1.000000 -1.000000
0 2 1 3 0
5.000 6.000 10.000 78.000
0.800 -2.800 -7.000 -55.400
0.600 0.571 -1.000 -15.143
0.200 -0.286 1.000 -12.286

ludcompf(): SSE2 code is used.
1 2 1 0 3
5.000 6.000 10.000 78.000
0.800 -2.800 -7.000 -55.400
0.200 -0.286 -1.000 -27.429
0.600 0.571 1.000 12.286

Unfortunately, the ludcompf() result (the second one) is wrong when -O1 or -O2
is used. It is correct without optimizations.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23570
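
As a sketch of the change described above, the quoted part of the pattern with operand 2 switched to the new predicate would read roughly as follows. Only the predicate name on operand 2 differs from the original; the constraint strings are unchanged. The trailing "..." stands for the rest of the insn, which is not quoted in this comment and is left elided here. This illustrates the proposed edit and is not a tested patch.

```
;; Sketch only: operand 2 now uses vector_move_nomem_operand, so the
;; predicate no longer accepts a memory operand that none of the
;; constraint alternatives (x, C, *y, C) could satisfy.
(define_insn "*sse_concatv2sf"
  [(set (match_operand:V2SF 0 "register_operand"          "=x,x,*y,*y")
	(vec_concat:V2SF
	  (match_operand:SF 1 "nonimmediate_operand"      " 0,m, 0, m")
	  (match_operand:SF 2 "vector_move_nomem_operand" " x,C,*y, C")))]
  ...)
```

With the narrower predicate, an insn whose second input is a memory reference no longer matches this pattern, so the operand presumably has to be loaded into a register before the pattern can apply, instead of the mismatch being left for reload to resolve.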