Hi,
  I was looking into why we don't produce fmls with a scalar register
as the last argument, and I found a difference in how fnma<mode>4 is
described in RTL, which I think is causing the missed optimization.
Compare the scalar version:


(define_insn "fnma<mode>4"
  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
        (fma:GPF_F16
          (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w"))
          (match_operand:GPF_F16 2 "register_operand" "w")
          (match_operand:GPF_F16 3 "register_operand" "w")))]
  "TARGET_FLOAT"
  "fmsub\\t%<s>0, %<s>1, %<s>2, %<s>3"
  [(set_attr "type" "fmac<stype>")]
)

vs the vector version:
(define_insn "fnma<mode>4"
  [(set (match_operand:VHSDF 0 "register_operand" "=w")
        (fma:VHSDF
          (match_operand:VHSDF 1 "register_operand" "w")
          (neg:VHSDF
            (match_operand:VHSDF 2 "register_operand" "w"))
          (match_operand:VHSDF 3 "register_operand" "0")))]
  "TARGET_SIMD"
  "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
  [(set_attr "type" "neon_fp_mla_<stype><q>")]
)

Notice how the neg is in a different location in each pattern: the
scalar version negates operand 1, while the vector version negates
operand 2.  What is the reason for that?

Thanks,
Andrew
