https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115683

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
For gcc.target/i386/pr88540.c we expand the mask producer as

(insn 12 11 13 (set (reg:V2DF 109)
        (lt:V2DF (reg:V2DF 101 [ vect__23.6 ])
            (reg:V2DF 98 [ vect__25.9 ]))) -1
     (nil))

(insn 13 12 14 (set (reg:V2DI 108)
        (subreg:V2DI (reg:V2DF 109) 0)) -1
     (nil))

(insn 14 13 15 (set (reg:V2DI 107 [ mask__26.10_21 ])
        (reg:V2DI 108)) -1
     (nil))

I think that because we go through a named expander for the vec_cmp, we
cannot use TER tricks like we do with the scalar expansion, which produces
the min from the x86 expander directly.

combine sees

   12: r109:V2DF=r105:V2DF<r106:V2DF
   15: r110:V2DF=r105:V2DF&r109:V2DF
      REG_DEAD r105:V2DF
   16: r111:V2DF=~r109:V2DF&r106:V2DF
      REG_DEAD r109:V2DF
      REG_DEAD r106:V2DF
   17: r100:V2DF=r111:V2DF|r110:V2DF

It tries 12, 15 -> 17 and 16, 15 -> 17, but I think the four-insn
combinations do not include this "diamond" variant.  It has I0, I1 -> I2,
I2 -> I3, but this would be I0 -> I1, I0 -> I2, (I1, I2) -> I3.  I am not
sure it would handle that at all even if we passed those insns to
try_combine.

I don't see a good way to do this with combine helpers; the only option
would have been to keep the blend (15, 16, 17) in a single insn, to be split
only after combine.  With SSE 4.1 this is what happens (UNSPEC_BLENDV).
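
To spell out what the four insns compute, here is a rough per-lane model in
plain C.  It is only a sketch: the function names blend_lt_lane and
select_lt_lane are made up for illustration, a and b stand for r105 and r106,
and the all-ones/all-zeros mask is an assumption about how the compare result
is represented in each lane.

```c
#include <stdint.h>
#include <string.h>

/* View a double as its 64-bit pattern and back, since the and/andnot/or
   insns operate on the raw bits of each lane.  */
static uint64_t bits_of (double d) { uint64_t u; memcpy (&u, &d, sizeof u); return u; }
static double double_of (uint64_t u) { double d; memcpy (&d, &u, sizeof d); return d; }

/* Hypothetical one-lane model of insns 12, 15, 16, 17.  */
static double
blend_lt_lane (double a, double b)
{
  uint64_t mask = a < b ? ~(uint64_t) 0 : 0;  /* insn 12: r109 = r105 < r106 */
  uint64_t t = bits_of (a) & mask;            /* insn 15: r110 = r105 & r109 */
  uint64_t f = ~mask & bits_of (b);           /* insn 16: r111 = ~r109 & r106 */
  return double_of (t | f);                   /* insn 17: r100 = r111 | r110 */
}

/* The same value written as a select: a < b ? a : b.  */
static double
select_lt_lane (double a, double b)
{
  return a < b ? a : b;
}
```

Both functions return a when a < b and b otherwise, which is why the mask
(insn 12) has to be visible together with all three blend insns for the
min to be recognized.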

The other option is of course to catch this min/max form with a new optab,
either during ISEL or emitted (and costed) by the vectorizer directly.  It
would be quite special: select_lt and select_gt maybe, or eventually a
merged select with a compare op, like we have for vec_cmp, to specify the
comparison code.  select (A code B) would then be A code B ? A : B.
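
As a sketch of the intended semantics only (no such optab exists; sel_code,
SEL_LT, SEL_GT and select_cmp are hypothetical names and not GCC's rtx or
tree codes), a lane-wise model of the merged variant could look like this:

```c
#include <stddef.h>

/* Hypothetical comparison codes for the proposed merged select.  */
enum sel_code { SEL_LT, SEL_GT };

/* Hypothetical lane-wise model of select (A code B):
   out[i] = a[i] code b[i] ? a[i] : b[i].  */
static void
select_cmp (enum sel_code code, const double *a, const double *b,
            double *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int take_a = code == SEL_LT ? a[i] < b[i] : a[i] > b[i];
      out[i] = take_a ? a[i] : b[i];
    }
}
```

With SEL_LT this is the min form from the testcase and with SEL_GT the
corresponding max form.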
