https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91154
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
With

Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md     (revision 273567)
+++ gcc/config/i386/i386.md     (working copy)
@@ -17681,6 +17681,23 @@ (define_insn "<code><mode>3"
    (set_attr "type" "sseadd")
    (set_attr "mode" "<MODE>")])

+(define_expand "smaxsi3"
+  [(set (match_operand:SI 0 "register_operand")
+        (smax:SI
+          (match_operand:SI 1 "register_operand")
+          (match_operand:SI 2 "register_operand")))]
+  ""
+{
+  rtx vop1 = gen_reg_rtx (V4SImode);
+  rtx vop2 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_vec_setv4si_0 (vop1, CONST0_RTX (V4SImode), operands[1]));
+  emit_insn (gen_vec_setv4si_0 (vop2, CONST0_RTX (V4SImode), operands[2]));
+  rtx tem = gen_reg_rtx (V4SImode);
+  emit_insn (gen_smaxv4si3 (tem, vop1, vop2));
+  emit_move_insn (operands[0], lowpart_subreg (SImode, tem, V4SImode));
+  DONE;
+})
+
 ;; These versions of the min/max patterns implement exactly the operations
 ;;   min = (op1 < op2 ? op1 : op2)
 ;;   max = (!(op1 < op2) ? op1 : op2)

we generate

.L3:
        addl    (%rdx,%r8,4), %r9d
        movl    (%rcx,%r8,4), %eax
        addl    (%rsi,%r8,4), %eax
        vmovd   %r9d, %xmm1
        vmovd   %eax, %xmm0
        movq    %r8, %rax
        vpmaxsd %xmm1, %xmm0, %xmm0
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0
        vpmaxsd %xmm2, %xmm0, %xmm0
        vmovd   %xmm0, 4(%rdi,%r8,4)
        vmovd   %xmm0, %r9d
        incq    %r8
        cmpq    %rax, %r10
        jne     .L3

so we manage to catch the store as well but somehow

(insn:TI 35 27 37 4 (set (reg:V4SI 20 xmm0 [114])
        (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 20 xmm0 [107]))
            (const_vector:V4SI [
                    (const_int 0 [0]) repeated x4
                ])
            (const_int 1 [0x1]))) 2740 {vec_setv4si_0}
     (nil))

fails to be elided.  Maybe vec_setv4si_0 isn't the optimal representation
choice.  Ah, of course the zeros might end up invalidated by the earlier
max... we can't say in RTL that we actually do not care about the upper
bits - can we?

Anyhow, while the above would fix the regression on Haswell, we'd degrade on
Zen, and in more isolated cmov cases it's clearly not going to be a win
either.
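
To make the point about the upper bits concrete, here is a minimal plain-C model of what the smaxsi3 expander above computes. It is only a sketch: the v4si_model type and all model_* functions are hypothetical stand-ins invented for illustration, not GCC internals or intrinsics. It models vec_setv4si_0 with a zero vector as "lane 0 = value, other lanes = 0", smaxv4si3 as a lane-wise signed maximum, and the final lowpart_subreg as reading lane 0.

```c
#include <assert.h>

/* Hypothetical plain-C stand-in for a V4SI register: four signed lanes. */
typedef struct { int lane[4]; } v4si_model;

/* Models vec_setv4si_0 on a zero vector: lane 0 = x, lanes 1..3 = 0. */
static v4si_model model_vec_set_0(int x)
{
    v4si_model v = { { x, 0, 0, 0 } };
    return v;
}

/* Models smaxv4si3 (vpmaxsd): lane-wise signed maximum. */
static v4si_model model_smax_v4si(v4si_model a, v4si_model b)
{
    v4si_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] > b.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

/* Models the whole expander: insert both operands, take the vector max,
   then read back the SImode lowpart (lane 0). */
static int model_smaxsi3(int a, int b)
{
    v4si_model t = model_smax_v4si(model_vec_set_0(a), model_vec_set_0(b));
    return t.lane[0];
}

/* Same computation, but with arbitrary values left in the upper lanes
   instead of zeros.  Only lane 0 is read, so the result is unchanged. */
static int model_smaxsi3_dirty_upper(int a, int b, int junk)
{
    v4si_model va = { { a, junk, junk, junk } };
    v4si_model vb = { { b, ~junk, junk, ~junk } };
    return model_smax_v4si(va, vb).lane[0];
}

/* Checks that the upper lanes never influence the scalar result. */
static void model_check(int a, int b, int junk)
{
    assert(model_smaxsi3(a, b) == model_smaxsi3_dirty_upper(a, b, junk));
}
```

In this model the scalar result depends only on lane 0, which is why re-zeroing the upper lanes between the two max operations is redundant work for this use; the model says nothing about how that "don't care" could be expressed in RTL.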