On Wed, Aug 7, 2019 at 1:51 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Wed, 7 Aug 2019, Richard Biener wrote:
>
> > On Mon, 5 Aug 2019, Uros Bizjak wrote:
> >
> > > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener <rguent...@suse.de> wrote:
> > >
> > > > > > > > > > (define_mode_iterator MAXMIN_IMODE
> > > > > > > > > >   [(SI "TARGET_SSE4_1") (DI "TARGET_AVX512F")])
> > > > > > > > > >
> > > > > > > > > > and then we need to split DImode for 32bits, too.
> > > > > > > > >
> > > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for DImode
> > > > > > > > > condition, I'll provide _doubleword splitter later.
> > > > > > > >
> > > > > > > > Shouldn't that be TARGET_AVX512VL instead?  Or does the insn use %g0 etc.
> > > > > > > > to force use of %zmmN?
> > > > > > >
> > > > > > > It generates V4SI mode, so - yes, AVX512VL.
> > > > > >
> > > > > > case SMAX:
> > > > > > case SMIN:
> > > > > > case UMAX:
> > > > > > case UMIN:
> > > > > >   if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL))
> > > > > >       || (mode == SImode && !TARGET_SSE4_1))
> > > > > >     return false;
> > > > > >
> > > > > > so there's no way to use AVX512VL for 32bit?
> > > > >
> > > > > There is a way, but on 32bit targets, we need to split DImode
> > > > > operation to a sequence of SImode operations for unconverted pattern.
> > > > > This is of course doable, but somehow more complex than simply
> > > > > emitting a DImode compare + DImode cmove, which is what the current
> > > > > splitter does.  So, a follow-up task.
> > > >
> > > > Ah, OK.  So for the above condition we can elide the !TARGET_64BIT
> > > > check; we just need to properly split if we enable the scalar minmax
> > > > pattern for DImode on 32bits, and the STV conversion would go fine.
> > > Yes, that is correct.
> > So I tested the patch below (now with appropriate ChangeLog) on
> > x86_64-unknown-linux-gnu.
> > I've thrown it at SPEC CPU 2006 with
> > the obvious hmmer improvement, now checking for off-noise results
> > with a 3-run on those that may have one (with more than +-1 second
> > differences in the 1-run).
> >
> > As-is the patch likely runs into the splitting issue for DImode
> > on i?86 and the patch misses functional testcases.  I'll do the
> > hmmer loop with both DImode and SImode and testcases to trigger
> > all pattern variants with the different ISAs we have.
> >
> > Some of the patch could be split out (the cost changes that are
> > also effective for DImode for example).
> >
> > AFAICS we could go with only adding SImode avoiding the DImode
> > splitting thing and this would solve the hmmer regression.
>
> I've additionally bootstrapped with --with-arch=nehalem which
> reveals
>
> FAIL: gcc.target/i386/minmax-2.c scan-assembler test
> FAIL: gcc.target/i386/minmax-2.c scan-assembler-not cmp
>
> we emit cmp + cmov here now with -msse4.1 (as soon as the max
> pattern is enabled I guess)
>
> Otherwise testing is clean, so I suppose this is the net effect
> of just doing the SImode chains; I don't have AVX512 HW handily
> available to really test the DImode path.
>
> Would you be fine to simplify the patch down to SImode chain handling?
Just leave DImode for a couple of days to see what HJ's autotesters
reveal.  I'd just disable DImode for 32bit targets for now, we know
that splitters are missing.

Some remarks below.

Uros.

> Thanks,
> Richard.
>
> > Thanks,
> > Richard.
> >
> > 2019-08-07  Richard Biener  <rguent...@suse.de>
> >
> > 	PR target/91154
> > 	* config/i386/i386-features.h (scalar_chain::scalar_chain): Add
> > 	mode arguments.
> > 	(scalar_chain::smode): New member.
> > 	(scalar_chain::vmode): Likewise.
> > 	(dimode_scalar_chain): Rename to...
> > 	(general_scalar_chain): ... this.
> > 	(general_scalar_chain::general_scalar_chain): Take mode arguments.
> > 	(timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain
> > 	base with TImode and V1TImode.
> > 	* config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust.
> > 	(general_scalar_chain::vector_const_cost): Adjust for SImode
> > 	chains.
> > 	(general_scalar_chain::compute_convert_gain): Likewise.  Fix
> > 	reg-reg move cost gain, use ix86_cost->sse_op cost and adjust
> > 	scalar costs.  Add {S,U}{MIN,MAX} support.  Dump per-instruction
> > 	gain if not zero.
> > 	(general_scalar_chain::replace_with_subreg): Use vmode/smode.
> > 	(general_scalar_chain::make_vector_copies): Likewise.  Handle
> > 	non-DImode chains appropriately.
> > 	(general_scalar_chain::convert_reg): Likewise.
> > 	(general_scalar_chain::convert_op): Likewise.
> > 	(general_scalar_chain::convert_insn): Likewise.  Add
> > 	fatal_insn_not_found if the result is not recognized.
> > 	(convertible_comparison_p): Pass in the scalar mode and use that.
> > 	(general_scalar_to_vector_candidate_p): Likewise.  Rename from
> > 	dimode_scalar_to_vector_candidate_p.  Add {S,U}{MIN,MAX} support.
> > 	(scalar_to_vector_candidate_p): Remove by inlining into single
> > 	caller.
> > 	(general_remove_non_convertible_regs): Rename from
> > 	dimode_remove_non_convertible_regs.
> > 	(remove_non_convertible_regs): Remove by inlining into single caller.
> > (convert_scalars_to_vector): Handle SImode and DImode chains > > in addition to TImode chains. > > * config/i386/i386.md (<maxmin><SWI48>3): New insn split after STV. > > > > Index: gcc/config/i386/i386-features.c > > =================================================================== > > --- gcc/config/i386/i386-features.c (revision 274111) > > +++ gcc/config/i386/i386-features.c (working copy) > > @@ -276,8 +276,11 @@ unsigned scalar_chain::max_id = 0; > > > > /* Initialize new chain. */ > > > > -scalar_chain::scalar_chain () > > +scalar_chain::scalar_chain (enum machine_mode smode_, enum machine_mode > > vmode_) > > { > > + smode = smode_; > > + vmode = vmode_; > > + > > chain_id = ++max_id; > > > > if (dump_file) > > @@ -319,7 +322,7 @@ scalar_chain::add_to_queue (unsigned ins > > conversion. */ > > > > void > > -dimode_scalar_chain::mark_dual_mode_def (df_ref def) > > +general_scalar_chain::mark_dual_mode_def (df_ref def) > > { > > gcc_assert (DF_REF_REG_DEF_P (def)); > > > > @@ -409,6 +412,9 @@ scalar_chain::add_insn (bitmap candidate > > && !HARD_REGISTER_P (SET_DEST (def_set))) > > bitmap_set_bit (defs, REGNO (SET_DEST (def_set))); > > > > + /* ??? The following is quadratic since analyze_register_chain > > + iterates over all refs to look for dual-mode regs. Instead this > > + should be done separately for all regs mentioned in the chain once. > > */ > > df_ref ref; > > df_ref def; > > for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) > > @@ -469,19 +475,21 @@ scalar_chain::build (bitmap candidates, > > instead of using a scalar one. 
*/ > > > > int > > -dimode_scalar_chain::vector_const_cost (rtx exp) > > +general_scalar_chain::vector_const_cost (rtx exp) > > { > > gcc_assert (CONST_INT_P (exp)); > > > > - if (standard_sse_constant_p (exp, V2DImode)) > > - return COSTS_N_INSNS (1); > > - return ix86_cost->sse_load[1]; > > + if (standard_sse_constant_p (exp, vmode)) > > + return ix86_cost->sse_op; > > + /* We have separate costs for SImode and DImode, use SImode costs > > + for smaller modes. */ > > + return ix86_cost->sse_load[smode == DImode ? 1 : 0]; > > } > > > > /* Compute a gain for chain conversion. */ > > > > int > > -dimode_scalar_chain::compute_convert_gain () > > +general_scalar_chain::compute_convert_gain () > > { > > bitmap_iterator bi; > > unsigned insn_uid; > > @@ -491,28 +499,37 @@ dimode_scalar_chain::compute_convert_gai > > if (dump_file) > > fprintf (dump_file, "Computing gain for chain #%d...\n", chain_id); > > > > + /* SSE costs distinguish between SImode and DImode loads/stores, for > > + int costs factor in the number of GPRs involved. When supporting > > + smaller modes than SImode the int load/store costs need to be > > + adjusted as well. */ > > + unsigned sse_cost_idx = smode == DImode ? 1 : 0; > > + unsigned m = smode == DImode ? (TARGET_64BIT ? 
1 : 2) : 1; > > + > > EXECUTE_IF_SET_IN_BITMAP (insns, 0, insn_uid, bi) > > { > > rtx_insn *insn = DF_INSN_UID_GET (insn_uid)->insn; > > rtx def_set = single_set (insn); > > rtx src = SET_SRC (def_set); > > rtx dst = SET_DEST (def_set); > > + int igain = 0; > > > > if (REG_P (src) && REG_P (dst)) > > - gain += COSTS_N_INSNS (2) - ix86_cost->xmm_move; > > + igain += 2 * m - ix86_cost->xmm_move; > > else if (REG_P (src) && MEM_P (dst)) > > - gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1]; > > + igain > > + += m * ix86_cost->int_store[2] - ix86_cost->sse_store[sse_cost_idx]; > > else if (MEM_P (src) && REG_P (dst)) > > - gain += 2 * ix86_cost->int_load[2] - ix86_cost->sse_load[1]; > > + igain += m * ix86_cost->int_load[2] - > > ix86_cost->sse_load[sse_cost_idx]; > > else if (GET_CODE (src) == ASHIFT > > || GET_CODE (src) == ASHIFTRT > > || GET_CODE (src) == LSHIFTRT) > > { > > if (CONST_INT_P (XEXP (src, 0))) > > - gain -= vector_const_cost (XEXP (src, 0)); > > - gain += ix86_cost->shift_const; > > + igain -= vector_const_cost (XEXP (src, 0)); > > + igain += m * ix86_cost->shift_const - ix86_cost->sse_op; > > if (INTVAL (XEXP (src, 1)) >= 32) > > - gain -= COSTS_N_INSNS (1); > > + igain -= COSTS_N_INSNS (1); > > } > > else if (GET_CODE (src) == PLUS > > || GET_CODE (src) == MINUS > > @@ -520,20 +537,31 @@ dimode_scalar_chain::compute_convert_gai > > || GET_CODE (src) == XOR > > || GET_CODE (src) == AND) > > { > > - gain += ix86_cost->add; > > + igain += m * ix86_cost->add - ix86_cost->sse_op; > > /* Additional gain for andnot for targets without BMI. 
*/ > > if (GET_CODE (XEXP (src, 0)) == NOT > > && !TARGET_BMI) > > - gain += 2 * ix86_cost->add; > > + igain += m * ix86_cost->add; > > > > if (CONST_INT_P (XEXP (src, 0))) > > - gain -= vector_const_cost (XEXP (src, 0)); > > + igain -= vector_const_cost (XEXP (src, 0)); > > if (CONST_INT_P (XEXP (src, 1))) > > - gain -= vector_const_cost (XEXP (src, 1)); > > + igain -= vector_const_cost (XEXP (src, 1)); > > } > > else if (GET_CODE (src) == NEG > > || GET_CODE (src) == NOT) > > - gain += ix86_cost->add - COSTS_N_INSNS (1); > > + igain += m * ix86_cost->add - ix86_cost->sse_op; > > + else if (GET_CODE (src) == SMAX > > + || GET_CODE (src) == SMIN > > + || GET_CODE (src) == UMAX > > + || GET_CODE (src) == UMIN) > > + { > > + /* We do not have any conditional move cost, estimate it as a > > + reg-reg move. Comparisons are costed as adds. */ > > + igain += m * (COSTS_N_INSNS (2) + ix86_cost->add); > > + /* Integer SSE ops are all costed the same. */ > > + igain -= ix86_cost->sse_op; > > + } > > else if (GET_CODE (src) == COMPARE) > > { > > /* Assume comparison cost is the same. */ > > @@ -541,18 +569,28 @@ dimode_scalar_chain::compute_convert_gai > > else if (CONST_INT_P (src)) > > { > > if (REG_P (dst)) > > - gain += COSTS_N_INSNS (2); > > + /* DImode can be immediate for TARGET_64BIT and SImode always. */ > > + igain += COSTS_N_INSNS (m); > > else if (MEM_P (dst)) > > - gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1]; > > - gain -= vector_const_cost (src); > > + igain += (m * ix86_cost->int_store[2] > > + - ix86_cost->sse_store[sse_cost_idx]); > > + igain -= vector_const_cost (src); > > } > > else > > gcc_unreachable (); > > + > > + if (igain != 0 && dump_file) > > + { > > + fprintf (dump_file, " Instruction gain %d for ", igain); > > + dump_insn_slim (dump_file, insn); > > + } > > + gain += igain; > > } > > > > if (dump_file) > > fprintf (dump_file, " Instruction conversion gain: %d\n", gain); > > > > + /* ??? What about integer to SSE? 
*/ > > EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi) > > cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer; > > > > @@ -570,10 +608,10 @@ dimode_scalar_chain::compute_convert_gai > > /* Replace REG in X with a V2DI subreg of NEW_REG. */ > > > > rtx > > -dimode_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg) > > +general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg) > > { > > if (x == reg) > > - return gen_rtx_SUBREG (V2DImode, new_reg, 0); > > + return gen_rtx_SUBREG (vmode, new_reg, 0); > > > > const char *fmt = GET_RTX_FORMAT (GET_CODE (x)); > > int i, j; > > @@ -593,7 +631,7 @@ dimode_scalar_chain::replace_with_subreg > > /* Replace REG in INSN with a V2DI subreg of NEW_REG. */ > > > > void > > -dimode_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn, > > +general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn, > > rtx reg, rtx new_reg) > > { > > replace_with_subreg (single_set (insn), reg, new_reg); > > @@ -624,10 +662,10 @@ scalar_chain::emit_conversion_insns (rtx > > and replace its uses in a chain. 
*/ > > > > void > > -dimode_scalar_chain::make_vector_copies (unsigned regno) > > +general_scalar_chain::make_vector_copies (unsigned regno) > > { > > rtx reg = regno_reg_rtx[regno]; > > - rtx vreg = gen_reg_rtx (DImode); > > + rtx vreg = gen_reg_rtx (smode); > > df_ref ref; > > > > for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref)) > > @@ -636,37 +674,47 @@ dimode_scalar_chain::make_vector_copies > > start_sequence (); > > if (!TARGET_INTER_UNIT_MOVES_TO_VEC) > > { > > - rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP); > > - emit_move_insn (adjust_address (tmp, SImode, 0), > > - gen_rtx_SUBREG (SImode, reg, 0)); > > - emit_move_insn (adjust_address (tmp, SImode, 4), > > - gen_rtx_SUBREG (SImode, reg, 4)); > > + rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP); > > + if (smode == DImode && !TARGET_64BIT) > > + { > > + emit_move_insn (adjust_address (tmp, SImode, 0), > > + gen_rtx_SUBREG (SImode, reg, 0)); > > + emit_move_insn (adjust_address (tmp, SImode, 4), > > + gen_rtx_SUBREG (SImode, reg, 4)); > > + } > > + else > > + emit_move_insn (tmp, reg); > > emit_move_insn (vreg, tmp); > > } > > - else if (TARGET_SSE4_1) > > + else if (!TARGET_64BIT && smode == DImode) > > { > > - emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0), > > - CONST0_RTX (V4SImode), > > - gen_rtx_SUBREG (SImode, reg, 0))); > > - emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0), > > - gen_rtx_SUBREG (V4SImode, vreg, 0), > > - gen_rtx_SUBREG (SImode, reg, 4), > > - GEN_INT (2))); > > + if (TARGET_SSE4_1) > > + { > > + emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, > > 0), > > + CONST0_RTX (V4SImode), > > + gen_rtx_SUBREG (SImode, reg, 0))); > > + emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, > > 0), > > + gen_rtx_SUBREG (V4SImode, vreg, > > 0), > > + gen_rtx_SUBREG (SImode, reg, 4), > > + GEN_INT (2))); > > + } > > + else > > + { > > + rtx tmp = gen_reg_rtx (DImode); > > + emit_insn (gen_sse2_loadld 
(gen_rtx_SUBREG (V4SImode, vreg, > > 0), > > + CONST0_RTX (V4SImode), > > + gen_rtx_SUBREG (SImode, reg, 0))); > > + emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0), > > + CONST0_RTX (V4SImode), > > + gen_rtx_SUBREG (SImode, reg, 4))); > > + emit_insn (gen_vec_interleave_lowv4si > > + (gen_rtx_SUBREG (V4SImode, vreg, 0), > > + gen_rtx_SUBREG (V4SImode, vreg, 0), > > + gen_rtx_SUBREG (V4SImode, tmp, 0))); > > + } > > } > > else > > - { > > - rtx tmp = gen_reg_rtx (DImode); > > - emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0), > > - CONST0_RTX (V4SImode), > > - gen_rtx_SUBREG (SImode, reg, 0))); > > - emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0), > > - CONST0_RTX (V4SImode), > > - gen_rtx_SUBREG (SImode, reg, 4))); > > - emit_insn (gen_vec_interleave_lowv4si > > - (gen_rtx_SUBREG (V4SImode, vreg, 0), > > - gen_rtx_SUBREG (V4SImode, vreg, 0), > > - gen_rtx_SUBREG (V4SImode, tmp, 0))); > > - } > > + emit_move_insn (gen_lowpart (smode, vreg), reg); > > rtx_insn *seq = get_insns (); > > end_sequence (); > > rtx_insn *insn = DF_REF_INSN (ref); > > @@ -695,7 +743,7 @@ dimode_scalar_chain::make_vector_copies > > in case register is used in not convertible insn. 
*/ > > > > void > > -dimode_scalar_chain::convert_reg (unsigned regno) > > +general_scalar_chain::convert_reg (unsigned regno) > > { > > bool scalar_copy = bitmap_bit_p (defs_conv, regno); > > rtx reg = regno_reg_rtx[regno]; > > @@ -707,7 +755,7 @@ dimode_scalar_chain::convert_reg (unsign > > bitmap_copy (conv, insns); > > > > if (scalar_copy) > > - scopy = gen_reg_rtx (DImode); > > + scopy = gen_reg_rtx (smode); > > > > for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref)) > > { > > @@ -727,40 +775,55 @@ dimode_scalar_chain::convert_reg (unsign > > start_sequence (); > > if (!TARGET_INTER_UNIT_MOVES_FROM_VEC) > > { > > - rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP); > > + rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP); > > emit_move_insn (tmp, reg); > > - emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0), > > - adjust_address (tmp, SImode, 0)); > > - emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4), > > - adjust_address (tmp, SImode, 4)); > > + if (!TARGET_64BIT && smode == DImode) > > + { > > + emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0), > > + adjust_address (tmp, SImode, 0)); > > + emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4), > > + adjust_address (tmp, SImode, 4)); > > + } > > + else > > + emit_move_insn (scopy, tmp); > > } > > - else if (TARGET_SSE4_1) > > + else if (!TARGET_64BIT && smode == DImode) > > { > > - rtx tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, > > const0_rtx)); > > - emit_insn > > - (gen_rtx_SET > > - (gen_rtx_SUBREG (SImode, scopy, 0), > > - gen_rtx_VEC_SELECT (SImode, > > - gen_rtx_SUBREG (V4SImode, reg, 0), > > tmp))); > > - > > - tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx)); > > - emit_insn > > - (gen_rtx_SET > > - (gen_rtx_SUBREG (SImode, scopy, 4), > > - gen_rtx_VEC_SELECT (SImode, > > - gen_rtx_SUBREG (V4SImode, reg, 0), > > tmp))); > > + if (TARGET_SSE4_1) > > + { > > + rtx tmp = gen_rtx_PARALLEL (VOIDmode, > > + gen_rtvec (1, const0_rtx)); > > + emit_insn > > + 
(gen_rtx_SET > > + (gen_rtx_SUBREG (SImode, scopy, 0), > > + gen_rtx_VEC_SELECT (SImode, > > + gen_rtx_SUBREG (V4SImode, reg, 0), > > + tmp))); > > + > > + tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, > > const1_rtx)); > > + emit_insn > > + (gen_rtx_SET > > + (gen_rtx_SUBREG (SImode, scopy, 4), > > + gen_rtx_VEC_SELECT (SImode, > > + gen_rtx_SUBREG (V4SImode, reg, 0), > > + tmp))); > > + } > > + else > > + { > > + rtx vcopy = gen_reg_rtx (V2DImode); > > + emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0)); > > + emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0), > > + gen_rtx_SUBREG (SImode, vcopy, 0)); > > + emit_move_insn (vcopy, > > + gen_rtx_LSHIFTRT (V2DImode, > > + vcopy, GEN_INT (32))); > > + emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4), > > + gen_rtx_SUBREG (SImode, vcopy, 0)); > > + } > > } > > else > > - { > > - rtx vcopy = gen_reg_rtx (V2DImode); > > - emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0)); > > - emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0), > > - gen_rtx_SUBREG (SImode, vcopy, 0)); > > - emit_move_insn (vcopy, > > - gen_rtx_LSHIFTRT (V2DImode, vcopy, GEN_INT > > (32))); > > - emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4), > > - gen_rtx_SUBREG (SImode, vcopy, 0)); > > - } > > + emit_move_insn (scopy, reg); > > + > > rtx_insn *seq = get_insns (); > > end_sequence (); > > emit_conversion_insns (seq, insn); > > @@ -809,21 +872,21 @@ dimode_scalar_chain::convert_reg (unsign > > registers conversion. 
*/ > > > > void > > -dimode_scalar_chain::convert_op (rtx *op, rtx_insn *insn) > > +general_scalar_chain::convert_op (rtx *op, rtx_insn *insn) > > { > > *op = copy_rtx_if_shared (*op); > > > > if (GET_CODE (*op) == NOT) > > { > > convert_op (&XEXP (*op, 0), insn); > > - PUT_MODE (*op, V2DImode); > > + PUT_MODE (*op, vmode); > > } > > else if (MEM_P (*op)) > > { > > - rtx tmp = gen_reg_rtx (DImode); > > + rtx tmp = gen_reg_rtx (GET_MODE (*op)); > > > > emit_insn_before (gen_move_insn (tmp, *op), insn); > > - *op = gen_rtx_SUBREG (V2DImode, tmp, 0); > > + *op = gen_rtx_SUBREG (vmode, tmp, 0); > > > > if (dump_file) > > fprintf (dump_file, " Preloading operand for insn %d into r%d\n", > > @@ -841,24 +904,30 @@ dimode_scalar_chain::convert_op (rtx *op > > gcc_assert (!DF_REF_CHAIN (ref)); > > break; > > } > > - *op = gen_rtx_SUBREG (V2DImode, *op, 0); > > + *op = gen_rtx_SUBREG (vmode, *op, 0); > > } > > else if (CONST_INT_P (*op)) > > { > > rtx vec_cst; > > - rtx tmp = gen_rtx_SUBREG (V2DImode, gen_reg_rtx (DImode), 0); > > + rtx tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (smode), 0); > > > > /* Prefer all ones vector in case of -1. 
*/ > > if (constm1_operand (*op, GET_MODE (*op))) > > - vec_cst = CONSTM1_RTX (V2DImode); > > + vec_cst = CONSTM1_RTX (vmode); > > else > > - vec_cst = gen_rtx_CONST_VECTOR (V2DImode, > > - gen_rtvec (2, *op, const0_rtx)); > > + { > > + unsigned n = GET_MODE_NUNITS (vmode); > > + rtx *v = XALLOCAVEC (rtx, n); > > + v[0] = *op; > > + for (unsigned i = 1; i < n; ++i) > > + v[i] = const0_rtx; > > + vec_cst = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v)); > > + } > > > > - if (!standard_sse_constant_p (vec_cst, V2DImode)) > > + if (!standard_sse_constant_p (vec_cst, vmode)) > > { > > start_sequence (); > > - vec_cst = validize_mem (force_const_mem (V2DImode, vec_cst)); > > + vec_cst = validize_mem (force_const_mem (vmode, vec_cst)); > > rtx_insn *seq = get_insns (); > > end_sequence (); > > emit_insn_before (seq, insn); > > @@ -870,14 +939,14 @@ dimode_scalar_chain::convert_op (rtx *op > > else > > { > > gcc_assert (SUBREG_P (*op)); > > - gcc_assert (GET_MODE (*op) == V2DImode); > > + gcc_assert (GET_MODE (*op) == vmode); > > } > > } > > > > /* Convert INSN to vector mode. */ > > > > void > > -dimode_scalar_chain::convert_insn (rtx_insn *insn) > > +general_scalar_chain::convert_insn (rtx_insn *insn) > > { > > rtx def_set = single_set (insn); > > rtx src = SET_SRC (def_set); > > @@ -888,9 +957,9 @@ dimode_scalar_chain::convert_insn (rtx_i > > { > > /* There are no scalar integer instructions and therefore > > temporary register usage is required. 
*/ > > - rtx tmp = gen_reg_rtx (DImode); > > + rtx tmp = gen_reg_rtx (GET_MODE (dst)); > > emit_conversion_insns (gen_move_insn (dst, tmp), insn); > > - dst = gen_rtx_SUBREG (V2DImode, tmp, 0); > > + dst = gen_rtx_SUBREG (vmode, tmp, 0); > > } > > > > switch (GET_CODE (src)) > > @@ -899,7 +968,7 @@ dimode_scalar_chain::convert_insn (rtx_i > > case ASHIFTRT: > > case LSHIFTRT: > > convert_op (&XEXP (src, 0), insn); > > - PUT_MODE (src, V2DImode); > > + PUT_MODE (src, vmode); > > break; > > > > case PLUS: > > @@ -907,25 +976,29 @@ dimode_scalar_chain::convert_insn (rtx_i > > case IOR: > > case XOR: > > case AND: > > + case SMAX: > > + case SMIN: > > + case UMAX: > > + case UMIN: > > convert_op (&XEXP (src, 0), insn); > > convert_op (&XEXP (src, 1), insn); > > - PUT_MODE (src, V2DImode); > > + PUT_MODE (src, vmode); > > break; > > > > case NEG: > > src = XEXP (src, 0); > > convert_op (&src, insn); > > - subreg = gen_reg_rtx (V2DImode); > > - emit_insn_before (gen_move_insn (subreg, CONST0_RTX (V2DImode)), > > insn); > > - src = gen_rtx_MINUS (V2DImode, subreg, src); > > + subreg = gen_reg_rtx (vmode); > > + emit_insn_before (gen_move_insn (subreg, CONST0_RTX (vmode)), insn); > > + src = gen_rtx_MINUS (vmode, subreg, src); > > break; > > > > case NOT: > > src = XEXP (src, 0); > > convert_op (&src, insn); > > - subreg = gen_reg_rtx (V2DImode); > > - emit_insn_before (gen_move_insn (subreg, CONSTM1_RTX (V2DImode)), > > insn); > > - src = gen_rtx_XOR (V2DImode, src, subreg); > > + subreg = gen_reg_rtx (vmode); > > + emit_insn_before (gen_move_insn (subreg, CONSTM1_RTX (vmode)), insn); > > + src = gen_rtx_XOR (vmode, src, subreg); > > break; > > > > case MEM: > > @@ -939,17 +1012,17 @@ dimode_scalar_chain::convert_insn (rtx_i > > break; > > > > case SUBREG: > > - gcc_assert (GET_MODE (src) == V2DImode); > > + gcc_assert (GET_MODE (src) == vmode); > > break; > > > > case COMPARE: > > src = SUBREG_REG (XEXP (XEXP (src, 0), 0)); > > > > - gcc_assert ((REG_P (src) && GET_MODE 
(src) == DImode) > > - || (SUBREG_P (src) && GET_MODE (src) == V2DImode)); > > + gcc_assert ((REG_P (src) && GET_MODE (src) == GET_MODE_INNER (vmode)) > > + || (SUBREG_P (src) && GET_MODE (src) == vmode)); > > > > if (REG_P (src)) > > - subreg = gen_rtx_SUBREG (V2DImode, src, 0); > > + subreg = gen_rtx_SUBREG (vmode, src, 0); > > else > > subreg = copy_rtx_if_shared (src); > > emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared > > (subreg), > > @@ -977,7 +1050,9 @@ dimode_scalar_chain::convert_insn (rtx_i > > PATTERN (insn) = def_set; > > > > INSN_CODE (insn) = -1; > > - recog_memoized (insn); > > + int patt = recog_memoized (insn); > > + if (patt == -1) > > + fatal_insn_not_found (insn); > > df_insn_rescan (insn); > > } > > > > @@ -1116,7 +1191,7 @@ timode_scalar_chain::convert_insn (rtx_i > > } > > > > void > > -dimode_scalar_chain::convert_registers () > > +general_scalar_chain::convert_registers () > > { > > bitmap_iterator bi; > > unsigned id; > > @@ -1186,7 +1261,7 @@ has_non_address_hard_reg (rtx_insn *insn > > (const_int 0 [0]))) */ > > > > static bool > > -convertible_comparison_p (rtx_insn *insn) > > +convertible_comparison_p (rtx_insn *insn, enum machine_mode mode) > > { > > if (!TARGET_SSE4_1) > > return false; > > @@ -1219,12 +1294,12 @@ convertible_comparison_p (rtx_insn *insn > > > > if (!SUBREG_P (op1) > > || !SUBREG_P (op2) > > - || GET_MODE (op1) != SImode > > - || GET_MODE (op2) != SImode > > + || GET_MODE (op1) != mode > > + || GET_MODE (op2) != mode > > || ((SUBREG_BYTE (op1) != 0 > > - || SUBREG_BYTE (op2) != GET_MODE_SIZE (SImode)) > > + || SUBREG_BYTE (op2) != GET_MODE_SIZE (mode)) > > && (SUBREG_BYTE (op2) != 0 > > - || SUBREG_BYTE (op1) != GET_MODE_SIZE (SImode)))) > > + || SUBREG_BYTE (op1) != GET_MODE_SIZE (mode)))) > > return false; > > > > op1 = SUBREG_REG (op1); > > @@ -1232,7 +1307,7 @@ convertible_comparison_p (rtx_insn *insn > > > > if (op1 != op2 > > || !REG_P (op1) > > - || GET_MODE (op1) != DImode) > > + || 
> > 	 GET_MODE (op1) != GET_MODE_WIDER_MODE (mode).else_blk ())
> >      return false;
> > 
> >    return true;
> > @@ -1241,7 +1316,7 @@ convertible_comparison_p (rtx_insn
> >  /* The DImode version of scalar_to_vector_candidate_p.  */
> > 
> >  static bool
> > -dimode_scalar_to_vector_candidate_p (rtx_insn *insn)
> > +general_scalar_to_vector_candidate_p (rtx_insn *insn, enum machine_mode mode)
> >  {
> >    rtx def_set = single_set (insn);
> > 
> > @@ -1255,12 +1330,12 @@ dimode_scalar_to_vector_candidate_p (rtx
> >    rtx dst = SET_DEST (def_set);
> > 
> >    if (GET_CODE (src) == COMPARE)
> > -    return convertible_comparison_p (insn);
> > +    return convertible_comparison_p (insn, mode);
> > 
> >    /* We are interested in DImode promotion only.  */
> > -  if ((GET_MODE (src) != DImode
> > +  if ((GET_MODE (src) != mode
> >         && !CONST_INT_P (src))
> > -      || GET_MODE (dst) != DImode)
> > +      || GET_MODE (dst) != mode)
> >      return false;
> > 
> >    if (!REG_P (dst) && !MEM_P (dst))
> > @@ -1280,6 +1355,15 @@ dimode_scalar_to_vector_candidate_p (rtx
> >  	return false;
> >        break;
> > 
> > +    case SMAX:
> > +    case SMIN:
> > +    case UMAX:
> > +    case UMIN:
> > +      if ((mode == DImode && !TARGET_AVX512VL)

Please enable only for TARGET_64BIT for now.

> > +	  || (mode == SImode && !TARGET_SSE4_1))
> > +	return false;
> > +      /* Fallthru.
*/ > > + > > case PLUS: > > case MINUS: > > case IOR: > > @@ -1290,7 +1374,7 @@ dimode_scalar_to_vector_candidate_p (rtx > > && !CONST_INT_P (XEXP (src, 1))) > > return false; > > > > - if (GET_MODE (XEXP (src, 1)) != DImode > > + if (GET_MODE (XEXP (src, 1)) != mode > > && !CONST_INT_P (XEXP (src, 1))) > > return false; > > break; > > @@ -1319,7 +1403,7 @@ dimode_scalar_to_vector_candidate_p (rtx > > || !REG_P (XEXP (XEXP (src, 0), 0)))) > > return false; > > > > - if (GET_MODE (XEXP (src, 0)) != DImode > > + if (GET_MODE (XEXP (src, 0)) != mode > > && !CONST_INT_P (XEXP (src, 0))) > > return false; > > > > @@ -1383,22 +1467,16 @@ timode_scalar_to_vector_candidate_p (rtx > > return false; > > } > > > > -/* Return 1 if INSN may be converted into vector > > - instruction. */ > > - > > -static bool > > -scalar_to_vector_candidate_p (rtx_insn *insn) > > -{ > > - if (TARGET_64BIT) > > - return timode_scalar_to_vector_candidate_p (insn); > > - else > > - return dimode_scalar_to_vector_candidate_p (insn); > > -} > > +/* For a given bitmap of insn UIDs scans all instruction and > > + remove insn from CANDIDATES in case it has both convertible > > + and not convertible definitions. > > > > -/* The DImode version of remove_non_convertible_regs. */ > > + All insns in a bitmap are conversion candidates according to > > + scalar_to_vector_candidate_p. Currently it implies all insns > > + are single_set. */ > > > > static void > > -dimode_remove_non_convertible_regs (bitmap candidates) > > +general_remove_non_convertible_regs (bitmap candidates) > > { > > bitmap_iterator bi; > > unsigned id; > > @@ -1553,23 +1631,6 @@ timode_remove_non_convertible_regs (bitm > > BITMAP_FREE (regs); > > } > > > > -/* For a given bitmap of insn UIDs scans all instruction and > > - remove insn from CANDIDATES in case it has both convertible > > - and not convertible definitions. > > - > > - All insns in a bitmap are conversion candidates according to > > - scalar_to_vector_candidate_p. 
> > Currently it implies all insns
> > -   are single_set.  */
> > -
> > -static void
> > -remove_non_convertible_regs (bitmap candidates)
> > -{
> > -  if (TARGET_64BIT)
> > -    timode_remove_non_convertible_regs (candidates);
> > -  else
> > -    dimode_remove_non_convertible_regs (candidates);
> > -}
> > -
> >  /* Main STV pass function.  Find and convert scalar
> >     instructions into vector mode when profitable.  */
> > 
> > @@ -1577,11 +1638,14 @@ static unsigned int
> >  convert_scalars_to_vector ()
> >  {
> >    basic_block bb;
> > -  bitmap candidates;
> >    int converted_insns = 0;
> > 
> >    bitmap_obstack_initialize (NULL);
> > -  candidates = BITMAP_ALLOC (NULL);
> > +  const machine_mode cand_mode[3] = { SImode, DImode, TImode };
> > +  const machine_mode cand_vmode[3] = { V4SImode, V2DImode, V1TImode };
> > +  bitmap_head candidates[3];  /* { SImode, DImode, TImode } */
> > +  for (unsigned i = 0; i < 3; ++i)
> > +    bitmap_initialize (&candidates[i], &bitmap_default_obstack);
> > 
> >    calculate_dominance_info (CDI_DOMINATORS);
> >    df_set_flags (DF_DEFER_INSN_RESCAN);
> > @@ -1597,51 +1661,73 @@ convert_scalars_to_vector ()
> >      {
> >        rtx_insn *insn;
> >        FOR_BB_INSNS (bb, insn)
> > -	if (scalar_to_vector_candidate_p (insn))
> > +	if (TARGET_64BIT
> > +	    && timode_scalar_to_vector_candidate_p (insn))
> > 	  {
> > 	    if (dump_file)
> > -	      fprintf (dump_file, "  insn %d is marked as a candidate\n",
> > +	      fprintf (dump_file, "  insn %d is marked as a TImode candidate\n",
> > 		       INSN_UID (insn));
> > 
> > -	    bitmap_set_bit (candidates, INSN_UID (insn));
> > +	    bitmap_set_bit (&candidates[2], INSN_UID (insn));
> > +	  }
> > +	else
> > +	  {
> > +	    /* Check {SI,DI}mode.  */
> > +	    for (unsigned i = 0; i <= 1; ++i)
> > +	      if (general_scalar_to_vector_candidate_p (insn, cand_mode[i]))
> > +		{
> > +		  if (dump_file)
> > +		    fprintf (dump_file, "  insn %d is marked as a %s candidate\n",
> > +			     INSN_UID (insn), i == 0 ? "SImode" : "DImode");
> > 
> > +		  bitmap_set_bit (&candidates[i], INSN_UID (insn));
> > +		  break;
> > +		}
> > 	  }
> >      }
> > 
> > -  remove_non_convertible_regs (candidates);
> > +  if (TARGET_64BIT)
> > +    timode_remove_non_convertible_regs (&candidates[2]);
> > +  for (unsigned i = 0; i <= 1; ++i)
> > +    general_remove_non_convertible_regs (&candidates[i]);
> > 
> > -  if (bitmap_empty_p (candidates))
> > -    if (dump_file)
> > +  for (unsigned i = 0; i <= 2; ++i)
> > +    if (!bitmap_empty_p (&candidates[i]))
> > +      break;
> > +    else if (i == 2 && dump_file)
> >        fprintf (dump_file, "There are no candidates for optimization.\n");
> > 
> > -  while (!bitmap_empty_p (candidates))
> > -    {
> > -      unsigned uid = bitmap_first_set_bit (candidates);
> > -      scalar_chain *chain;
> > +  for (unsigned i = 0; i <= 2; ++i)
> > +    while (!bitmap_empty_p (&candidates[i]))
> > +      {
> > +	unsigned uid = bitmap_first_set_bit (&candidates[i]);
> > +	scalar_chain *chain;
> > 
> > -      if (TARGET_64BIT)
> > -	chain = new timode_scalar_chain;
> > -      else
> > -	chain = new dimode_scalar_chain;
> > +	if (cand_mode[i] == TImode)
> > +	  chain = new timode_scalar_chain;
> > +	else
> > +	  chain = new general_scalar_chain (cand_mode[i], cand_vmode[i]);
> > 
> > -      /* Find instructions chain we want to convert to vector mode.
> > -	 Check all uses and definitions to estimate all required
> > -	 conversions.  */
> > -      chain->build (candidates, uid);
> > +	/* Find instructions chain we want to convert to vector mode.
> > +	   Check all uses and definitions to estimate all required
> > +	   conversions.  */
> > +	chain->build (&candidates[i], uid);
> > 
> > -      if (chain->compute_convert_gain () > 0)
> > -	converted_insns += chain->convert ();
> > -      else
> > -	if (dump_file)
> > -	  fprintf (dump_file, "Chain #%d conversion is not profitable\n",
> > -		   chain->chain_id);
> > +	if (chain->compute_convert_gain () > 0)
> > +	  converted_insns += chain->convert ();
> > +	else
> > +	  if (dump_file)
> > +	    fprintf (dump_file, "Chain #%d conversion is not profitable\n",
> > +		     chain->chain_id);
> > 
> > -      delete chain;
> > -    }
> > +	delete chain;
> > +      }
> > 
> >    if (dump_file)
> >      fprintf (dump_file, "Total insns converted: %d\n", converted_insns);
> > 
> > -  BITMAP_FREE (candidates);
> > +  for (unsigned i = 0; i <= 2; ++i)
> > +    bitmap_release (&candidates[i]);
> >    bitmap_obstack_release (NULL);
> >    df_process_deferred_rescans ();
> > 
> > Index: gcc/config/i386/i386-features.h
> > ===================================================================
> > --- gcc/config/i386/i386-features.h	(revision 274111)
> > +++ gcc/config/i386/i386-features.h	(working copy)
> > @@ -127,11 +127,16 @@ namespace {
> >  class scalar_chain
> >  {
> >   public:
> > -  scalar_chain ();
> > +  scalar_chain (enum machine_mode, enum machine_mode);
> >    virtual ~scalar_chain ();
> > 
> >    static unsigned max_id;
> > 
> > +  /* Scalar mode.  */
> > +  enum machine_mode smode;
> > +  /* Vector mode.  */
> > +  enum machine_mode vmode;
> > +
> >    /* ID of a chain.  */
> >    unsigned int chain_id;
> >    /* A queue of instructions to be included into a chain.  */
> > @@ -159,9 +164,11 @@ class scalar_chain
> >    virtual void convert_registers () = 0;
> >  };
> > 
> > -class dimode_scalar_chain : public scalar_chain
> > +class general_scalar_chain : public scalar_chain
> >  {
> >   public:
> > +  general_scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
> > +    : scalar_chain (smode_, vmode_) {}
> >    int compute_convert_gain ();
> >   private:
> >    void mark_dual_mode_def (df_ref def);
> > @@ -178,6 +185,8 @@ class dimode_scalar_chain : public scala
> >  class timode_scalar_chain : public scalar_chain
> >  {
> >   public:
> > +  timode_scalar_chain () : scalar_chain (TImode, V1TImode) {}
> > +
> >    /* Convert from TImode to V1TImode is always faster.  */
> >    int compute_convert_gain () { return 1; }
> > 
> > Index: gcc/config/i386/i386.md
> > ===================================================================
> > --- gcc/config/i386/i386.md	(revision 274111)
> > +++ gcc/config/i386/i386.md	(working copy)
> > @@ -17721,6 +17721,30 @@ (define_peephole2
> >    std::swap (operands[4], operands[5]);
> > })
> > 
> > +;; min/max patterns

You should use:

(define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI "TARGET_64BIT && TARGET_AVX512F"])

in the pattern below.  Otherwise, the middle-end detects and emits minmax
patterns that have no chance of being converted and always split back to
integer insns.

> > +(define_code_attr maxmin_rel
> > +  [(smax "ge") (smin "le") (umax "geu") (umin "leu")])
> > +(define_code_attr maxmin_cmpmode
> > +  [(smax "CCGC") (smin "CCGC") (umax "CC") (umin "CC")])
> > +
> > +(define_insn_and_split "<code><mode>3"
> > +  [(set (match_operand:SWI48 0 "register_operand")
> > +	(maxmin:SWI48 (match_operand:SWI48 1 "register_operand")
> > +		      (match_operand:SWI48 2 "register_operand")))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_STV && TARGET_SSE4_1

Leave only TARGET_STV if MAXMIN_IMODE will be used.

> > +   && can_create_pseudo_p ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (reg:<maxmin_cmpmode> FLAGS_REG)
> > +	(compare:<maxmin_cmpmode> (match_dup 1) (match_dup 2)))
> > +   (set (match_dup 0)
> > +	(if_then_else:SWI48
> > +	  (<maxmin_rel> (reg:<maxmin_cmpmode> FLAGS_REG) (const_int 0))
> > +	  (match_dup 1)
> > +	  (match_dup 2)))])
> > +
> >  ;; Conditional addition patterns
> >  (define_expand "add<mode>cc"
> >    [(match_operand:SWI 0 "register_operand")
> > 
> 
> -- 
> Richard Biener <rguent...@suse.de>
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)