On Fri, Oct 4, 2024 at 11:58 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> The PR notes that we don't emit optimal code for the C++ spaceship
> operator if the result is returned as an integer rather than the
> result just being compared against different values and different
> code executed based on that.
> So e.g. for
> template <typename T>
> auto foo (T x, T y) { return x <=> y; }
> for both floating point types, signed integer types and unsigned integer
> types.  auto in that case is std::strong_ordering or std::partial_ordering,
> which are fancy C++ abstractions around a struct with a signed char member
> which is -1, 0, 1 for the strong ordering and -1, 0, 1, 2 for the partial
> ordering (but for -ffast-math 2 is never the case).
> I'm afraid functions like that are fairly common and unless they are
> inlined, we really need to map the comparison to those -1, 0, 1 or
> -1, 0, 1, 2 values.
>
> Now, for floating point spaceship I've in the past already added an
> optimization (with tree-ssa-math-opts.cc discovery and named optab, the
> optab only defined on x86 though right now), which ensures there is just
> a single comparison instruction and then just tests based on flags.
> Now, if we have code like:
> auto a = x <=> y;
> if (a == std::partial_ordering::less)
>   bar ();
> else if (a == std::partial_ordering::greater)
>   baz ();
> else if (a == std::partial_ordering::equivalent)
>   qux ();
> else if (a == std::partial_ordering::unordered)
>   corge ();
> etc., that results in decent code generation, the spaceship named pattern
> on x86 optimizes for the jumps, so emits comparisons on the flags, followed
> by setting the result to -1, 0, 1, 2 and the subsequent jump pass optimizes
> that well.
> But if the result needs to be stored into an integer and just
> returned that way or there are no immediate jumps based on it (or turned
> into some non-standard integer values like -42, 0, 36, 75 etc.), then CE
> (the RTL if-conversion passes) doesn't do a good job for that, we end up
> with say
>         comiss  %xmm1, %xmm0
>         jp      .L4
>         seta    %al
>         movl    $0, %edx
>         leal    -1(%rax,%rax), %eax
>         cmove   %edx, %eax
>         ret
> .L4:
>         movl    $2, %eax
>         ret
> The jp is good, that is the unlikely case and can't be easily handled in
> straight line code due to the layout of the flags, but the rest uses cmov
> which often isn't a win, and weird math.
> With the patch below we can get instead
>         xorl    %eax, %eax
>         comiss  %xmm1, %xmm0
>         jp      .L2
>         seta    %al
>         sbbl    $0, %eax
>         ret
> .L2:
>         movl    $2, %eax
>         ret
>
> The patch changes the discovery in the generic code, by detecting if
> the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or
> -1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in
> a new argument to .SPACESHIP ifn, so that the named pattern is told whether
> it should optimize for branches or for loading the result into a -1, 0, 1
> (, 2) integer.  Additionally, it doesn't detect just floating point <=>
> anymore, but also integer and unsigned integer, but in those cases only
> if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons
> result in good code).
> The backend then can for those integer or unsigned integer <=>s return
> effectively (x > y) - (x < y) in a way that is efficient on the target
> (so for x86 with ensuring zero initialization first when needed before
> setcc; one for floating point and unsigned, where there is just one setcc
> and the second one optimized into sbb instruction, two for the signed int
> case).  So e.g. for signed int we now emit
>         xorl    %edx, %edx
>         xorl    %eax, %eax
>         cmpl    %esi, %edi
>         setl    %dl
>         setg    %al
>         subl    %edx, %eax
>         ret
> and for unsigned
>         xorl    %eax, %eax
>         cmpl    %esi, %edi
>         seta    %al
>         sbbb    $0, %al
>         ret
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note, I wonder if other targets wouldn't benefit from defining the
> named optab too...
>
> 2024-10-04  Jakub Jelinek  <ja...@redhat.com>
>
>         PR middle-end/116896
>         * optabs.def (spaceship_optab): Use spaceship$a4 rather than
>         spaceship$a3.
>         * internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments
>         rather than 2, expand the last one, expect 4 operands of
>         spaceship_optab.
>         * tree-ssa-math-opts.cc: Include cfghooks.h.
>         (optimize_spaceship): Check if a single PHI is initialized to
>         -1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new)
>         argument to .SPACESHIP and optimize away the comparisons,
>         otherwise pass 0.  Also check for integer comparisons rather than
>         floating point, in that case do it only if there is a single PHI
>         with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP
>         if the <=> is signed, 2 if unsigned.
>         * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Add
>         another rtx argument.
>         (ix86_expand_int_spaceship): Declare.
>         * config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Add
>         arg3 argument, if it is const0_rtx, expand like before, otherwise
>         emit optimized sequence for setting the result into a GPR.
>         (ix86_expand_int_spaceship): New function.
>         * config/i386/i386.md (UNSPEC_SETCC_SI_SLP): New UNSPEC code.
>         (setcc_si_slp): New define_expand.
>         (*setcc_si_slp): New define_insn_and_split.
>         (setcc + setcc + movzbl): New define_peephole2.
>         (spaceship<mode>3): Renamed to ...
>         (spaceship<mode>4): ... this.  Add an extra operand, pass it
>         to ix86_expand_fp_spaceship.
>         (spaceshipxf3): Renamed to ...
>         (spaceshipxf4): ... this.  Add an extra operand, pass it
>         to ix86_expand_fp_spaceship.
>         (spaceship<mode>4): New define_expand for SWI modes.
>         * doc/md.texi (spaceship@var{m}3): Renamed to ...
>         (spaceship@var{m}4): ... this.  Document the meaning of last
>         operand.
>
>         * g++.target/i386/pr116896-1.C: New test.
>         * g++.target/i386/pr116896-2.C: New test.
LGTM for the x86 part.

Thanks,
Uros.

>
> --- gcc/optabs.def.jj	2024-10-01 09:38:58.143960049 +0200
> +++ gcc/optabs.def	2024-10-02 13:52:53.352550358 +0200
> @@ -308,7 +308,7 @@ OPTAB_D (negv3_optab, "negv$I$a3")
>  OPTAB_D (uaddc5_optab, "uaddc$I$a5")
>  OPTAB_D (usubc5_optab, "usubc$I$a5")
>  OPTAB_D (addptr3_optab, "addptr$a3")
> -OPTAB_D (spaceship_optab, "spaceship$a3")
> +OPTAB_D (spaceship_optab, "spaceship$a4")
>
>  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
>  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> --- gcc/internal-fn.cc.jj	2024-10-01 09:38:41.482192804 +0200
> +++ gcc/internal-fn.cc	2024-10-02 13:52:53.354550330 +0200
> @@ -5107,6 +5107,7 @@ expand_SPACESHIP (internal_fn, gcall *st
>    tree lhs = gimple_call_lhs (stmt);
>    tree rhs1 = gimple_call_arg (stmt, 0);
>    tree rhs2 = gimple_call_arg (stmt, 1);
> +  tree rhs3 = gimple_call_arg (stmt, 2);
>    tree type = TREE_TYPE (rhs1);
>
>    do_pending_stack_adjust ();
> @@ -5114,13 +5115,15 @@ expand_SPACESHIP (internal_fn, gcall *st
>    rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>    rtx op1 = expand_normal (rhs1);
>    rtx op2 = expand_normal (rhs2);
> +  rtx op3 = expand_normal (rhs3);
>
> -  class expand_operand ops[3];
> +  class expand_operand ops[4];
>    create_call_lhs_operand (&ops[0], target, TYPE_MODE (TREE_TYPE (lhs)));
>    create_input_operand (&ops[1], op1, TYPE_MODE (type));
>    create_input_operand (&ops[2], op2, TYPE_MODE (type));
> +  create_input_operand (&ops[3], op3, TYPE_MODE (TREE_TYPE (rhs3)));
>    insn_code icode = optab_handler (spaceship_optab, TYPE_MODE (type));
> -  expand_insn (icode, 3, ops);
> +  expand_insn (icode, 4, ops);
>    assign_call_lhs (lhs, target, &ops[0]);
>  }
>
> --- gcc/tree-ssa-math-opts.cc.jj	2024-10-01 09:38:58.331957422 +0200
> +++ gcc/tree-ssa-math-opts.cc	2024-10-03 15:00:15.893473504 +0200
> @@ -117,6 +117,7 @@ along with GCC; see the file COPYING3.
>  #include "domwalk.h"
>  #include "tree-ssa-math-opts.h"
>  #include "dbgcnt.h"
> +#include "cfghooks.h"
>
>  /* This structure represents one basic block that either computes a
>     division, or is a common dominator for basic block that compute a
> @@ -5869,7 +5870,7 @@ convert_mult_to_highpart (gassign *stmt,
>      <bb 6> [local count: 1073741824]:
>     and turn it into:
>      <bb 2> [local count: 1073741824]:
> -     _1 = .SPACESHIP (a_2(D), b_3(D));
> +     _1 = .SPACESHIP (a_2(D), b_3(D), 0);
>       if (_1 == 0)
>         goto <bb 6>; [34.00%]
>       else
> @@ -5891,7 +5892,13 @@ convert_mult_to_highpart (gassign *stmt,
>
>      <bb 6> [local count: 1073741824]:
>     so that the backend can emit optimal comparison and
> -   conditional jump sequence.  */
> +   conditional jump sequence.  If the
> +    <bb 6> [local count: 1073741824]:
> +   above has a single PHI like:
> +    # _27 = PHI<0(2), -1(3), 2(4), 1(5)>
> +   then replace it with effectively
> +    _1 = .SPACESHIP (a_2(D), b_3(D), 1);
> +    _27 = _1;  */
>
>  static void
>  optimize_spaceship (gcond *stmt)
> @@ -5901,7 +5908,8 @@ optimize_spaceship (gcond *stmt)
>      return;
>    tree arg1 = gimple_cond_lhs (stmt);
>    tree arg2 = gimple_cond_rhs (stmt);
> -  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg1))
> +  if ((!SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg1))
> +       && !INTEGRAL_TYPE_P (TREE_TYPE (arg1)))
>        || optab_handler (spaceship_optab,
>                          TYPE_MODE (TREE_TYPE (arg1))) == CODE_FOR_nothing
>        || operand_equal_p (arg1, arg2, 0))
> @@ -6013,12 +6021,105 @@ optimize_spaceship (gcond *stmt)
>         }
>      }
>
> -  gcall *gc = gimple_build_call_internal (IFN_SPACESHIP, 2, arg1, arg2);
> +  /* Check if there is a single bb into which all failed conditions
> +     jump to (perhaps through an empty block) and if it results in
> +     a single integral PHI which just sets it to -1, 0, 1, 2
> +     (or -1, 0, 1 when NaNs can't happen).  In that case use 1 rather
> +     than 0 as last .SPACESHIP argument to tell backends it might
> +     consider different code generation and just cast the result
> +     of .SPACESHIP to the PHI result.  */
> +  tree arg3 = integer_zero_node;
> +  edge e = EDGE_SUCC (bb0, 0);
> +  if (e->dest == bb1)
> +    e = EDGE_SUCC (bb0, 1);
> +  basic_block bbp = e->dest;
> +  gphi *phi = NULL;
> +  for (gphi_iterator psi = gsi_start_phis (bbp);
> +       !gsi_end_p (psi); gsi_next (&psi))
> +    {
> +      gphi *gp = psi.phi ();
> +      tree res = gimple_phi_result (gp);
> +
> +      if (phi != NULL
> +          || virtual_operand_p (res)
> +          || !INTEGRAL_TYPE_P (TREE_TYPE (res))
> +          || TYPE_PRECISION (TREE_TYPE (res)) < 2)
> +        {
> +          phi = NULL;
> +          break;
> +        }
> +      phi = gp;
> +    }
> +  if (phi
> +      && integer_zerop (gimple_phi_arg_def_from_edge (phi, e))
> +      && EDGE_COUNT (bbp->preds) == (HONOR_NANS (TREE_TYPE (arg1)) ? 4 : 3))
> +    {
> +      for (unsigned i = 0; phi && i < EDGE_COUNT (bbp->preds) - 1; ++i)
> +        {
> +          edge e3 = i == 0 ? e1 : i == 1 ? em1 : e2;
> +          if (e3->dest != bbp)
> +            {
> +              if (!empty_block_p (e3->dest)
> +                  || !single_succ_p (e3->dest)
> +                  || single_succ (e3->dest) != bbp)
> +                {
> +                  phi = NULL;
> +                  break;
> +                }
> +              e3 = single_succ_edge (e3->dest);
> +            }
> +          tree a = gimple_phi_arg_def_from_edge (phi, e3);
> +          if (TREE_CODE (a) != INTEGER_CST
> +              || (i == 0 && !integer_onep (a))
> +              || (i == 1 && !integer_all_onesp (a))
> +              || (i == 2 && wi::to_widest (a) != 2))
> +            {
> +              phi = NULL;
> +              break;
> +            }
> +        }
> +      if (phi)
> +        arg3 = build_int_cst (integer_type_node,
> +                              TYPE_UNSIGNED (TREE_TYPE (arg1)) ? 2 : 1);
> +    }
> +
> +  /* For integral <=> comparisons only use .SPACESHIP if it is turned
> +     into an integer (-1, 0, 1).  */
> +  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg1)) && arg3 == integer_zero_node)
> +    return;
> +
> +  gcall *gc = gimple_build_call_internal (IFN_SPACESHIP, 3, arg1, arg2,
> +                                          arg3);
>    tree lhs = make_ssa_name (integer_type_node);
>    gimple_call_set_lhs (gc, lhs);
>    gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>    gsi_insert_before (&gsi, gc, GSI_SAME_STMT);
>
> +  wide_int wm1 = wi::minus_one (TYPE_PRECISION (integer_type_node));
> +  wide_int w2 = (HONOR_NANS (TREE_TYPE (arg1))
> +                 ? wi::two (TYPE_PRECISION (integer_type_node))
> +                 : wi::one (TYPE_PRECISION (integer_type_node)));
> +  int_range<1> vr (TREE_TYPE (lhs), wm1, w2);
> +  set_range_info (lhs, vr);
> +
> +  if (arg3 != integer_zero_node)
> +    {
> +      tree type = TREE_TYPE (gimple_phi_result (phi));
> +      if (!useless_type_conversion_p (type, integer_type_node))
> +        {
> +          tree tem = make_ssa_name (type);
> +          gimple *gcv = gimple_build_assign (tem, NOP_EXPR, lhs);
> +          gsi_insert_before (&gsi, gcv, GSI_SAME_STMT);
> +          lhs = tem;
> +        }
> +      SET_PHI_ARG_DEF_ON_EDGE (phi, e, lhs);
> +      gimple_cond_set_lhs (stmt, boolean_false_node);
> +      gimple_cond_set_rhs (stmt, boolean_false_node);
> +      gimple_cond_set_code (stmt, (e->flags & EDGE_TRUE_VALUE)
> +                                  ? EQ_EXPR : NE_EXPR);
> +      update_stmt (stmt);
> +      return;
> +    }
> +
>    gimple_cond_set_lhs (stmt, lhs);
>    gimple_cond_set_rhs (stmt, integer_zero_node);
>    update_stmt (stmt);
> @@ -6055,11 +6156,6 @@ optimize_spaceship (gcond *stmt)
>                                 (e2->flags & EDGE_TRUE_VALUE) ? NE_EXPR :
>                                 EQ_EXPR);
>        update_stmt (cond);
>      }
> -
> -  wide_int wm1 = wi::minus_one (TYPE_PRECISION (integer_type_node));
> -  wide_int w2 = wi::two (TYPE_PRECISION (integer_type_node));
> -  int_range<1> vr (TREE_TYPE (lhs), wm1, w2);
> -  set_range_info (lhs, vr);
>  }
>
>
> --- gcc/config/i386/i386-protos.h.jj	2024-10-01 09:38:41.237196225 +0200
> +++ gcc/config/i386/i386-protos.h	2024-10-02 18:45:12.807284696 +0200
> @@ -164,7 +164,8 @@ extern bool ix86_expand_fp_vec_cmp (rtx[
>  extern void ix86_expand_sse_movcc (rtx, rtx, rtx, rtx);
>  extern void ix86_expand_sse_extend (rtx, rtx, bool);
>  extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
> -extern void ix86_expand_fp_spaceship (rtx, rtx, rtx);
> +extern void ix86_expand_fp_spaceship (rtx, rtx, rtx, rtx);
> +extern void ix86_expand_int_spaceship (rtx, rtx, rtx, rtx);
>  extern bool ix86_expand_int_addcc (rtx[]);
>  extern void ix86_expand_carry (rtx arg);
>  extern rtx_insn *ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
> --- gcc/config/i386/i386-expand.cc.jj	2024-10-01 09:38:41.181197008 +0200
> +++ gcc/config/i386/i386-expand.cc	2024-10-04 11:05:09.647022644 +0200
> @@ -3144,12 +3144,15 @@ ix86_expand_setcc (rtx dest, enum rtx_co
>     dest = op0 == op1 ? 0 : op0 < op1 ? -1 : op0 > op1 ? 1 : 2.  */
>
>  void
> -ix86_expand_fp_spaceship (rtx dest, rtx op0, rtx op1)
> +ix86_expand_fp_spaceship (rtx dest, rtx op0, rtx op1, rtx op2)
>  {
>    gcc_checking_assert (ix86_fp_comparison_strategy (GT) != IX86_FPCMP_ARITH);
> +  rtx zero = NULL_RTX;
> +  if (op2 != const0_rtx && TARGET_IEEE_FP && GET_MODE (dest) == SImode)
> +    zero = force_reg (SImode, const0_rtx);
>    rtx gt = ix86_expand_fp_compare (GT, op0, op1);
> -  rtx l0 = gen_label_rtx ();
> -  rtx l1 = gen_label_rtx ();
> +  rtx l0 = op2 == const0_rtx ? gen_label_rtx () : NULL_RTX;
> +  rtx l1 = op2 == const0_rtx ? gen_label_rtx () : NULL_RTX;
>    rtx l2 = TARGET_IEEE_FP ? gen_label_rtx () : NULL_RTX;
>    rtx lend = gen_label_rtx ();
>    rtx tmp;
> @@ -3163,23 +3166,68 @@ ix86_expand_fp_spaceship (rtx dest, rtx
>        jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
>        add_reg_br_prob_note (jmp, profile_probability::very_unlikely ());
>      }
> -  rtx eq = gen_rtx_fmt_ee (UNEQ, VOIDmode,
> -                           gen_rtx_REG (CCFPmode, FLAGS_REG), const0_rtx);
> -  tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, eq,
> -                              gen_rtx_LABEL_REF (VOIDmode, l0), pc_rtx);
> -  jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> -  add_reg_br_prob_note (jmp, profile_probability::unlikely ());
> -  tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, gt,
> -                              gen_rtx_LABEL_REF (VOIDmode, l1), pc_rtx);
> -  jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> -  add_reg_br_prob_note (jmp, profile_probability::even ());
> -  emit_move_insn (dest, constm1_rtx);
> -  emit_jump (lend);
> -  emit_label (l0);
> -  emit_move_insn (dest, const0_rtx);
> -  emit_jump (lend);
> -  emit_label (l1);
> -  emit_move_insn (dest, const1_rtx);
> +  if (op2 == const0_rtx)
> +    {
> +      rtx eq = gen_rtx_fmt_ee (UNEQ, VOIDmode,
> +                               gen_rtx_REG (CCFPmode, FLAGS_REG), const0_rtx);
> +      tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, eq,
> +                                  gen_rtx_LABEL_REF (VOIDmode, l0), pc_rtx);
> +      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> +      add_reg_br_prob_note (jmp, profile_probability::unlikely ());
> +      tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, gt,
> +                                  gen_rtx_LABEL_REF (VOIDmode, l1), pc_rtx);
> +      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> +      add_reg_br_prob_note (jmp, profile_probability::even ());
> +      emit_move_insn (dest, constm1_rtx);
> +      emit_jump (lend);
> +      emit_label (l0);
> +      emit_move_insn (dest, const0_rtx);
> +      emit_jump (lend);
> +      emit_label (l1);
> +      emit_move_insn (dest, const1_rtx);
> +    }
> +  else
> +    {
> +      rtx lt_tmp = gen_reg_rtx (QImode);
> +      ix86_expand_setcc (lt_tmp, UNLT, gen_rtx_REG (CCFPmode, FLAGS_REG),
> +                         const0_rtx);
> +      if (GET_MODE (dest) != QImode)
> +        {
> +          tmp = gen_reg_rtx (GET_MODE (dest));
> +          emit_insn (gen_rtx_SET (tmp, gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> +                                                            lt_tmp)));
> +          lt_tmp = tmp;
> +        }
> +      rtx gt_tmp;
> +      if (zero)
> +        {
> +          /* If TARGET_IEEE_FP and dest has SImode, emit SImode clear
> +             before the floating point comparison and use setcc_si_slp
> +             pattern to hide it from the combiner, so that it doesn't
> +             undo it.  */
> +          tmp = ix86_expand_compare (GT, XEXP (gt, 0), const0_rtx);
> +          PUT_MODE (tmp, QImode);
> +          emit_insn (gen_setcc_si_slp (zero, tmp, zero));
> +          gt_tmp = zero;
> +        }
> +      else
> +        {
> +          gt_tmp = gen_reg_rtx (QImode);
> +          ix86_expand_setcc (gt_tmp, GT, XEXP (gt, 0), const0_rtx);
> +          if (GET_MODE (dest) != QImode)
> +            {
> +              tmp = gen_reg_rtx (GET_MODE (dest));
> +              emit_insn (gen_rtx_SET (tmp,
> +                                      gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> +                                                           gt_tmp)));
> +              gt_tmp = tmp;
> +            }
> +        }
> +      tmp = expand_simple_binop (GET_MODE (dest), MINUS, gt_tmp, lt_tmp, dest,
> +                                 0, OPTAB_DIRECT);
> +      if (!rtx_equal_p (tmp, dest))
> +        emit_move_insn (dest, tmp);
> +    }
>    emit_jump (lend);
>    if (l2)
>      {
> @@ -3189,6 +3237,46 @@ ix86_expand_fp_spaceship (rtx dest, rtx
>    emit_label (lend);
>  }
>
> +/* Expand integral op0 <=> op1, i.e.
> +   dest = op0 == op1 ? 0 : op0 < op1 ? -1 : 1.  */
> +
> +void
> +ix86_expand_int_spaceship (rtx dest, rtx op0, rtx op1, rtx op2)
> +{
> +  gcc_assert (INTVAL (op2));
> +  /* Not using ix86_expand_int_compare here, so that it doesn't swap
> +     operands nor optimize CC mode - we need a mode usable for both
> +     LT and GT resp. LTU and GTU comparisons with the same unswapped
> +     operands.  */
> +  rtx flags = gen_rtx_REG (INTVAL (op2) == 1 ? CCGCmode : CCmode, FLAGS_REG);
> +  rtx tmp = gen_rtx_COMPARE (GET_MODE (flags), op0, op1);
> +  emit_insn (gen_rtx_SET (flags, tmp));
> +  rtx lt_tmp = gen_reg_rtx (QImode);
> +  ix86_expand_setcc (lt_tmp, INTVAL (op2) == 1 ? LT : LTU, flags,
> +                     const0_rtx);
> +  if (GET_MODE (dest) != QImode)
> +    {
> +      tmp = gen_reg_rtx (GET_MODE (dest));
> +      emit_insn (gen_rtx_SET (tmp, gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> +                                                        lt_tmp)));
> +      lt_tmp = tmp;
> +    }
> +  rtx gt_tmp = gen_reg_rtx (QImode);
> +  ix86_expand_setcc (gt_tmp, INTVAL (op2) == 1 ? GT : GTU, flags,
> +                     const0_rtx);
> +  if (GET_MODE (dest) != QImode)
> +    {
> +      tmp = gen_reg_rtx (GET_MODE (dest));
> +      emit_insn (gen_rtx_SET (tmp, gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> +                                                        gt_tmp)));
> +      gt_tmp = tmp;
> +    }
> +  tmp = expand_simple_binop (GET_MODE (dest), MINUS, gt_tmp, lt_tmp, dest,
> +                             0, OPTAB_DIRECT);
> +  if (!rtx_equal_p (tmp, dest))
> +    emit_move_insn (dest, tmp);
> +}
> +
>  /* Expand comparison setting or clearing carry flag.  Return true when
>     successful and set pop for the operation.  */
>  static bool
> --- gcc/config/i386/i386.md.jj	2024-10-01 09:38:41.296195402 +0200
> +++ gcc/config/i386/i386.md	2024-10-03 14:32:24.661446577 +0200
> @@ -118,6 +118,7 @@ (define_c_enum "unspec" [
>    UNSPEC_PUSHFL
>    UNSPEC_POPFL
>    UNSPEC_OPTCOMX
> +  UNSPEC_SETCC_SI_SLP
>
>  ;; For SSE/MMX support:
>    UNSPEC_FIX_NOTRUNC
> @@ -19281,6 +19282,27 @@ (define_insn "*setcc_qi_slp"
>    [(set_attr "type" "setcc")
>     (set_attr "mode" "QI")])
>
> +(define_expand "setcc_si_slp"
> +  [(set (match_operand:SI 0 "register_operand")
> +        (unspec:SI
> +          [(match_operand:QI 1)
> +           (match_operand:SI 2 "register_operand")] UNSPEC_SETCC_SI_SLP))])
> +
> +(define_insn_and_split "*setcc_si_slp"
> +  [(set (match_operand:SI 0 "register_operand" "=q")
> +        (unspec:SI
> +          [(match_operator:QI 1 "ix86_comparison_operator"
> +             [(reg FLAGS_REG) (const_int 0)])
> +           (match_operand:SI 2 "register_operand" "0")] UNSPEC_SETCC_SI_SLP))]
> +  "ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0) (match_dup 2))
> +   (set (strict_low_part (match_dup 3)) (match_dup 1))]
> +{
> +  operands[3] = gen_lowpart (QImode, operands[0]);
> +})
> +
>  ;; In general it is not safe to assume too much about CCmode registers,
>  ;; so simplify-rtx stops when it sees a second one.  Under certain
>  ;; conditions this is safe on x86, so help combine not create
> @@ -19776,6 +19798,32 @@ (define_peephole2
>    operands[8] = gen_lowpart (QImode, operands[4]);
>    ix86_expand_clear (operands[4]);
>  })
> +
> +(define_peephole2
> +  [(set (match_operand 4 "flags_reg_operand") (match_operand 0))
> +   (set (strict_low_part (match_operand:QI 5 "register_operand"))
> +        (match_operator:QI 6 "ix86_comparison_operator"
> +          [(reg FLAGS_REG) (const_int 0)]))
> +   (set (match_operand:QI 1 "register_operand")
> +        (match_operator:QI 2 "ix86_comparison_operator"
> +          [(reg FLAGS_REG) (const_int 0)]))
> +   (set (match_operand 3 "any_QIreg_operand")
> +        (zero_extend (match_dup 1)))]
> +  "(peep2_reg_dead_p (4, operands[1])
> +    || operands_match_p (operands[1], operands[3]))
> +   && ! reg_overlap_mentioned_p (operands[3], operands[0])
> +   && ! reg_overlap_mentioned_p (operands[3], operands[5])
> +   && ! reg_overlap_mentioned_p (operands[1], operands[5])
> +   && peep2_regno_dead_p (0, FLAGS_REG)"
> +  [(set (match_dup 4) (match_dup 0))
> +   (set (strict_low_part (match_dup 5))
> +        (match_dup 6))
> +   (set (strict_low_part (match_dup 7))
> +        (match_dup 2))]
> +{
> +  operands[7] = gen_lowpart (QImode, operands[3]);
> +  ix86_expand_clear (operands[3]);
> +})
>
>  ;; Call instructions.
>
> @@ -29494,24 +29542,40 @@ (define_insn "hreset"
>     (set_attr "length" "4")])
>
>  ;; Spaceship optimization
> -(define_expand "spaceship<mode>3"
> +(define_expand "spaceship<mode>4"
>    [(match_operand:SI 0 "register_operand")
>     (match_operand:MODEF 1 "cmp_fp_expander_operand")
> -   (match_operand:MODEF 2 "cmp_fp_expander_operand")]
> +   (match_operand:MODEF 2 "cmp_fp_expander_operand")
> +   (match_operand:SI 3 "const_int_operand")]
>    "(TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH))
>     && (TARGET_CMOVE || (TARGET_SAHF && TARGET_USE_SAHF))"
>  {
> -  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2]);
> +  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2],
> +                            operands[3]);
>    DONE;
>  })
>
> -(define_expand "spaceshipxf3"
> +(define_expand "spaceshipxf4"
>    [(match_operand:SI 0 "register_operand")
>     (match_operand:XF 1 "nonmemory_operand")
> -   (match_operand:XF 2 "nonmemory_operand")]
> +   (match_operand:XF 2 "nonmemory_operand")
> +   (match_operand:SI 3 "const_int_operand")]
>    "TARGET_80387 && (TARGET_CMOVE || (TARGET_SAHF && TARGET_USE_SAHF))"
>  {
> -  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2]);
> +  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2],
> +                            operands[3]);
> +  DONE;
> +})
> +
> +(define_expand "spaceship<mode>4"
> +  [(match_operand:SI 0 "register_operand")
> +   (match_operand:SWI 1 "nonimmediate_operand")
> +   (match_operand:SWI 2 "<general_operand>")
> +   (match_operand:SI 3 "const_int_operand")]
> +  ""
> +{
> +  ix86_expand_int_spaceship (operands[0], operands[1], operands[2],
> +                             operands[3]);
>    DONE;
>  })
>
> --- gcc/doc/md.texi.jj	2024-10-01 09:38:58.035961557 +0200
> +++ gcc/doc/md.texi	2024-10-02 20:16:22.502329039 +0200
> @@ -8568,11 +8568,15 @@ inclusive and operand 1 exclusive.
>  If this pattern is not defined, a call to the library function
>  @code{__clear_cache} is used.
>
> -@cindex @code{spaceship@var{m}3} instruction pattern
> -@item @samp{spaceship@var{m}3}
> +@cindex @code{spaceship@var{m}4} instruction pattern
> +@item @samp{spaceship@var{m}4}
>  Initialize output operand 0 with mode of integer type to -1, 0, 1 or 2
>  if operand 1 with mode @var{m} compares less than operand 2, equal to
>  operand 2, greater than operand 2 or is unordered with operand 2.
> +Operand 3 should be @code{const0_rtx} if the result is used in comparisons,
> +@code{const1_rtx} if the result is used as integer value and the comparison
> +is signed, @code{const2_rtx} if the result is used as integer value and
> +the comparison is unsigned.
>  @var{m} should be a scalar floating point mode.
>
>  This pattern is not allowed to @code{FAIL}.
> --- gcc/testsuite/g++.target/i386/pr116896-1.C.jj	2024-10-03 14:40:27.071813336 +0200
> +++ gcc/testsuite/g++.target/i386/pr116896-1.C	2024-10-03 15:52:06.243819660 +0200
> @@ -0,0 +1,35 @@
> +// PR middle-end/116896
> +// { dg-do compile { target c++20 } }
> +// { dg-options "-O2 -masm=att -fno-stack-protector" }
> +// { dg-final { scan-assembler-times "\tjp\t" 1 } }
> +// { dg-final { scan-assembler-not "\tj\[^mp\]\[a-z\]*\t" } }
> +// { dg-final { scan-assembler-times "\tsbb\[bl\]\t\\\$0, " 3 } }
> +// { dg-final { scan-assembler-times "\tseta\t" 3 } }
> +// { dg-final { scan-assembler-times "\tsetg\t" 1 } }
> +// { dg-final { scan-assembler-times "\tsetl\t" 1 } }
> +
> +#include <compare>
> +
> +[[gnu::noipa]] auto
> +foo (float x, float y)
> +{
> +  return x <=> y;
> +}
> +
> +[[gnu::noipa, gnu::optimize ("fast-math")]] auto
> +bar (float x, float y)
> +{
> +  return x <=> y;
> +}
> +
> +[[gnu::noipa]] auto
> +baz (int x, int y)
> +{
> +  return x <=> y;
> +}
> +
> +[[gnu::noipa]] auto
> +qux (unsigned x, unsigned y)
> +{
> +  return x <=> y;
> +}
> --- gcc/testsuite/g++.target/i386/pr116896-2.C.jj	2024-10-03 14:40:37.203674018 +0200
> +++ gcc/testsuite/g++.target/i386/pr116896-2.C	2024-10-04 10:55:07.468396073 +0200
> @@ -0,0 +1,41 @@
> +// PR middle-end/116896
> +// { dg-do run { target c++20 } }
> +// { dg-options "-O2" }
> +
> +#include "pr116896-1.C"
> +
> +[[gnu::noipa]] auto
> +corge (int x)
> +{
> +  return x <=> 0;
> +}
> +
> +[[gnu::noipa]] auto
> +garply (unsigned x)
> +{
> +  return x <=> 0;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (-1.0f, 1.0f) != std::partial_ordering::less
> +      || foo (1.0f, -1.0f) != std::partial_ordering::greater
> +      || foo (1.0f, 1.0f) != std::partial_ordering::equivalent
> +      || foo (__builtin_nanf (""), 1.0f) != std::partial_ordering::unordered
> +      || bar (-2.0f, 2.0f) != std::partial_ordering::less
> +      || bar (2.0f, -2.0f) != std::partial_ordering::greater
> +      || bar (-5.0f, -5.0f) != std::partial_ordering::equivalent
> +      || baz (-42, 42) != std::strong_ordering::less
> +      || baz (42, -42) != std::strong_ordering::greater
> +      || baz (42, 42) != std::strong_ordering::equal
> +      || qux (40, 42) != std::strong_ordering::less
> +      || qux (42, 40) != std::strong_ordering::greater
> +      || qux (40, 40) != std::strong_ordering::equal
> +      || corge (-15) != std::strong_ordering::less
> +      || corge (15) != std::strong_ordering::greater
> +      || corge (0) != std::strong_ordering::equal
> +      || garply (15) != std::strong_ordering::greater
> +      || garply (0) != std::strong_ordering::equal)
> +    __builtin_abort ();
> +}
>
> 	Jakub
>