On Fri, Oct 4, 2024 at 11:58 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> The PR notes that we don't emit optimal code for the C++ spaceship
> operator if its result is returned as an integer, rather than just being
> compared against different values with different code executed based on
> the outcome.
> So e.g. for
> template <typename T>
> auto foo (T x, T y) { return x <=> y; }
> with T being a floating point type, a signed integer type or an unsigned
> integer type.  auto in that case is std::strong_ordering or
> std::partial_ordering, which are fancy C++ abstractions around a struct
> with a signed char member that is -1, 0, 1 for the strong ordering and
> -1, 0, 1, 2 for the partial ordering (though with -ffast-math the value
> 2 never occurs).
> I'm afraid functions like that are fairly common and unless they are
> inlined, we really need to map the comparison to those -1, 0, 1 or
> -1, 0, 1, 2 values.
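[Editorial aside, not part of the patch: the value mapping described above
boils down to the following sketch; spaceship_value is a made-up helper
name used purely for illustration.]

```cpp
// Illustration only: the -1, 0, 1 (, 2) mapping that a materialized
// x <=> y result boils down to.  spaceship_value is a hypothetical name.
#include <cmath>

// Signed and unsigned integers: strong ordering, i.e. (x > y) - (x < y).
template <typename T>
int spaceship_value (T x, T y)
{
  return (x > y) - (x < y);
}

// Floating point: partial ordering, with 2 for unordered (NaN) operands.
int spaceship_value (double x, double y)
{
  if (std::isnan (x) || std::isnan (y))
    return 2;
  return (x > y) - (x < y);
}
```

This is exactly the computation the backend has to synthesize efficiently
when the result is not immediately branched on.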
>
> Now, for the floating point spaceship I've already added an optimization
> in the past (with discovery in tree-ssa-math-opts.cc and a named optab,
> though the optab is right now only defined on x86), which ensures there
> is just a single comparison instruction, with subsequent tests based only
> on the flags.
> Now, if we have code like:
>   auto a = x <=> y;
>   if (a == std::partial_ordering::less)
>     bar ();
>   else if (a == std::partial_ordering::greater)
>     baz ();
>   else if (a == std::partial_ordering::equivalent)
>     qux ();
>   else if (a == std::partial_ordering::unordered)
>     corge ();
> etc., that results in decent code generation: the spaceship named pattern
> on x86 optimizes for the jumps, so it emits comparisons on the flags,
> followed by setting the result to -1, 0, 1, 2, and the subsequent jump
> pass optimizes that well.  But if the result needs to be stored into an
> integer and just returned that way, or there are no immediate jumps based
> on it (or it is turned into some non-standard integer values like -42, 0,
> 36, 75 etc.), then the ce (if-conversion) pass doesn't do a good job with
> that, and we end up with say
>         comiss  %xmm1, %xmm0
>         jp      .L4
>         seta    %al
>         movl    $0, %edx
>         leal    -1(%rax,%rax), %eax
>         cmove   %edx, %eax
>         ret
> .L4:
>         movl    $2, %eax
>         ret
> The jp is good, that is the unlikely case and can't be easily handled in
> straight line code due to the layout of the flags, but the rest uses a
> cmov, which often isn't a win, plus some weird math.
> With the patch below we can get instead
>         xorl    %eax, %eax
>         comiss  %xmm1, %xmm0
>         jp      .L2
>         seta    %al
>         sbbl    $0, %eax
>         ret
> .L2:
>         movl    $2, %eax
>         ret
>
> The patch changes the discovery in the generic code by detecting whether
> the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or
> -1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag
> in a new argument to the .SPACESHIP ifn, so that the named pattern is
> told whether it should optimize for branches or for loading the result
> into a -1, 0, 1 (, 2) integer.  Additionally, it no longer detects just
> floating point <=>, but also signed and unsigned integer <=>, though in
> those cases only if an integer -1, 0, 1 result is wanted (otherwise ==
> and > or similar comparisons already result in good code).
> The backend can then, for those signed or unsigned integer <=>s, return
> effectively (x > y) - (x < y) in a way that is efficient on the target
> (for x86 by ensuring zero initialization first when needed before the
> setcc; one such clear for floating point and unsigned, where there is
> just one setcc and the second comparison is optimized into an sbb
> instruction, and two for the signed int case).  So e.g. for signed int
> we now emit
>         xorl    %edx, %edx
>         xorl    %eax, %eax
>         cmpl    %esi, %edi
>         setl    %dl
>         setg    %al
>         subl    %edx, %eax
>         ret
> and for unsigned
>         xorl    %eax, %eax
>         cmpl    %esi, %edi
>         seta    %al
>         sbbb    $0, %al
>         ret
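[Editorial aside, not part of the patch: the sbb in the unsigned sequence
works because after the cmp the carry flag holds x < y, seta materializes
x > y, and sbb $0 subtracts the borrow again, so the register ends up
holding (x > y) - (x < y).  A C sketch of that computation, with
spaceship_unsigned a made-up name:]

```c
/* Illustration only: what the unsigned sequence above computes.
   After cmp, CF = (x < y); seta sets the low byte to (x > y);
   sbb $0 then subtracts CF, leaving (x > y) - (x < y).  */
int
spaceship_unsigned (unsigned x, unsigned y)
{
  int gt = x > y;  /* materialized by seta */
  int lt = x < y;  /* the carry flag set by cmp */
  return gt - lt;  /* folded into the sbb $0 */
}
```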
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note, I wonder if other targets wouldn't benefit from defining the
> named optab too...
>
> 2024-10-04  Jakub Jelinek  <ja...@redhat.com>
>
>         PR middle-end/116896
>         * optabs.def (spaceship_optab): Use spaceship$a4 rather than
>         spaceship$a3.
>         * internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments
>         rather than 2, expand the last one, expect 4 operands of
>         spaceship_optab.
>         * tree-ssa-math-opts.cc: Include cfghooks.h.
>         (optimize_spaceship): Check if a single PHI is initialized to
>         -1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new)
>         argument to .SPACESHIP and optimize away the comparisons,
>         otherwise pass 0.  Also check for integer comparisons rather than
>         floating point, in that case do it only if there is a single PHI
>         with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP
>         if the <=> is signed, 2 if unsigned.
>         * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Add
>         another rtx argument.
>         (ix86_expand_int_spaceship): Declare.
>         * config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Add
>         arg3 argument, if it is const0_rtx, expand like before, otherwise
>         emit optimized sequence for setting the result into a GPR.
>         (ix86_expand_int_spaceship): New function.
>         * config/i386/i386.md (UNSPEC_SETCC_SI_SLP): New UNSPEC code.
>         (setcc_si_slp): New define_expand.
>         (*setcc_si_slp): New define_insn_and_split.
>         (setcc + setcc + movzbl): New define_peephole2.
>         (spaceship<mode>3): Renamed to ...
>         (spaceship<mode>4): ... this.  Add an extra operand, pass it
>         to ix86_expand_fp_spaceship.
>         (spaceshipxf3): Renamed to ...
>         (spaceshipxf4): ... this.  Add an extra operand, pass it
>         to ix86_expand_fp_spaceship.
>         (spaceship<mode>4): New define_expand for SWI modes.
>         * doc/md.texi (spaceship@var{m}3): Renamed to ...
>         (spaceship@var{m}4): ... this.  Document the meaning of last
>         operand.
>
>         * g++.target/i386/pr116896-1.C: New test.
>         * g++.target/i386/pr116896-2.C: New test.

LGTM for the x86 part.

Thanks,
Uros.

>
> --- gcc/optabs.def.jj   2024-10-01 09:38:58.143960049 +0200
> +++ gcc/optabs.def      2024-10-02 13:52:53.352550358 +0200
> @@ -308,7 +308,7 @@ OPTAB_D (negv3_optab, "negv$I$a3")
>  OPTAB_D (uaddc5_optab, "uaddc$I$a5")
>  OPTAB_D (usubc5_optab, "usubc$I$a5")
>  OPTAB_D (addptr3_optab, "addptr$a3")
> -OPTAB_D (spaceship_optab, "spaceship$a3")
> +OPTAB_D (spaceship_optab, "spaceship$a4")
>
>  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
>  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> --- gcc/internal-fn.cc.jj       2024-10-01 09:38:41.482192804 +0200
> +++ gcc/internal-fn.cc  2024-10-02 13:52:53.354550330 +0200
> @@ -5107,6 +5107,7 @@ expand_SPACESHIP (internal_fn, gcall *st
>    tree lhs = gimple_call_lhs (stmt);
>    tree rhs1 = gimple_call_arg (stmt, 0);
>    tree rhs2 = gimple_call_arg (stmt, 1);
> +  tree rhs3 = gimple_call_arg (stmt, 2);
>    tree type = TREE_TYPE (rhs1);
>
>    do_pending_stack_adjust ();
> @@ -5114,13 +5115,15 @@ expand_SPACESHIP (internal_fn, gcall *st
>    rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>    rtx op1 = expand_normal (rhs1);
>    rtx op2 = expand_normal (rhs2);
> +  rtx op3 = expand_normal (rhs3);
>
> -  class expand_operand ops[3];
> +  class expand_operand ops[4];
>    create_call_lhs_operand (&ops[0], target, TYPE_MODE (TREE_TYPE (lhs)));
>    create_input_operand (&ops[1], op1, TYPE_MODE (type));
>    create_input_operand (&ops[2], op2, TYPE_MODE (type));
> +  create_input_operand (&ops[3], op3, TYPE_MODE (TREE_TYPE (rhs3)));
>    insn_code icode = optab_handler (spaceship_optab, TYPE_MODE (type));
> -  expand_insn (icode, 3, ops);
> +  expand_insn (icode, 4, ops);
>    assign_call_lhs (lhs, target, &ops[0]);
>  }
>
> --- gcc/tree-ssa-math-opts.cc.jj        2024-10-01 09:38:58.331957422 +0200
> +++ gcc/tree-ssa-math-opts.cc   2024-10-03 15:00:15.893473504 +0200
> @@ -117,6 +117,7 @@ along with GCC; see the file COPYING3.
>  #include "domwalk.h"
>  #include "tree-ssa-math-opts.h"
>  #include "dbgcnt.h"
> +#include "cfghooks.h"
>
>  /* This structure represents one basic block that either computes a
>     division, or is a common dominator for basic block that compute a
> @@ -5869,7 +5870,7 @@ convert_mult_to_highpart (gassign *stmt,
>     <bb 6> [local count: 1073741824]:
>     and turn it into:
>     <bb 2> [local count: 1073741824]:
> -   _1 = .SPACESHIP (a_2(D), b_3(D));
> +   _1 = .SPACESHIP (a_2(D), b_3(D), 0);
>     if (_1 == 0)
>       goto <bb 6>; [34.00%]
>     else
> @@ -5891,7 +5892,13 @@ convert_mult_to_highpart (gassign *stmt,
>
>     <bb 6> [local count: 1073741824]:
>     so that the backend can emit optimal comparison and
> -   conditional jump sequence.  */
> +   conditional jump sequence.  If the
> +   <bb 6> [local count: 1073741824]:
> +   above has a single PHI like:
> +   # _27 = PHI<0(2), -1(3), 2(4), 1(5)>
> +   then replace it with effectively
> +   _1 = .SPACESHIP (a_2(D), b_3(D), 1);
> +   _27 = _1;  */
>
>  static void
>  optimize_spaceship (gcond *stmt)
> @@ -5901,7 +5908,8 @@ optimize_spaceship (gcond *stmt)
>      return;
>    tree arg1 = gimple_cond_lhs (stmt);
>    tree arg2 = gimple_cond_rhs (stmt);
> -  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg1))
> +  if ((!SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg1))
> +       && !INTEGRAL_TYPE_P (TREE_TYPE (arg1)))
>        || optab_handler (spaceship_optab,
>                         TYPE_MODE (TREE_TYPE (arg1))) == CODE_FOR_nothing
>        || operand_equal_p (arg1, arg2, 0))
> @@ -6013,12 +6021,105 @@ optimize_spaceship (gcond *stmt)
>         }
>      }
>
> -  gcall *gc = gimple_build_call_internal (IFN_SPACESHIP, 2, arg1, arg2);
> +  /* Check if there is a single bb into which all failed conditions
> +     jump to (perhaps through an empty block) and if it results in
> +     a single integral PHI which just sets it to -1, 0, 1, 2
> +     (or -1, 0, 1 when NaNs can't happen).  In that case use 1 rather
> +     than 0 as last .SPACESHIP argument to tell backends it might
> +     consider different code generation and just cast the result
> +     of .SPACESHIP to the PHI result.  */
> +  tree arg3 = integer_zero_node;
> +  edge e = EDGE_SUCC (bb0, 0);
> +  if (e->dest == bb1)
> +    e = EDGE_SUCC (bb0, 1);
> +  basic_block bbp = e->dest;
> +  gphi *phi = NULL;
> +  for (gphi_iterator psi = gsi_start_phis (bbp);
> +       !gsi_end_p (psi); gsi_next (&psi))
> +    {
> +      gphi *gp = psi.phi ();
> +      tree res = gimple_phi_result (gp);
> +
> +      if (phi != NULL
> +         || virtual_operand_p (res)
> +         || !INTEGRAL_TYPE_P (TREE_TYPE (res))
> +         || TYPE_PRECISION (TREE_TYPE (res)) < 2)
> +       {
> +         phi = NULL;
> +         break;
> +       }
> +      phi = gp;
> +    }
> +  if (phi
> +      && integer_zerop (gimple_phi_arg_def_from_edge (phi, e))
> +      && EDGE_COUNT (bbp->preds) == (HONOR_NANS (TREE_TYPE (arg1)) ? 4 : 3))
> +    {
> +      for (unsigned i = 0; phi && i < EDGE_COUNT (bbp->preds) - 1; ++i)
> +       {
> +         edge e3 = i == 0 ? e1 : i == 1 ? em1 : e2;
> +         if (e3->dest != bbp)
> +           {
> +             if (!empty_block_p (e3->dest)
> +                 || !single_succ_p (e3->dest)
> +                 || single_succ (e3->dest) != bbp)
> +               {
> +                 phi = NULL;
> +                 break;
> +               }
> +             e3 = single_succ_edge (e3->dest);
> +           }
> +         tree a = gimple_phi_arg_def_from_edge (phi, e3);
> +         if (TREE_CODE (a) != INTEGER_CST
> +             || (i == 0 && !integer_onep (a))
> +             || (i == 1 && !integer_all_onesp (a))
> +             || (i == 2 && wi::to_widest (a) != 2))
> +           {
> +             phi = NULL;
> +             break;
> +           }
> +       }
> +      if (phi)
> +       arg3 = build_int_cst (integer_type_node,
> +                             TYPE_UNSIGNED (TREE_TYPE (arg1)) ? 2 : 1);
> +    }
> +
> +  /* For integral <=> comparisons only use .SPACESHIP if it is turned
> +     into an integer (-1, 0, 1).  */
> +  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg1)) && arg3 == integer_zero_node)
> +    return;
> +
> +  gcall *gc = gimple_build_call_internal (IFN_SPACESHIP, 3, arg1, arg2, arg3);
>    tree lhs = make_ssa_name (integer_type_node);
>    gimple_call_set_lhs (gc, lhs);
>    gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>    gsi_insert_before (&gsi, gc, GSI_SAME_STMT);
>
> +  wide_int wm1 = wi::minus_one (TYPE_PRECISION (integer_type_node));
> +  wide_int w2 = (HONOR_NANS (TREE_TYPE (arg1))
> +                ? wi::two (TYPE_PRECISION (integer_type_node))
> +                : wi::one (TYPE_PRECISION (integer_type_node)));
> +  int_range<1> vr (TREE_TYPE (lhs), wm1, w2);
> +  set_range_info (lhs, vr);
> +
> +  if (arg3 != integer_zero_node)
> +    {
> +      tree type = TREE_TYPE (gimple_phi_result (phi));
> +      if (!useless_type_conversion_p (type, integer_type_node))
> +       {
> +         tree tem = make_ssa_name (type);
> +         gimple *gcv = gimple_build_assign (tem, NOP_EXPR, lhs);
> +         gsi_insert_before (&gsi, gcv, GSI_SAME_STMT);
> +         lhs = tem;
> +       }
> +      SET_PHI_ARG_DEF_ON_EDGE (phi, e, lhs);
> +      gimple_cond_set_lhs (stmt, boolean_false_node);
> +      gimple_cond_set_rhs (stmt, boolean_false_node);
> +      gimple_cond_set_code (stmt, (e->flags & EDGE_TRUE_VALUE)
> +                                 ? EQ_EXPR : NE_EXPR);
> +      update_stmt (stmt);
> +      return;
> +    }
> +
>    gimple_cond_set_lhs (stmt, lhs);
>    gimple_cond_set_rhs (stmt, integer_zero_node);
>    update_stmt (stmt);
> @@ -6055,11 +6156,6 @@ optimize_spaceship (gcond *stmt)
>                             (e2->flags & EDGE_TRUE_VALUE) ? NE_EXPR : EQ_EXPR);
>        update_stmt (cond);
>      }
> -
> -  wide_int wm1 = wi::minus_one (TYPE_PRECISION (integer_type_node));
> -  wide_int w2 = wi::two (TYPE_PRECISION (integer_type_node));
> -  int_range<1> vr (TREE_TYPE (lhs), wm1, w2);
> -  set_range_info (lhs, vr);
>  }
>
>
> --- gcc/config/i386/i386-protos.h.jj    2024-10-01 09:38:41.237196225 +0200
> +++ gcc/config/i386/i386-protos.h       2024-10-02 18:45:12.807284696 +0200
> @@ -164,7 +164,8 @@ extern bool ix86_expand_fp_vec_cmp (rtx[
>  extern void ix86_expand_sse_movcc (rtx, rtx, rtx, rtx);
>  extern void ix86_expand_sse_extend (rtx, rtx, bool);
>  extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
> -extern void ix86_expand_fp_spaceship (rtx, rtx, rtx);
> +extern void ix86_expand_fp_spaceship (rtx, rtx, rtx, rtx);
> +extern void ix86_expand_int_spaceship (rtx, rtx, rtx, rtx);
>  extern bool ix86_expand_int_addcc (rtx[]);
>  extern void ix86_expand_carry (rtx arg);
>  extern rtx_insn *ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
> --- gcc/config/i386/i386-expand.cc.jj   2024-10-01 09:38:41.181197008 +0200
> +++ gcc/config/i386/i386-expand.cc      2024-10-04 11:05:09.647022644 +0200
> @@ -3144,12 +3144,15 @@ ix86_expand_setcc (rtx dest, enum rtx_co
>     dest = op0 == op1 ? 0 : op0 < op1 ? -1 : op0 > op1 ? 1 : 2.  */
>
>  void
> -ix86_expand_fp_spaceship (rtx dest, rtx op0, rtx op1)
> +ix86_expand_fp_spaceship (rtx dest, rtx op0, rtx op1, rtx op2)
>  {
>    gcc_checking_assert (ix86_fp_comparison_strategy (GT) != IX86_FPCMP_ARITH);
> +  rtx zero = NULL_RTX;
> +  if (op2 != const0_rtx && TARGET_IEEE_FP && GET_MODE (dest) == SImode)
> +    zero = force_reg (SImode, const0_rtx);
>    rtx gt = ix86_expand_fp_compare (GT, op0, op1);
> -  rtx l0 = gen_label_rtx ();
> -  rtx l1 = gen_label_rtx ();
> +  rtx l0 = op2 == const0_rtx ? gen_label_rtx () : NULL_RTX;
> +  rtx l1 = op2 == const0_rtx ? gen_label_rtx () : NULL_RTX;
>    rtx l2 = TARGET_IEEE_FP ? gen_label_rtx () : NULL_RTX;
>    rtx lend = gen_label_rtx ();
>    rtx tmp;
> @@ -3163,23 +3166,68 @@ ix86_expand_fp_spaceship (rtx dest, rtx
>        jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
>        add_reg_br_prob_note (jmp, profile_probability::very_unlikely ());
>      }
> -  rtx eq = gen_rtx_fmt_ee (UNEQ, VOIDmode,
> -                          gen_rtx_REG (CCFPmode, FLAGS_REG), const0_rtx);
> -  tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, eq,
> -                             gen_rtx_LABEL_REF (VOIDmode, l0), pc_rtx);
> -  jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> -  add_reg_br_prob_note (jmp, profile_probability::unlikely ());
> -  tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, gt,
> -                             gen_rtx_LABEL_REF (VOIDmode, l1), pc_rtx);
> -  jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> -  add_reg_br_prob_note (jmp, profile_probability::even ());
> -  emit_move_insn (dest, constm1_rtx);
> -  emit_jump (lend);
> -  emit_label (l0);
> -  emit_move_insn (dest, const0_rtx);
> -  emit_jump (lend);
> -  emit_label (l1);
> -  emit_move_insn (dest, const1_rtx);
> +  if (op2 == const0_rtx)
> +    {
> +      rtx eq = gen_rtx_fmt_ee (UNEQ, VOIDmode,
> +                              gen_rtx_REG (CCFPmode, FLAGS_REG), const0_rtx);
> +      tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, eq,
> +                                 gen_rtx_LABEL_REF (VOIDmode, l0), pc_rtx);
> +      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> +      add_reg_br_prob_note (jmp, profile_probability::unlikely ());
> +      tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, gt,
> +                                 gen_rtx_LABEL_REF (VOIDmode, l1), pc_rtx);
> +      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> +      add_reg_br_prob_note (jmp, profile_probability::even ());
> +      emit_move_insn (dest, constm1_rtx);
> +      emit_jump (lend);
> +      emit_label (l0);
> +      emit_move_insn (dest, const0_rtx);
> +      emit_jump (lend);
> +      emit_label (l1);
> +      emit_move_insn (dest, const1_rtx);
> +    }
> +  else
> +    {
> +      rtx lt_tmp = gen_reg_rtx (QImode);
> +      ix86_expand_setcc (lt_tmp, UNLT, gen_rtx_REG (CCFPmode, FLAGS_REG),
> +                        const0_rtx);
> +      if (GET_MODE (dest) != QImode)
> +       {
> +         tmp = gen_reg_rtx (GET_MODE (dest));
> +         emit_insn (gen_rtx_SET (tmp, gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> +                                                           lt_tmp)));
> +         lt_tmp = tmp;
> +       }
> +      rtx gt_tmp;
> +      if (zero)
> +       {
> +         /* If TARGET_IEEE_FP and dest has SImode, emit SImode clear
> +            before the floating point comparison and use setcc_si_slp
> +            pattern to hide it from the combiner, so that it doesn't
> +            undo it.  */
> +         tmp = ix86_expand_compare (GT, XEXP (gt, 0), const0_rtx);
> +         PUT_MODE (tmp, QImode);
> +         emit_insn (gen_setcc_si_slp (zero, tmp, zero));
> +         gt_tmp = zero;
> +       }
> +      else
> +       {
> +         gt_tmp = gen_reg_rtx (QImode);
> +         ix86_expand_setcc (gt_tmp, GT, XEXP (gt, 0), const0_rtx);
> +         if (GET_MODE (dest) != QImode)
> +           {
> +             tmp = gen_reg_rtx (GET_MODE (dest));
> +             emit_insn (gen_rtx_SET (tmp,
> +                                     gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> +                                                          gt_tmp)));
> +             gt_tmp = tmp;
> +           }
> +       }
> +      tmp = expand_simple_binop (GET_MODE (dest), MINUS, gt_tmp, lt_tmp, dest,
> +                                0, OPTAB_DIRECT);
> +      if (!rtx_equal_p (tmp, dest))
> +       emit_move_insn (dest, tmp);
> +    }
>    emit_jump (lend);
>    if (l2)
>      {
> @@ -3189,6 +3237,46 @@ ix86_expand_fp_spaceship (rtx dest, rtx
>    emit_label (lend);
>  }
>
> +/* Expand integral op0 <=> op1, i.e.
> +   dest = op0 == op1 ? 0 : op0 < op1 ? -1 : 1.  */
> +
> +void
> +ix86_expand_int_spaceship (rtx dest, rtx op0, rtx op1, rtx op2)
> +{
> +  gcc_assert (INTVAL (op2));
> +  /* Not using ix86_expand_int_compare here, so that it doesn't swap
> +     operands nor optimize CC mode - we need a mode usable for both
> +     LT and GT resp. LTU and GTU comparisons with the same unswapped
> +     operands.  */
> +  rtx flags = gen_rtx_REG (INTVAL (op2) == 1 ? CCGCmode : CCmode, FLAGS_REG);
> +  rtx tmp = gen_rtx_COMPARE (GET_MODE (flags), op0, op1);
> +  emit_insn (gen_rtx_SET (flags, tmp));
> +  rtx lt_tmp = gen_reg_rtx (QImode);
> +  ix86_expand_setcc (lt_tmp, INTVAL (op2) == 1 ? LT : LTU, flags,
> +                    const0_rtx);
> +  if (GET_MODE (dest) != QImode)
> +    {
> +      tmp = gen_reg_rtx (GET_MODE (dest));
> +      emit_insn (gen_rtx_SET (tmp, gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> +                                                       lt_tmp)));
> +      lt_tmp = tmp;
> +    }
> +  rtx gt_tmp = gen_reg_rtx (QImode);
> +  ix86_expand_setcc (gt_tmp, INTVAL (op2) == 1 ? GT : GTU, flags,
> +                    const0_rtx);
> +  if (GET_MODE (dest) != QImode)
> +    {
> +      tmp = gen_reg_rtx (GET_MODE (dest));
> +      emit_insn (gen_rtx_SET (tmp, gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> +                                                       gt_tmp)));
> +      gt_tmp = tmp;
> +    }
> +  tmp = expand_simple_binop (GET_MODE (dest), MINUS, gt_tmp, lt_tmp, dest,
> +                            0, OPTAB_DIRECT);
> +  if (!rtx_equal_p (tmp, dest))
> +    emit_move_insn (dest, tmp);
> +}
> +
>  /* Expand comparison setting or clearing carry flag.  Return true when
>     successful and set pop for the operation.  */
>  static bool
> --- gcc/config/i386/i386.md.jj  2024-10-01 09:38:41.296195402 +0200
> +++ gcc/config/i386/i386.md     2024-10-03 14:32:24.661446577 +0200
> @@ -118,6 +118,7 @@ (define_c_enum "unspec" [
>    UNSPEC_PUSHFL
>    UNSPEC_POPFL
>    UNSPEC_OPTCOMX
> +  UNSPEC_SETCC_SI_SLP
>
>    ;; For SSE/MMX support:
>    UNSPEC_FIX_NOTRUNC
> @@ -19281,6 +19282,27 @@ (define_insn "*setcc_qi_slp"
>    [(set_attr "type" "setcc")
>     (set_attr "mode" "QI")])
>
> +(define_expand "setcc_si_slp"
> +  [(set (match_operand:SI 0 "register_operand")
> +       (unspec:SI
> +         [(match_operand:QI 1)
> +          (match_operand:SI 2 "register_operand")] UNSPEC_SETCC_SI_SLP))])
> +
> +(define_insn_and_split "*setcc_si_slp"
> +  [(set (match_operand:SI 0 "register_operand" "=q")
> +       (unspec:SI
> +         [(match_operator:QI 1 "ix86_comparison_operator"
> +            [(reg FLAGS_REG) (const_int 0)])
> +          (match_operand:SI 2 "register_operand" "0")] UNSPEC_SETCC_SI_SLP))]
> +  "ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0) (match_dup 2))
> +   (set (strict_low_part (match_dup 3)) (match_dup 1))]
> +{
> +  operands[3] = gen_lowpart (QImode, operands[0]);
> +})
> +
>  ;; In general it is not safe to assume too much about CCmode registers,
>  ;; so simplify-rtx stops when it sees a second one.  Under certain
>  ;; conditions this is safe on x86, so help combine not create
> @@ -19776,6 +19798,32 @@ (define_peephole2
>    operands[8] = gen_lowpart (QImode, operands[4]);
>    ix86_expand_clear (operands[4]);
>  })
> +
> +(define_peephole2
> +  [(set (match_operand 4 "flags_reg_operand") (match_operand 0))
> +   (set (strict_low_part (match_operand:QI 5 "register_operand"))
> +       (match_operator:QI 6 "ix86_comparison_operator"
> +         [(reg FLAGS_REG) (const_int 0)]))
> +   (set (match_operand:QI 1 "register_operand")
> +       (match_operator:QI 2 "ix86_comparison_operator"
> +         [(reg FLAGS_REG) (const_int 0)]))
> +   (set (match_operand 3 "any_QIreg_operand")
> +       (zero_extend (match_dup 1)))]
> +  "(peep2_reg_dead_p (4, operands[1])
> +    || operands_match_p (operands[1], operands[3]))
> +   && ! reg_overlap_mentioned_p (operands[3], operands[0])
> +   && ! reg_overlap_mentioned_p (operands[3], operands[5])
> +   && ! reg_overlap_mentioned_p (operands[1], operands[5])
> +   && peep2_regno_dead_p (0, FLAGS_REG)"
> +  [(set (match_dup 4) (match_dup 0))
> +   (set (strict_low_part (match_dup 5))
> +       (match_dup 6))
> +   (set (strict_low_part (match_dup 7))
> +       (match_dup 2))]
> +{
> +  operands[7] = gen_lowpart (QImode, operands[3]);
> +  ix86_expand_clear (operands[3]);
> +})
>
>  ;; Call instructions.
>
> @@ -29494,24 +29542,40 @@ (define_insn "hreset"
>     (set_attr "length" "4")])
>
>  ;; Spaceship optimization
> -(define_expand "spaceship<mode>3"
> +(define_expand "spaceship<mode>4"
>    [(match_operand:SI 0 "register_operand")
>     (match_operand:MODEF 1 "cmp_fp_expander_operand")
> -   (match_operand:MODEF 2 "cmp_fp_expander_operand")]
> +   (match_operand:MODEF 2 "cmp_fp_expander_operand")
> +   (match_operand:SI 3 "const_int_operand")]
>    "(TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH))
>     && (TARGET_CMOVE || (TARGET_SAHF && TARGET_USE_SAHF))"
>  {
> -  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2]);
> +  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2],
> +                           operands[3]);
>    DONE;
>  })
>
> -(define_expand "spaceshipxf3"
> +(define_expand "spaceshipxf4"
>    [(match_operand:SI 0 "register_operand")
>     (match_operand:XF 1 "nonmemory_operand")
> -   (match_operand:XF 2 "nonmemory_operand")]
> +   (match_operand:XF 2 "nonmemory_operand")
> +   (match_operand:SI 3 "const_int_operand")]
>    "TARGET_80387 && (TARGET_CMOVE || (TARGET_SAHF && TARGET_USE_SAHF))"
>  {
> -  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2]);
> +  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2],
> +                           operands[3]);
> +  DONE;
> +})
> +
> +(define_expand "spaceship<mode>4"
> +  [(match_operand:SI 0 "register_operand")
> +   (match_operand:SWI 1 "nonimmediate_operand")
> +   (match_operand:SWI 2 "<general_operand>")
> +   (match_operand:SI 3 "const_int_operand")]
> +  ""
> +{
> +  ix86_expand_int_spaceship (operands[0], operands[1], operands[2],
> +                            operands[3]);
>    DONE;
>  })
>
> --- gcc/doc/md.texi.jj  2024-10-01 09:38:58.035961557 +0200
> +++ gcc/doc/md.texi     2024-10-02 20:16:22.502329039 +0200
> @@ -8568,11 +8568,15 @@ inclusive and operand 1 exclusive.
>  If this pattern is not defined, a call to the library function
>  @code{__clear_cache} is used.
>
> -@cindex @code{spaceship@var{m}3} instruction pattern
> -@item @samp{spaceship@var{m}3}
> +@cindex @code{spaceship@var{m}4} instruction pattern
> +@item @samp{spaceship@var{m}4}
>  Initialize output operand 0 with mode of integer type to -1, 0, 1 or 2
>  if operand 1 with mode @var{m} compares less than operand 2, equal to
>  operand 2, greater than operand 2 or is unordered with operand 2.
> +Operand 3 should be @code{const0_rtx} if the result is used in comparisons,
> +@code{const1_rtx} if the result is used as integer value and the comparison
> +is signed, @code{const2_rtx} if the result is used as integer value and
> +the comparison is unsigned.
>  @var{m} should be a scalar floating point mode.
>
>  This pattern is not allowed to @code{FAIL}.
> --- gcc/testsuite/g++.target/i386/pr116896-1.C.jj       2024-10-03 14:40:27.071813336 +0200
> +++ gcc/testsuite/g++.target/i386/pr116896-1.C  2024-10-03 15:52:06.243819660 +0200
> @@ -0,0 +1,35 @@
> +// PR middle-end/116896
> +// { dg-do compile { target c++20 } }
> +// { dg-options "-O2 -masm=att -fno-stack-protector" }
> +// { dg-final { scan-assembler-times "\tjp\t" 1 } }
> +// { dg-final { scan-assembler-not "\tj\[^mp\]\[a-z\]*\t" } }
> +// { dg-final { scan-assembler-times "\tsbb\[bl\]\t\\\$0, " 3 } }
> +// { dg-final { scan-assembler-times "\tseta\t" 3 } }
> +// { dg-final { scan-assembler-times "\tsetg\t" 1 } }
> +// { dg-final { scan-assembler-times "\tsetl\t" 1 } }
> +
> +#include <compare>
> +
> +[[gnu::noipa]] auto
> +foo (float x, float y)
> +{
> +  return x <=> y;
> +}
> +
> +[[gnu::noipa, gnu::optimize ("fast-math")]] auto
> +bar (float x, float y)
> +{
> +  return x <=> y;
> +}
> +
> +[[gnu::noipa]] auto
> +baz (int x, int y)
> +{
> +  return x <=> y;
> +}
> +
> +[[gnu::noipa]] auto
> +qux (unsigned x, unsigned y)
> +{
> +  return x <=> y;
> +}
> --- gcc/testsuite/g++.target/i386/pr116896-2.C.jj       2024-10-03 14:40:37.203674018 +0200
> +++ gcc/testsuite/g++.target/i386/pr116896-2.C  2024-10-04 10:55:07.468396073 +0200
> @@ -0,0 +1,41 @@
> +// PR middle-end/116896
> +// { dg-do run { target c++20 } }
> +// { dg-options "-O2" }
> +
> +#include "pr116896-1.C"
> +
> +[[gnu::noipa]] auto
> +corge (int x)
> +{
> +  return x <=> 0;
> +}
> +
> +[[gnu::noipa]] auto
> +garply (unsigned x)
> +{
> +  return x <=> 0;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (-1.0f, 1.0f) != std::partial_ordering::less
> +      || foo (1.0f, -1.0f) != std::partial_ordering::greater
> +      || foo (1.0f, 1.0f) != std::partial_ordering::equivalent
> +      || foo (__builtin_nanf (""), 1.0f) != std::partial_ordering::unordered
> +      || bar (-2.0f, 2.0f) != std::partial_ordering::less
> +      || bar (2.0f, -2.0f) != std::partial_ordering::greater
> +      || bar (-5.0f, -5.0f) != std::partial_ordering::equivalent
> +      || baz (-42, 42) != std::strong_ordering::less
> +      || baz (42, -42) != std::strong_ordering::greater
> +      || baz (42, 42) != std::strong_ordering::equal
> +      || qux (40, 42) != std::strong_ordering::less
> +      || qux (42, 40) != std::strong_ordering::greater
> +      || qux (40, 40) != std::strong_ordering::equal
> +      || corge (-15) != std::strong_ordering::less
> +      || corge (15) != std::strong_ordering::greater
> +      || corge (0) != std::strong_ordering::equal
> +      || garply (15) != std::strong_ordering::greater
> +      || garply (0) != std::strong_ordering::equal)
> +    __builtin_abort ();
> +}
>
>         Jakub
>
