Richard Biener <rguent...@suse.de> wrote on Mon, Jan 20, 2025 at 15:45:
>
> On Mon, 20 Jan 2025, Hongyu Wang wrote:
>
> > Thanks Richard for being willing to review this part.  It is true that
> > the try_cmove_arith logic adds quite a lot of special handling for
> > optimization, so I reduced the logic in emit_mask_load_store to just
> > generate the simplest load/store, which does not allow the sources to
> > be swapped.
> >
> > Hi Jeff, would you help review the ifcvt changes?  Thanks
> > in advance!
>
> I want to add that it feels a bit risky to do invasive ifcvt.cc changes
> at this point.  Is it possible to defer this to stage1?  I am not
> expecting a sudden uptick in APX code generation adoption ...
>

OK, if you have concerns, we can postpone it to the next stage1.

> Richard.
>
> > > Richard Sandiford <richard.sandif...@arm.com> wrote on Thu, Jan 16, 2025 at 19:06:
> >
> > >
> > > Hongyu Wang <hongyu.w...@intel.com> writes:
> > > > From: Lingling Kong <lingling.k...@intel.com>
> > > >
> > > > Hi,
> > > >
> > > > Thanks to Richard for the review; the v5 patch contains the changes below:
> > > >
> > > > 1. Split the maskload/maskstore emission out of noce_emit_cmove into
> > > > a new function emit_mask_load_store in optabs.cc.
> > > > 2. Follow the operand order of the maskload and maskstore optabs and take
> > > > cond as the predicate operand with VOIDmode.
> > > > 3. Cache may_trap_or_fault_p and correct the logic so that only one
> > > > of the cmove source operands can be a may_trap_or_fault memory.
> > > >
> > > > Bootstrapped & regtested on x86_64-pc-linux-gnu.
> > > >
> > > > OK for trunk?
> > > >
> > > > The APX CFCMOV feature implements conditional faulting, which means
> > > > that when a memory operand is loaded or stored, all memory faults
> > > > are suppressed if the condition code evaluates to false.  This lets
> > > > a conditional move load or store a memory operand that may trap or
> > > > fault.
> > > >
> > > > In the middle end we currently don't support a conditional move if
> > > > a load from A or B could trap or fault.  To enable CFCMOV, we use
> > > > mask_load and mask_store as a proxy for the backend expander.  The
> > > > predicate of mask_load/mask_store is recognized as a comparison rtx
> > > > in this initial implementation.
> > > >
> > > > For a conditional memory store, the fault-suppressing conditional
> > > > move does not move any arithmetic calculations.  For a conditional
> > > > memory load, only the case where one operand is a possibly-trapping
> > > > memory and the other operand neither traps nor is a memory is
> > > > supported.
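> > > >
> > > > As a rough illustration (not part of the patch; the function names
> > > > are made up), the supported shapes correspond to C source like:
> > > >
> > > >   /* "if (test) *x = a; else skip;"  -> mask_store  */
> > > >   void f1 (int *x, int a, int test) { if (test) *x = a; }
> > > >
> > > >   /* "if (test) x = *a; else x = b;"  -> mask_load  */
> > > >   int f2 (int *a, int b, int test) { return test ? *a : b; }
> > > >
> > > >   /* "if (test) x = a; else x = *b;"  -> mask_load  */
> > > >   int f3 (int a, int *b, int test) { return test ? a : *b; }
> > > >
> > > > where the pointer dereference may fault, so without CFCMOV the
> > > > branch cannot be if-converted today.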
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >       * ifcvt.cc (can_use_mask_load_store): New function to check
> > > >       whether a conditionally faulting load/store can be used.
> > > >       (noce_try_cmove_arith): Relax the operand may_trap_or_fault
> > > >       check and expand with the mask_load/mask_store optab when one
> > > >       of the cmove operands may trap or fault.
> > > >       (noce_process_if_block): Allow a trap_or_fault dest for the
> > > >       "if (...) *x = a; else skip" scenario when the mask_store
> > > >       optab is available.
> > > >       * optabs.h (emit_mask_load_store): New declaration.
> > > >       * optabs.cc (emit_mask_load_store): New function to emit
> > > >       conditional move with mask_load/mask_store optab.
> > >
> > > Thanks for the update.  This addresses the comments I had about
> > > the use of the maskload/store optabs in previous versions.
> > >
> > > I did make several attempts to review the patch beyond that, but I find
> > > it very difficult to understand the flow of noce_try_cmove_arith, and
> > > how all the various special cases fit together.  (Not your fault of
> > > course.)  So I think someone who knows ifcvt should take it from here.
> > >
> > > It would be nice if the internal implementation of emit_mask_load_store
> > > could share more code with other routines though.
> > >
> > > Thanks (and sorry),
> > > Richard
> > >
> > > > ---
> > > >  gcc/ifcvt.cc  | 110 ++++++++++++++++++++++++++++++++++++++++++--------
> > > >  gcc/optabs.cc | 103 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >  gcc/optabs.h  |   3 ++
> > > >  3 files changed, 200 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> > > > index cb5597bc171..51ac398aee1 100644
> > > > --- a/gcc/ifcvt.cc
> > > > +++ b/gcc/ifcvt.cc
> > > > @@ -778,6 +778,7 @@ static bool noce_try_store_flag_mask (struct noce_if_info *);
> > > >  static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
> > > >                           rtx, rtx, rtx, rtx = NULL, rtx = NULL);
> > > >  static bool noce_try_cmove (struct noce_if_info *);
> > > > +static bool can_use_mask_load_store (struct noce_if_info *);
> > > >  static bool noce_try_cmove_arith (struct noce_if_info *);
> > > >  static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
> > > >  static bool noce_try_minmax (struct noce_if_info *);
> > > > @@ -2132,6 +2133,39 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
> > > >    return true;
> > > >  }
> > > >
> > > > +/* Return TRUE if the backend supports the scalar maskload_optab
> > > > +   or maskstore_optab, which suppress memory faults when trying to
> > > > +   load or store a memory operand and the condition code evaluates
> > > > +   to false.
> > > > +   Currently the following forms
> > > > +       "if (test) *x = a; else skip;" --> mask_store
> > > > +       "if (test) x = *a; else x = b;" --> mask_load
> > > > +       "if (test) x = a; else x = *b;" --> mask_load
> > > > +   are supported.  */
> > > > +
> > > > +static bool
> > > > +can_use_mask_load_store (struct noce_if_info *if_info)
> > > > +{
> > > > +  rtx b = if_info->b;
> > > > +  rtx x = if_info->x;
> > > > +  rtx cond = if_info->cond;
> > > > +
> > > > +  if (MEM_P (x))
> > > > +    {
> > > > +      if (convert_optab_handler (maskstore_optab, GET_MODE (x),
> > > > +                              GET_MODE (cond)) == CODE_FOR_nothing)
> > > > +     return false;
> > > > +
> > > > +      if (!rtx_equal_p (x, b) || !may_trap_or_fault_p (x))
> > > > +     return false;
> > > > +
> > > > +      return true;
> > > > +    }
> > > > +  else
> > > > +    return convert_optab_handler (maskload_optab, GET_MODE (x),
> > > > +                               GET_MODE (cond)) != CODE_FOR_nothing;
> > > > +}
> > > > +
> > > >  /* Try more complex cases involving conditional_move.  */
> > > >
> > > >  static bool
> > > > @@ -2151,6 +2185,9 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> > > >    enum rtx_code code;
> > > >    rtx cond = if_info->cond;
> > > >    rtx_insn *ifcvt_seq;
> > > > +  bool a_may_trap_or_fault = may_trap_or_fault_p (a);
> > > > +  bool b_may_trap_or_fault = may_trap_or_fault_p (b);
> > > > +  bool use_mask_load_store = false;
> > > >
> > > >    /* A conditional move from two memory sources is equivalent to a
> > > >       conditional on their addresses followed by a load.  Don't do this
> > > > @@ -2167,11 +2204,22 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> > > >        x = gen_reg_rtx (address_mode);
> > > >        is_mem = true;
> > > >      }
> > > > -
> > > > -  /* ??? We could handle this if we knew that a load from A or B could
> > > > -     not trap or fault.  This is also true if we've already loaded
> > > > -     from the address along the path from ENTRY.  */
> > > > -  else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> > > > +  /* We cannot handle the case where both a and b may trap or
> > > > +     fault.  */
> > > > +  else if (a_may_trap_or_fault && b_may_trap_or_fault)
> > > > +    return false;
> > > > +  /* Scalar maskload_optab/maskstore_optab implies conditional
> > > > +     faulting: if the condition mask evaluates to false, all memory
> > > > +     faults are suppressed when a memory operand is loaded or
> > > > +     stored.  So if a scalar mask load or store is enabled, we can
> > > > +     do the conversion when one of a/b may trap or fault.  */
> > > > +  else if (((MEM_P (a) && a_may_trap_or_fault
> > > > +          && !b_may_trap_or_fault)
> > > > +         || (MEM_P (b) && b_may_trap_or_fault
> > > > +             && !a_may_trap_or_fault))
> > > > +        && can_use_mask_load_store (if_info))
> > > > +    use_mask_load_store = true;
> > > > +  else if (a_may_trap_or_fault || b_may_trap_or_fault)
> > > >      return false;
> > > >
> > > >    /* if (test) x = a + b; else x = c - d;
> > > > @@ -2212,6 +2260,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> > > >         std::swap (insn_a, insn_b);
> > > >         std::swap (a_simple, b_simple);
> > > >         std::swap (then_bb, else_bb);
> > > > +       std::swap (a_may_trap_or_fault, b_may_trap_or_fault);
> > > >       }
> > > >      }
> > > >
> > > > @@ -2247,9 +2296,14 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> > > >    /* If either operand is complex, load it into a register first.
> > > >       The best way to do this is to copy the original insn.  In this
> > > >       way we preserve any clobbers etc that the insn may have had.
> > > > -     This is of course not possible in the IS_MEM case.  */
> > > > +     This is of course not possible in the IS_MEM case.
> > > > +     If a load or store operand may trap or fault, we must not
> > > > +     hoist the load or store; otherwise the memory fault cannot be
> > > > +     suppressed and we get a normal arithmetic insn instead of a
> > > > +     conditional faulting movcc.  */
> > > >
> > > > -  if (! general_operand (a, GET_MODE (a)) || tmp_a)
> > > > +  if (!a_may_trap_or_fault
> > > > +      && (! general_operand (a, GET_MODE (a)) || tmp_a))
> > > >      {
> > > >
> > > >        if (is_mem)
> > > > @@ -2278,7 +2332,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> > > >       }
> > > >      }
> > > >
> > > > -  if (! general_operand (b, GET_MODE (b)) || tmp_b)
> > > > +  if (!b_may_trap_or_fault
> > > > +      && (! general_operand (b, GET_MODE (b)) || tmp_b))
> > > >      {
> > > >        if (is_mem)
> > > >       {
> > > > @@ -2356,8 +2411,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> > > >    else
> > > >      goto end_seq_and_fail;
> > > >
> > > > -  target = noce_emit_cmove (if_info, x, code, XEXP (cond, 0), XEXP (cond, 1),
> > > > -                            a, b);
> > > > +  if (use_mask_load_store)
> > > > +    target = emit_mask_load_store (x, code, XEXP (cond, 0),
> > > > +                                XEXP (cond, 1), a, b);
> > > > +  else
> > > > +    target = noce_emit_cmove (if_info, x, code, XEXP (cond, 0),
> > > > +                           XEXP (cond, 1), a, b);
> > > >
> > > >    if (! target)
> > > >      goto end_seq_and_fail;
> > > > @@ -4210,12 +4269,31 @@ noce_process_if_block (struct noce_if_info *if_info)
> > > >      }
> > > >
> > > >    if (!set_b && MEM_P (orig_x))
> > > > -    /* We want to avoid store speculation to avoid cases like
> > > > -      if (pthread_mutex_trylock(mutex))
> > > > -        ++global_variable;
> > > > -       Rather than go to much effort here, we rely on the SSA optimizers,
> > > > -       which do a good enough job these days.  */
> > > > -    return false;
> > > > +    {
> > > > +      /* When the target supports scalar mask_store, e.g. x86 cfcmov,
> > > > +      we can do a conditional mem store for "if (...) *x = a; else skip"
> > > > +      where x may trap or fault.  */
> > > > +      if ((convert_optab_handler (maskstore_optab, GET_MODE (orig_x),
> > > > +                               GET_MODE (cond)) != CODE_FOR_nothing)
> > > > +       && HAVE_conditional_move
> > > > +       && may_trap_or_fault_p (orig_x)
> > > > +       && register_operand (a, GET_MODE (orig_x)))
> > > > +     {
> > > > +       x = orig_x;
> > > > +       if_info->x = x;
> > > > +       if (noce_try_cmove_arith (if_info))
> > > > +         goto success;
> > > > +       else
> > > > +         return false;
> > > > +     }
> > > > +      /* We want to avoid store speculation to avoid cases like
> > > > +        if (pthread_mutex_trylock(mutex))
> > > > +          ++global_variable;
> > > > +      Rather than go to much effort here, we rely on the SSA optimizers,
> > > > +      which do a good enough job these days.  */
> > > > +      else
> > > > +       return false;
> > > > +    }
> > > >
> > > >    if (noce_try_move (if_info))
> > > >      goto success;
> > > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > > > index 36f2e6af8b5..f862f74fb5e 100644
> > > > --- a/gcc/optabs.cc
> > > > +++ b/gcc/optabs.cc
> > > > @@ -5215,6 +5215,109 @@ emit_conditional_move_1 (rtx target, rtx comparison,
> > > >    return NULL_RTX;
> > > >  }
> > > >
> > > > +/* Emit a conditional move using the mask_load or mask_store optab, which
> > > > +   allows one of the move source or dest to be a may_trap_or_fault memory.
> > > > +   The input operands have similar meanings as for emit_conditional_move,
> > > > +   except the pred parameter can be a mask predicate.  */
> > > > +
> > > > +rtx
> > > > +emit_mask_load_store (rtx target, enum rtx_code code, rtx cmp_a,
> > > > +                   rtx cmp_b, rtx vfalse, rtx vtrue, rtx pred)
> > > > +{
> > > > +  enum insn_code icode;
> > > > +
> > > > +  bool unsignedp = (code == LTU || code == GEU
> > > > +                 || code == LEU || code == GTU);
> > > > +
> > > > +  bool maskstore_p = MEM_P (target);
> > > > +  bool restore_stack = false;
> > > > +  saved_pending_stack_adjust save;
> > > > +  rtx_insn *last = get_last_insn ();
> > > > +
> > > > +  /* If pred doesn't exist, prepare compare insn using cmp_a and
> > > > +     cmp_b as predicate.  */
> > > > +  if (pred == NULL_RTX)
> > > > +    {
> > > > +      if (! general_operand (cmp_a, GET_MODE (cmp_a))
> > > > +       || ! general_operand (cmp_b, GET_MODE (cmp_b)))
> > > > +     return NULL_RTX;
> > > > +
> > > > +      if (swap_commutative_operands_p (cmp_a, cmp_b))
> > > > +     {
> > > > +       std::swap (cmp_a, cmp_b);
> > > > +       code = swap_condition (code);
> > > > +     }
> > > > +
> > > > +      /* get_condition will prefer to generate LT and GT even if the old
> > > > +      comparison was against zero, so undo that canonicalization here
> > > > +      since comparisons against zero are cheaper.  */
> > > > +
> > > > +      if (code == LT && cmp_b == const1_rtx)
> > > > +     code = LE, cmp_b = const0_rtx;
> > > > +      else if (code == GT && cmp_b == constm1_rtx)
> > > > +     code = GE, cmp_b = const0_rtx;
> > > > +
> > > > +      code = unsignedp ? unsigned_condition (code) : code;
> > > > +      rtx comparison = simplify_gen_relational (code, VOIDmode,
> > > > +                                             VOIDmode, cmp_a, cmp_b);
> > > > +
> > > > +      if (!COMPARISON_P (comparison))
> > > > +     return NULL_RTX;
> > > > +
> > > > +      save_pending_stack_adjust (&save);
> > > > +      do_pending_stack_adjust ();
> > > > +
> > > > +      machine_mode cmode = VOIDmode;
> > > > +      prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> > > > +                     GET_CODE (comparison), NULL_RTX, unsignedp,
> > > > +                     OPTAB_WIDEN, &comparison, &cmode);
> > > > +
> > > > +      if (!comparison)
> > > > +     restore_stack = true;
> > > > +
> > > > +      pred = comparison;
> > > > +    }
> > > > +
> > > > +  if (pred)
> > > > +    {
> > > > +      machine_mode op_mode = GET_MODE (target),
> > > > +                cond_mode = GET_MODE (pred);
> > > > +
> > > > +      if (maskstore_p)
> > > > +     icode = convert_optab_handler (maskstore_optab, op_mode,
> > > > +                                    cond_mode);
> > > > +      else
> > > > +     icode = convert_optab_handler (maskload_optab, op_mode,
> > > > +                                    cond_mode);
> > > > +
> > > > +      if (icode != CODE_FOR_nothing)
> > > > +     {
> > > > +       class expand_operand ops[4];
> > > > +       int opnum = 3;
> > > > +
> > > > +       create_output_operand (&ops[0], target, op_mode);
> > > > +       create_input_operand (&ops[1], vtrue, op_mode);
> > > > +       create_fixed_operand (&ops[2], pred);
> > > > +
> > > > +       if (!maskstore_p)
> > > > +         {
> > > > +           create_input_operand (&ops[3], vfalse, op_mode);
> > > > +           opnum = 4;
> > > > +         }
> > > > +
> > > > +       if (maybe_expand_insn (icode, opnum, ops))
> > > > +         return target;
> > > > +     }
> > > > +    }
> > > > +
> > > > +  if (restore_stack)
> > > > +    {
> > > > +      delete_insns_since (last);
> > > > +      restore_pending_stack_adjust (&save);
> > > > +    }
> > > > +
> > > > +  return NULL_RTX;
> > > > +}
> > > >
> > > >  /* Emit a conditional negate or bitwise complement using the
> > > >     negcc or notcc optabs if available.  Return NULL_RTX if such operations
> > > > diff --git a/gcc/optabs.h b/gcc/optabs.h
> > > > index 23fa77be24e..6e211d9571a 100644
> > > > --- a/gcc/optabs.h
> > > > +++ b/gcc/optabs.h
> > > > @@ -293,6 +293,9 @@ extern void emit_indirect_jump (rtx);
> > > >  rtx emit_conditional_move (rtx, rtx_comparison, rtx, rtx, machine_mode, int);
> > > >  rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
> > > >
> > > > +/* Emit a mask_load or mask_store operation.  */
> > > > +rtx emit_mask_load_store (rtx, rtx_code, rtx, rtx, rtx, rtx, rtx = NULL);
> > > > +
> > > >  /* Emit a conditional negate or bitwise complement operation.  */
> > > >  rtx emit_conditional_neg_or_complement (rtx, rtx_code, machine_mode, rtx,
> > > >                                        rtx, rtx);
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
