> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Friday, September 6, 2024 5:19 PM
> To: Kong, Lingling <lingling.k...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Jeff Law <jeffreya...@gmail.com>; Richard Biener
> <richard.guent...@gmail.com>; Uros Bizjak <ubiz...@gmail.com>; Hongtao Liu
> <crazy...@gmail.com>; Jakub Jelinek <ja...@redhat.com>
> Subject: Re: [PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert
> pass
>
> "Kong, Lingling" <lingling.k...@intel.com> writes:
> > Hi,
> >
> > This version adds a new optab named 'cfmovcc'. The new optab is
> > used in the middle end to expand to cfcmov, and the patch is
> > simplified by trying to generate the conditional faulting movcc in
> > the noce_try_cmove_arith function.
> >
> > All the changes passed bootstrap & regtest x86-64-pc-linux-gnu.
> > We also tested spec with SDE and passed the runtime test.
> >
> > Ok for trunk?
> >
> >
> > The APX CFCMOV[1] feature implements conditional faulting: if the
> > comparison is false, all memory faults are suppressed when loading
> > or storing a memory operand. This lets a conditional move load or
> > store a memory operand that may trap or fault.
> >
> > In the middle end, we currently do not support a conditional move
> > if a load from A or B could trap or fault. To enable CFCMOV, we
> > added a new optab named cfmovcc.
> >
> > A conditional memory store with fault suppression does not move any
> > arithmetic calculations. A conditional memory load currently only
> > supports the case of one trapping MEM operand paired with one
> > non-trapping, non-MEM operand.
>
> Sorry if this is going over old ground (I haven't read the earlier versions
> yet), but:
> instead of adding a new optab, could we treat CFCMOV as a scalar
> instance of maskload_optab? Robin is working on adding an "else" value
> for when the condition/mask is false. After that, it would seem to be
> a pretty close match to CFCMOV.
>
> One reason for preferring maskload is that it makes the load an
> explicit part of the interface. We could then potentially use it in
> gimple too, not just expand.
>
Yes, the conditional load is like a scalar instance of maskload_optab
with an else operand.
I could try to use maskload_optab to generate cfcmov in the RTL ifcvt
pass, but that is still after expand.
We don't currently have an if-conversion pass for scalars in gimple; do
we have a plan to add one?
Thanks,
Lingling
> Thanks,
> Richard
>
> >
> >
> > [1].https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
> >
> > gcc/ChangeLog:
> >
> > * doc/md.texi: Add cfmovcc insn pattern explanation.
> > * ifcvt.cc (can_use_cmove_load_mem_notrap): New function for
> > conditional faulting movcc for load.
> > (can_use_cmove_store_mem_notrap): New function for conditional
> > faulting movcc for store.
> > (can_use_cfmovcc): New function for conditional faulting.
> > (noce_try_cmove_arith): Try to convert to conditional faulting
> > movcc.
> > (noce_process_if_block): Ditto.
> > * optabs.cc (emit_conditional_move): Handle cfmovcc.
> > (emit_conditional_move_1): Ditto.
> > * optabs.def (OPTAB_D): New optab.
> > ---
> > gcc/doc/md.texi | 10 ++++
> > gcc/ifcvt.cc | 119 ++++++++++++++++++++++++++++++++++++++++++++----
> > gcc/optabs.cc | 14 +++++-
> > gcc/optabs.def | 1 +
> > 4 files changed, 132 insertions(+), 12 deletions(-)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > a9259112251..5f563787c49 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -8591,6 +8591,16 @@ Return 1 if operand 1 is a normal floating
> > point number and 0 otherwise. @var{m} is a scalar floating point
> > mode. Operand 0 has mode @code{SImode}, and operand 1 has mode
> @var{m}.
> > +@cindex @code{cfmov@var{mode}cc} instruction pattern
> > +@item @samp{cfmov@var{mode}cc}
> > +Similar to @samp{mov@var{mode}cc} but for conditional faulting: if
> > +the comparison is false, all memory faults are suppressed when
> > +loading or storing a memory operand.
> > +
> > +Conditionally move operand 2 or operand 3 into operand 0 according to
> > +the comparison in operand 1. If the comparison is true, operand 2 is
> > +moved into operand 0, otherwise operand 3 is moved.
> > +
> > @end table
> > @end ifset
> > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index
> > 6487574c514..59845390607 100644
> > --- a/gcc/ifcvt.cc
> > +++ b/gcc/ifcvt.cc
> > @@ -778,6 +778,9 @@ static bool noce_try_store_flag_mask (struct
> > noce_if_info *); static rtx noce_emit_cmove (struct noce_if_info *, rtx,
> > enum
> rtx_code, rtx,
> > rtx, rtx, rtx, rtx =
> > NULL, rtx = NULL); static bool noce_try_cmove (struct noce_if_info *);
> > +static bool can_use_cmove_load_mem_notrap (rtx, rtx);
> > +static bool can_use_cmove_store_mem_notrap (rtx, rtx, rtx, bool);
> > +static bool can_use_cfmovcc (struct noce_if_info *);
> > static bool noce_try_cmove_arith (struct noce_if_info *); static rtx
> > noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
> > static bool noce_try_minmax (struct noce_if_info *); @@ -2132,6
> > +2135,69 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
> > return true;
> > }
> > +/* Return TRUE if we could convert "if (test) x = *a; else x = b;"
> > + or "if (test) x = a; else x = *b;" to a conditional faulting movcc,
> > + i.e. x86 cfcmov, especially when a load from A or B may cause
> > + memory faults. */
> > +
> > +static bool
> > +can_use_cmove_load_mem_notrap (rtx a, rtx b) {
> > + /* Just handle a conditional move with one trapping MEM operand
> > + and one non-trapping, non-MEM operand. */
> > + if (!(MEM_P (a) ^ MEM_P (b)))
> > + return false;
> > + bool a_trap = may_trap_or_fault_p (a);
> > + bool b_trap = may_trap_or_fault_p (b);
> > +
> > + if (!(a_trap ^ b_trap))
> > + return false;
> > + if (a_trap && !MEM_P (a))
> > + return false;
> > + if (b_trap && !MEM_P (b))
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +/* Return TRUE if we could convert "if (test) *x = a; else skip" to
> > + a conditional faulting movcc, i.e. x86 cfcmov, especially when the
> > + store to x may cause memory faults and in else_bb x == b. */
> > +
> > +static bool
> > +can_use_cmove_store_mem_notrap (rtx x, rtx a, rtx b, bool a_simple) {
> > + gcc_assert (MEM_P (x));
> > +
> > + machine_mode x_mode = GET_MODE (x);
> > +
> > + if (!rtx_equal_p (x, b) || !may_trap_or_fault_p (x))
> > + return false;
> > + if (!a_simple || !register_operand (a, x_mode))
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +/* Return TRUE if the backend supports cfmovcc_optab, which
> > + suppresses memory faults when loading or storing a memory operand
> > + and the condition code evaluates to false. */
> > +
> > +static bool
> > +can_use_cfmovcc (struct noce_if_info *if_info) {
> > + rtx a = if_info->a;
> > + rtx b = if_info->b;
> > + rtx x = if_info->x;
> > +
> > + if (optab_handler (cfmovcc_optab, GET_MODE (x)) == CODE_FOR_nothing)
> > + return false;
> > +
> > + if (MEM_P (x))
> > + return can_use_cmove_store_mem_notrap (x, a, b,
> > +if_info->then_simple);
> > + else
> > + return can_use_cmove_load_mem_notrap (a, b); }
> > +
> > /* Try more complex cases involving conditional_move. */ static bool
> > @@ -2171,7 +2237,15 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> > /* ??? We could handle this if we knew that a load from A or B could
> > not trap or fault. This is also true if we've already loaded
> > from the address along the path from ENTRY. */
> > - else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> > + else if (may_trap_or_fault_p (a) && may_trap_or_fault_p (b))
> > + return false;
> > + /* cfmovcc_optab implements conditional faulting: if the condition
> > + code evaluates to false, all memory faults are suppressed when
> > + loading or storing a memory operand. This lets a conditional
> > + move load or store a memory operand that may trap or fault. */
> > + else if ((may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> > + && !can_use_cfmovcc (if_info))
> > return false;
> > /* if (test) x = a + b; else x = c - d; @@ -2247,9 +2321,14 @@
> > noce_try_cmove_arith (struct noce_if_info *if_info)
> > /* If either operand is complex, load it into a register first.
> > The best way to do this is to copy the original insn. In this
> > way we preserve any clobbers etc that the insn may have had.
> > - This is of course not possible in the IS_MEM case. */
> > + This is of course not possible in the IS_MEM case.
> > + If loading or storing an operand may trap or fault, we should
> > + not hoist the load or store; otherwise the memory fault cannot
> > + be suppressed and we get a normal arithmetic insn instead of a
> > + conditional faulting movcc. */
> > - if (! general_operand (a, GET_MODE (a)) || tmp_a)
> > + if (! may_trap_or_fault_p (a)
> > + && (! general_operand (a, GET_MODE (a)) || tmp_a))
> > {
> > if (is_mem)
> > @@ -2278,7 +2357,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> > }
> > }
> > - if (! general_operand (b, GET_MODE (b)) || tmp_b)
> > + if (! may_trap_or_fault_p (b)
> > + && (! general_operand (b, GET_MODE (b)) || tmp_b))
> > {
> > if (is_mem)
> > {
> > @@ -4210,12 +4290,31 @@ noce_process_if_block (struct noce_if_info
> *if_info)
> > }
> > if (!set_b && MEM_P (orig_x))
> > - /* We want to avoid store speculation to avoid cases like
> > - if (pthread_mutex_trylock(mutex))
> > - ++global_variable;
> > - Rather than go to much effort here, we rely on the SSA optimizers,
> > - which do a good enough job these days. */
> > - return false;
> > + {
> > + /* When the target supports conditional faulting movcc, i.e. x86
> > + cfcmov, we can do a conditional mem store for "if (...) *x = a;
> > + else skip" via cfmovcc_optab, where x may trap or fault. */
> > + if ((optab_handler (cfmovcc_optab, GET_MODE (orig_x))
> > + != CODE_FOR_nothing)
> > + && HAVE_conditional_move
> > + && may_trap_or_fault_p (orig_x)
> > + && register_operand (a, GET_MODE (orig_x)))
> > + {
> > + x = orig_x;
> > + if_info->x = x;
> > + if (noce_try_cmove_arith (if_info))
> > + goto success;
> > + else
> > + return false;
> > + }
> > + /* We want to avoid store speculation to avoid cases like
> > + if (pthread_mutex_trylock(mutex))
> > + ++global_variable;
> > + Rather than go to much effort here, we rely on the SSA optimizers,
> > + which do a good enough job these days. */
> > + else
> > + return false;
> > + }
> > if (noce_try_move (if_info))
> > goto success;
> > diff --git a/gcc/optabs.cc b/gcc/optabs.cc index
> > 2bcb3f7b47a..181edf64f03 100644
> > --- a/gcc/optabs.cc
> > +++ b/gcc/optabs.cc
> > @@ -5034,6 +5034,7 @@ emit_conditional_move (rtx target, struct
> rtx_comparison comp,
> > rtx_insn *last;
> > enum insn_code icode;
> > enum rtx_code reversed;
> > + optab op = movcc_optab;
> > /* If the two source operands are identical, that's just a move.
> > */ @@ -5082,7 +5083,11 @@ emit_conditional_move (rtx target, struct
> rtx_comparison comp,
> > if (mode == VOIDmode)
> > mode = GET_MODE (op2);
> > - icode = direct_optab_handler (movcc_optab, mode);
> > + if ((may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
> > + && optab_handler (cfmovcc_optab, mode) != CODE_FOR_nothing)
> > + op = cfmovcc_optab;
> > +
> > + icode = direct_optab_handler (op, mode);
> > if (icode == CODE_FOR_nothing)
> > return NULL_RTX;
> > @@ -5194,6 +5199,7 @@ emit_conditional_move_1 (rtx target, rtx
> comparison,
> > rtx op2, rtx op3,
> > machine_mode mode) {
> > enum insn_code icode;
> > + optab op = movcc_optab;
> > if (comparison == NULL_RTX || !COMPARISON_P (comparison))
> > return NULL_RTX;
> > @@ -5214,7 +5220,11 @@ emit_conditional_move_1 (rtx target, rtx
> comparison,
> > if (mode == VOIDmode)
> > mode = GET_MODE (op2);
> > - icode = direct_optab_handler (movcc_optab, mode);
> > + if ((may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
> > + && optab_handler (cfmovcc_optab, mode) != CODE_FOR_nothing)
> > + op = cfmovcc_optab;
> > +
> > + icode = direct_optab_handler (op, mode);
> > if (icode == CODE_FOR_nothing)
> > return NULL_RTX;
> > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > 58a939442bd..893e5c1c6c2 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -551,3 +551,4 @@ OPTAB_D (len_store_optab, "len_store_$a") OPTAB_D
> > (select_vl_optab, "select_vl$a") OPTAB_D (andn_optab, "andn$a3")
> > OPTAB_D (iorn_optab, "iorn$a3")
> > +OPTAB_D (cfmovcc_optab, "cfmov$acc")
> > --
> > 2.31.1