> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: Wednesday, May 8, 2019 3:35 PM
> To: JiangNing OS <jiangn...@os.amperecomputing.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Biener <rguent...@suse.de>;
> pins...@gcc.gnu.org
> Subject: Re: Fixing ifcvt issue as exposed by BZ89430
>
> On Thu, Feb 28, 2019 at 1:26 PM JiangNing OS
> <jiangn...@os.amperecomputing.com> wrote:
> >
> > To solve BZ89430 the following fixes are needed:
> >
> > (1) The code below in noce_try_cmove_arith needs to be fixed.
> >
> >   /* ??? We could handle this if we knew that a load from A or B could
> >      not trap or fault.  This is also true if we've already loaded
> >      from the address along the path from ENTRY.  */
> >   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> >     return FALSE;
> >
> > Finding a dominating memory access can help decide whether the memory
> > access following the conditional move is valid or not.
> > * If there is a dominating memory write to the same memory address in
> >   test_bb as the one from x=a, it is always safe.
> > * When the dominating memory access to the same memory address in
> >   test_bb is a read, a load out of x=a is always safe, and a store out
> >   of x=a is still safe if the memory is on the stack.
> >
> > Besides, there is a bug in rearranging the then_bb and else_bb layout,
> > which needs to be fixed.
> >
> > (2) The fix in https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html
> > is overkill.  If the write target following the conditional move is a
> > memory access, it bails out early.
> >
> >   if (!set_b && MEM_P (orig_x))
> >     /* We want to avoid store speculation to avoid cases like
> >          if (pthread_mutex_trylock(mutex))
> >            ++global_variable;
> >        Rather than go to much effort here, we rely on the SSA optimizers,
> >        which do a good enough job these days.  */
> >     return FALSE;
> >
> > It looks a bit hard for the compiler to know that the program semantics
> > are related to a mutex and thereby avoid store speculation.
> > Without removing this overkill, the fix mentioned in (1) would not
> > work.  Any idea?  An alternative solution is to detect the following
> > pattern conservatively,
>
> But it's important to not break this.  Note we do have --param
> allow-store-data-races which the user can use to override this.
> Note that accesses to the stack can obviously not cause store
> speculation if its address didn't escape.  But that's probably what is
> referred to by "rely on the SSA optimizers".
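For readers skimming the thread, the store-speculation hazard both messages refer to can be sketched in plain C. This is a hypothetical illustration, not code from the patch: it contrasts the guarded store the programmer wrote with what an over-eager if-conversion would effectively produce (the `speculated_increment` rewrite is hand-written for illustration).

```c
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int global_variable;

/* What the programmer wrote: the store to global_variable happens
   only on the branch where the trylock succeeded.  */
static void
guarded_increment (void)
{
  if (pthread_mutex_trylock (&mutex) == 0)
    {
      ++global_variable;
      pthread_mutex_unlock (&mutex);
    }
}

/* What an over-eager if-conversion would effectively produce: the
   load and store of global_variable become unconditional and only the
   value is selected.  The store now also happens when the lock was
   NOT acquired, racing with whichever thread holds the mutex.  */
static void
speculated_increment (void)
{
  int locked = (pthread_mutex_trylock (&mutex) == 0);
  int tmp = global_variable;     /* unconditional load */
  tmp = locked ? tmp + 1 : tmp;  /* conditional move of the value only */
  global_variable = tmp;         /* unconditional store: a data race */
  if (locked)
    pthread_mutex_unlock (&mutex);
}
```

Run single-threaded, the two functions behave identically, which is exactly why the bug only surfaces as a race. The patch's observation is that when the stored location is a stack slot whose address never escapes the function, no other thread can observe the speculative store, so the transformation becomes safe there.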
Yes!  So I sent out a patch titled "[PATCH] improve ifcvt optimization
(PR rtl-optimization/89430)" to detect the access to stack two months
ago.  The SSA optimization called "tree-if-conv" in the middle-end
can't really cover this case.  The key part of my change is:

-      /* We want to avoid store speculation to avoid cases like
-           if (pthread_mutex_trylock(mutex))
-             ++global_variable;
-         Rather than go to much effort here, we rely on the SSA optimizers,
-         which do a good enough job these days.  */
-      return FALSE;
+      {
+        /* We want to avoid store speculation to avoid cases like
+             if (pthread_mutex_trylock(mutex))
+               ++global_variable;
+           Tree if conversion cannot handle this case well, and it intends
+           to help vectorization for loops only.  */
+        if (!noce_mem_is_on_stack (insn_a, orig_x))
+          return FALSE;
+
+        /* For a case like
+             if (pthread_mutex_trylock(mutex))
+               ++local_variable;
+           if any stack variable address is taken, potentially this local
+           variable could be modified by other threads and introduce store
+           speculation.  */
+        if (!cfun_no_stack_address_taken)
+          return FALSE;
+      }

Can you please take a look when you have time?  Thank you!

> > if (a_function_call(...))
> >   ++local_variable;
> >
> > If local_variable doesn't have its address taken at all in the
> > current function, it cannot carry pthread mutex lock semantics, and
> > then the following patch would work to solve (1) and pass the last
> > case described in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
> >
> > Index: gcc/cse.c
> > ===================================================================
> > --- gcc/cse.c	(revision 268256)
> > +++ gcc/cse.c	(working copy)
> > @@ -540,7 +540,6 @@
> >     already as part of an already processed extended basic block.
> >     */
> >  static sbitmap cse_visited_basic_blocks;
> >
> > -static bool fixed_base_plus_p (rtx x);
> >  static int notreg_cost (rtx, machine_mode, enum rtx_code, int);
> >  static int preferable (int, int, int, int);
> >  static void new_basic_block (void);
> > @@ -606,30 +605,7 @@
> >
> >  static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
> >
> >
> > -/* Nonzero if X has the form (PLUS frame-pointer integer).  */
> > -
> > -static bool
> > -fixed_base_plus_p (rtx x)
> > -{
> > -  switch (GET_CODE (x))
> > -    {
> > -    case REG:
> > -      if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx)
> > -        return true;
> > -      if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM])
> > -        return true;
> > -      return false;
> > -
> > -    case PLUS:
> > -      if (!CONST_INT_P (XEXP (x, 1)))
> > -        return false;
> > -      return fixed_base_plus_p (XEXP (x, 0));
> > -
> > -    default:
> > -      return false;
> > -    }
> > -}
> > -
> >  /* Dump the expressions in the equivalence class indicated by CLASSP.
> >     This function is used only for debugging.  */
> >  DEBUG_FUNCTION void
> > Index: gcc/ifcvt.c
> > ===================================================================
> > --- gcc/ifcvt.c	(revision 268256)
> > +++ gcc/ifcvt.c	(working copy)
> > @@ -76,6 +76,9 @@
> >  /* Whether conditional execution changes were made.  */
> >  static int cond_exec_changed_p;
> >
> > +/* bitmap for stack frame pointer definition insns.  */
> > +static bitmap bba_sets_sfp;
> > +
> >  /* Forward references.  */
> >  static int count_bb_insns (const_basic_block);
> >  static bool cheap_bb_rtx_cost_p (const_basic_block, profile_probability, int);
> > @@ -99,6 +102,7 @@
> >                                              edge, int);
> >  static void noce_emit_move_insn (rtx, rtx);
> >  static rtx_insn *block_has_only_trap (basic_block);
> > +static void collect_all_fp_insns (void);
> >
> >
> >  /* Count the number of non-jump active insns in BB.  */
> >
> > @@ -2029,6 +2033,110 @@
> >    return true;
> >  }
> >
> > +/* Return true if MEM x is on stack.  a_insn contains x, if it exists.
> > +   */
> > +
> > +static bool
> > +noce_mem_is_on_stack (rtx_insn *a_insn, const_rtx x)
> > +{
> > +  df_ref use;
> > +
> > +  gcc_assert (x);
> > +  gcc_assert (MEM_P (x));
> > +
> > +  /* Early exit if the base register is a stack register.  */
> > +  rtx a = XEXP (x, 0);
> > +  if (fixed_base_plus_p (a))
> > +    return true;
> > +
> > +  if (!a_insn)
> > +    return false;
> > +
> > +  /* Check if x is on stack.  Assume a mem expression using registers
> > +     related to the stack register is always on stack.  */
> > +  FOR_EACH_INSN_USE (use, a_insn)
> > +    {
> > +      if (bitmap_bit_p (bba_sets_sfp, DF_REF_REGNO (use)))
> > +        return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> > +/* Always return true, if there is a dominating write.
> > +
> > +   When there is a dominating read from memory on stack,
> > +   1) if x = a is a memory read, return true.
> > +   2) if x = a is a memory write, return true if the memory is on stack.
> > +      This guarantees the memory is *not* read-only.  */
> > +
> > +static bool
> > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
> > +                           const_rtx x, bool is_store)
> > +{
> > +  rtx_insn *insn;
> > +  rtx set;
> > +  bool find_load = false;
> > +
> > +  gcc_assert (MEM_P (x));
> > +
> > +  FOR_BB_INSNS (bb, insn)
> > +    {
> > +      set = single_set (insn);
> > +      if (!set)
> > +        continue;
> > +
> > +      /* Dominating store.  */
> > +      if (rtx_equal_p (x, SET_DEST (set)))
> > +        return true;
> > +
> > +      /* Dominating load.  */
> > +      if (rtx_equal_p (x, SET_SRC (set)))
> > +        find_load = true;
> > +    }
> > +
> > +  if (find_load)
> > +    {
> > +      if (is_store && noce_mem_is_on_stack (a_insn, x))
> > +        return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> > +/* Return false if the memory a or b must be valid.
> > +   This function must be called before a potential swap of a and b.
> > +   */
> > +
> > +static bool
> > +noce_mem_maybe_invalid_p (struct noce_if_info *if_info)
> > +{
> > +  /* For a memory load.  */
> > +  if (if_info->a && MEM_P (if_info->a))
> > +    {
> > +      return !noce_valid_for_dominating (if_info->test_bb,
> > +                                         if_info->insn_a,
> > +                                         if_info->a, false);
> > +    }
> > +
> > +  /* For a memory store.  */
> > +  else if (if_info->b && MEM_P (if_info->b))
> > +    {
> > +      if (!if_info->else_bb)
> > +        return !noce_valid_for_dominating (if_info->test_bb,
> > +                                           if_info->insn_a,
> > +                                           if_info->b, true);
> > +      else
> > +        return !noce_valid_for_dominating (if_info->test_bb,
> > +                                           if_info->insn_b,
> > +                                           if_info->b, true);
> > +    }
> > +
> > +  /* ??? We could handle this if we knew that a load from A or B could
> > +     not trap or fault.  This is also true if we've already loaded
> > +     from the address along the path from ENTRY.  */
> > +  return (may_trap_or_fault_p (if_info->a)
> > +          || may_trap_or_fault_p (if_info->b));
> > +}
> > +
> >  /* Try more complex cases involving conditional_move.  */
> >
> >  static int
> > @@ -2065,10 +2173,7 @@
> >        is_mem = 1;
> >      }
> >
> > -  /* ??? We could handle this if we knew that a load from A or B could
> > -     not trap or fault.  This is also true if we've already loaded
> > -     from the address along the path from ENTRY.  */
> > -  else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> > +  else if (noce_mem_maybe_invalid_p (if_info))
> >      return FALSE;
> >
> >    /* if (test) x = a + b; else x = c - d;
> > @@ -2234,7 +2339,7 @@
> >    /* If insn to set up A clobbers any registers B depends on, try to
> >       swap insn that sets up A with the one that sets up B.  If even
> >       that doesn't help, punt.
> >       */
> > -  if (modified_in_a && !modified_in_b)
> > +  if (!modified_in_a && modified_in_b)
> >      {
> >        if (!noce_emit_bb (emit_b, else_bb, b_simple))
> >          goto end_seq_and_fail;
> >
> >        if (!noce_emit_bb (emit_a, then_bb, a_simple))
> >          goto end_seq_and_fail;
> >      }
> > -  else if (!modified_in_a)
> > +  else if (!modified_in_b)
> >      {
> >        if (!noce_emit_bb (emit_a, then_bb, a_simple))
> >          goto end_seq_and_fail;
> > @@ -5347,6 +5452,30 @@
> >
> >    return FALSE;
> >  }
> > +
> > +static void
> > +collect_all_fp_insns (void)
> > +{
> > +  rtx_insn *a_insn;
> > +  bba_sets_sfp = BITMAP_ALLOC (&reg_obstack);
> > +  df_ref def;
> > +  basic_block bb;
> > +
> > +  FOR_ALL_BB_FN (bb, cfun)
> > +    FOR_BB_INSNS (bb, a_insn)
> > +      {
> > +        rtx sset_a = single_set (a_insn);
> > +        if (!sset_a)
> > +          continue;
> > +
> > +        if (fixed_base_plus_p (SET_SRC (sset_a)))
> > +          {
> > +            FOR_EACH_INSN_DEF (def, a_insn)
> > +              bitmap_set_bit (bba_sets_sfp, DF_REF_REGNO (def));
> > +          }
> > +      }
> > +}
> > +
> >
> >
> >  /* Main entry point for all if-conversion.  AFTER_COMBINE is true if
> >     we are after combine pass.  */
> > @@ -5381,6 +5510,8 @@
> >
> >    df_set_flags (DF_LR_RUN_DCE);
> >
> > +  collect_all_fp_insns ();
> > +
> >    /* Go through each of the basic blocks looking for things to convert.  If we
> >       have conditional execution, we make multiple passes to allow us to handle
> >       IF-THEN{-ELSE} blocks within other IF-THEN{-ELSE} blocks.
*/ @@ > > -5413,6 +5544,8 @@ > > } > > while (cond_exec_changed_p); > > > > + BITMAP_FREE (bba_sets_sfp); > > + > > #ifdef IFCVT_MULTIPLE_DUMPS > > if (dump_file) > > fprintf (dump_file, "\n\n========== no more changes\n"); > > Index: gcc/rtl.h > > > ================================================================ > === > > --- gcc/rtl.h (revision 268256) > > +++ gcc/rtl.h (working copy) > > @@ -3751,6 +3751,8 @@ > > #define hard_frame_pointer_rtx (global_rtl[GR_HARD_FRAME_POINTER]) > > #define arg_pointer_rtx (global_rtl[GR_ARG_POINTER]) > > > > +extern bool fixed_base_plus_p (rtx x); > > + > > #ifndef GENERATOR_FILE > > /* Return the attributes of a MEM rtx. */ static inline const > > struct mem_attrs * > > Index: gcc/rtlanal.c > > > ================================================================ > === > > --- gcc/rtlanal.c (revision 268256) > > +++ gcc/rtlanal.c (working copy) > > @@ -341,6 +341,30 @@ > > return 0; > > } > > > > +/* Nonzero if X has the form (PLUS frame-pointer integer). */ > > + > > +bool > > +fixed_base_plus_p (rtx x) > > +{ > > + switch (GET_CODE (x)) > > + { > > + case REG: > > + if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx) > > + return true; > > + if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM]) > > + return true; > > + return false; > > + > > + case PLUS: > > + if (!CONST_INT_P (XEXP (x, 1))) > > + return false; > > + return fixed_base_plus_p (XEXP (x, 0)); > > + > > + default: > > + return false; > > + } > > +} > > + > > /* Compute an approximation for the offset between the register > > FROM and TO for the current function, as it was at the start > > of the routine. */ > > > > Thanks, > > -Jiangning