Thanks for using maskload/store_optab and sorry for the very slow review. Been a bit swamped lately...
The patch seems to be using maskload and maskstore as though they were
variants of movMMcc, with the comparison being part of the load/store.
Instead, the current interface is that:

  maskload <op0, op1, op2, op3>

would do:

  op0 = op2 ? op1 : op3

where op2 is an already-computed condition.  Similarly:

  maskstore <op0, op1, op2>

would do:

  if (op2) op0 = op1

where again op2 is an already-computed condition.  (Note that maskstore
only has three operands, not four.)

This would be the natural interface if we were trying to generate these
instructions from gimple.  But I suppose the interface makes things
awkward for this patch, where you're trying to generate the pattern from
an RTL pass.  So this raises two questions:

(1) How should we handle the requirement to have a comparison operand,
    instead of the normal precomputed boolean operand?

(2) maskload and maskstore take two modes: the mode of the data being
    loaded/stored, and the mode of the condition.  What should the mode
    of the condition be if the operand is a comparison?

TBH I'm not sure what to do here.  One option would be to emit a
separate cstore (emit_store_flag) and then pass the result of that
cstore to operand 2 of the maskload/store.  The mode of the operand
could be the integer equivalent of the value being loaded/stored
(e.g. SI when loading or storing SF).  I think this would work best with
any future gimple support.  But it likely means that we rely on combine
to eliminate redundant comparison-of-cstore sequences.

Another option would be to allow operand 2 to be a comparison operand,
as for movMMcc.

Regarding (2), we could choose to use VOIDmode for the second mode,
since (a) that is the actual mode of a canonicalised comparison and
(b) it should safely isolate the comparison usage from the
non-comparison usage.  If no-one has any better suggestions, I suppose
we should do this.  It isn't mutually exclusive with the first option:
we could still handle precomputed boolean conditions as well, in a
future patch.
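For the record, the sort of expansion I had in mind for the cstore
option is roughly the following.  It is only an untested sketch that
glosses over signedness and the case where the comparison mode is
VOIDmode; x, a, b and cond stand for the usual noce_if_info operands,
and everything it calls is existing optabs/expmed machinery:

  /* Expand "x = cond ? a : b", where A is the (possibly-faulting) MEM,
     by computing the condition separately as a 0/1 value (option (1)
     above) and feeding it to operand 2 of the maskload.  */
  machine_mode mode = GET_MODE (x);
  scalar_int_mode cond_mode = int_mode_for_mode (mode).require ();
  insn_code icode = convert_optab_handler (maskload_optab, mode, cond_mode);
  if (icode == CODE_FOR_nothing)
    return false;

  rtx flag = emit_store_flag (gen_reg_rtx (cond_mode), GET_CODE (cond),
                              XEXP (cond, 0), XEXP (cond, 1),
                              GET_MODE (XEXP (cond, 0)),
                              /*unsignedp=*/0, /*normalizep=*/1);
  if (!flag)
    return false;

  /* op0 = op2 ? op1 : op3, as described above.  */
  class expand_operand ops[4];
  create_output_operand (&ops[0], x, mode);
  create_input_operand (&ops[1], a, mode);
  create_input_operand (&ops[2], flag, cond_mode);
  create_input_operand (&ops[3], b, mode);
  if (!maybe_expand_insn (icode, 4, ops))
    return false;
  if (ops[0].value != x)
    emit_move_insn (x, ops[0].value);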
"Kong, Lingling" <lingling.k...@intel.com> writes:
> @@ -2132,6 +2134,54 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
>    return true;
>  }
> 
> +/* Return TRUE if we could convert "if (test) *x = a; else skip" to
> +   scalar mask store and could do conditional faulting movcc, i.e.
> +   x86 cfcmov, especially when store x may cause memmory faults and
> +   in else_bb x == b.  */
> +
> +static bool
> +can_use_scalar_mask_store (rtx x, rtx a, rtx b, bool a_simple)
> +{
> +  gcc_assert (MEM_P (x));
> +
> +  machine_mode x_mode = GET_MODE (x);
> +  if (convert_optab_handler (maskstore_optab, x_mode,
> +                             x_mode) == CODE_FOR_nothing)

If we go for the option described above, the second mode here should be
the mode of if_info.cond.

> +    return false;
> +
> +  if (!rtx_equal_p (x, b) || !may_trap_or_fault_p (x))
> +    return false;
> +  if (!a_simple || !register_operand (a, x_mode))
> +    return false;

Could you explain the purpose of the last if statement?  I would have
expected noce_try_cmove_arith to handle other forms of "a" correctly
(as long as they don't fault -- more on that below).

> +
> +  return true;
> +}
> +
> +/* Return TRUE if backend supports scalar maskload_optab/maskstore_optab,
> +   which suppressed memory faults when load or store a memory operand
> +   and the condition code evaluates to false.  */
> +
> +static bool
> +can_use_scalar_mask_load_store (struct noce_if_info *if_info)
> +{
> +  rtx a = if_info->a;
> +  rtx b = if_info->b;
> +  rtx x = if_info->x;
> +
> +  if (!MEM_P (a) && !MEM_P (b))
> +    return false;
> +
> +  if (MEM_P (x))
> +    return can_use_scalar_mask_store (x, a, b, if_info->then_simple);
> +  else
> +    /* Return TRUE if backend supports scalar maskload_optab, we could convert
> +       "if (test) x = *a; else x = b;" or "if (test) x = a; else x = *b;"
> +       to conditional faulting movcc, i.e. x86 cfcmov, especially when load a
> +       or b may cause memmory faults.  */
> +    return convert_optab_handler (maskstore_optab, GET_MODE (a),
> +                                  GET_MODE (a)) != CODE_FOR_nothing;

It looks like this should be maskload_optab.  "a" might be a VOIDmode
constant, so it's probably better to use GET_MODE (x) for the first mode.
The comment above about the second mode applies here too.

> +}
> +
>  /* Try more complex cases involving conditional_move.  */
> 
>  static bool
> @@ -2171,7 +2221,17 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
>    /* ??? We could handle this if we knew that a load from A or B could
>       not trap or fault.  This is also true if we've already loaded
>       from the address along the path from ENTRY.  */

This comment is now a little out of place.

> -  else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> +  /* Just wait cse_not_expected, then convert to conditional mov on their
> +     addresses followed by a load.  */
> +  else if (may_trap_or_fault_p (a) && may_trap_or_fault_p (b))
> +    return false;

may_trap_or_fault_p (a) and may_trap_or_fault_p (b) are tested multiple
times.  Since they're relatively expensive tests, it would be good to
cache them.

> +  /* Scalar maskload_optab/maskstore_optab implements conditionally faulting
> +     which means that if the condition code evaluates to false, all memory
> +     faults are suppressed when load or store a memory operand.  Now we could
> +     load or store a memory operand may trap or fault for conditional
> +     move.  */
> +  else if ((may_trap_or_fault_p (a) ^ may_trap_or_fault_p (b))
> +           && !can_use_scalar_mask_load_store (if_info))
>      return false;

I think it'd be more robust to keep the may_trap_or_fault_p and MEM_P
tests together.  AIUI there are two cases:

(1) MEM_P (a) && a_may_trap_or_fault && !b_may_trap_or_fault

(2) MEM_P (b) && b_may_trap_or_fault && !a_may_trap_or_fault

The code currently requires !MEM_P (b) for the first case and !MEM_P (a)
for the second case.  But I'm not sure that's necessary for correctness.
Is it a costing decision instead?

I think it'd be better to test:

  else if ((<case 1> || <case 2>)
           && can_use_scalar_mask_load_store (if_info))
    ;
  else if (a_may_trap_or_fault || b_may_trap_or_fault)
    return false;
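Spelling out <case 1> and <case 2>, and assuming a_may_trap_or_fault and
b_may_trap_or_fault are local bools that cache the two
may_trap_or_fault_p results as suggested above, that would be roughly
(untested):

  /* Exactly one of the arms is a memory access that might fault;
     a masked load/store can suppress that fault.  */
  else if (((MEM_P (a) && a_may_trap_or_fault && !b_may_trap_or_fault)
            || (MEM_P (b) && b_may_trap_or_fault && !a_may_trap_or_fault))
           && can_use_scalar_mask_load_store (if_info))
    ;
  /* Otherwise a possibly-trapping arm still can't be speculated.  */
  else if (a_may_trap_or_fault || b_may_trap_or_fault)
    return false;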
> 
>    /* if (test) x = a + b; else x = c - d;
> @@ -2247,9 +2307,14 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
>    /* If either operand is complex, load it into a register first.
>       The best way to do this is to copy the original insn.  In this
>       way we preserve any clobbers etc that the insn may have had.
> -     This is of course not possible in the IS_MEM case.  */
> +     This is of course not possible in the IS_MEM case.
> +     For load or store a operands may trap or fault, should not
> +     hoist the load or store, otherwise it unable to suppress memory
> +     fault, it just a normal arithmetic insn insteads of conditional
> +     faulting movcc.  */
> 
> -  if (! general_operand (a, GET_MODE (a)) || tmp_a)
> +  if (! may_trap_or_fault_p (a)
> +      && (! general_operand (a, GET_MODE (a)) || tmp_a))
>      {
> 
>        if (is_mem)
> @@ -2278,7 +2343,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
>      }
>    }
> 
> -  if (! general_operand (b, GET_MODE (b)) || tmp_b)
> +  if (! may_trap_or_fault_p (b)
> +      && (! general_operand (b, GET_MODE (b)) || tmp_b))
>      {
>        if (is_mem)
>        {
> @@ -4210,12 +4276,31 @@ noce_process_if_block (struct noce_if_info *if_info)
>      }
> 
>    if (!set_b && MEM_P (orig_x))
> -    /* We want to avoid store speculation to avoid cases like
> -         if (pthread_mutex_trylock(mutex))
> -           ++global_variable;
> -       Rather than go to much effort here, we rely on the SSA optimizers,
> -       which do a good enough job these days.  */
> -    return false;
> +    {
> +      /* When target support conditional faulting movcc, i.e. x86 cfcmov,
> +         we could do conditonal mem store for "if (...) *x = a; else skip"
> +         to maskstore_optab, which x may trap or fault.  */
> +      if ((convert_optab_handler (maskstore_optab, GET_MODE (orig_x),
> +                                  GET_MODE (orig_x)) != CODE_FOR_nothing)
> +          && HAVE_conditional_move
> +          && may_trap_or_fault_p (orig_x)
> +          && register_operand (a, GET_MODE (orig_x)))
> +        {
> +          x = orig_x;
> +          if_info->x = x;
> +          if (noce_try_cmove_arith (if_info))
> +            goto success;
> +          else
> +            return false;
> +        }
> +      /* We want to avoid store speculation to avoid cases like
> +           if (pthread_mutex_trylock(mutex))
> +             ++global_variable;
> +         Rather than go to much effort here, we rely on the SSA optimizers,
> +         which do a good enough job these days.  */
> +      else
> +        return false;
> +    }
> 
>    if (noce_try_move (if_info))
>      goto success;
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 03ef0c5d81d..524c766d336 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -5085,6 +5085,16 @@ emit_conditional_move (rtx target, struct rtx_comparison comp,
> 
>    icode = direct_optab_handler (movcc_optab, mode);
> 
> +  if (may_trap_or_fault_p (target) && MEM_P (target)
> +      && convert_optab_handler (maskstore_optab, mode,
> +                                mode) != CODE_FOR_nothing)
> +    icode = convert_optab_handler (maskstore_optab, mode, mode);
> +  else if ((may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
> +           && (MEM_P (op2) || MEM_P (op3))
> +           && convert_optab_handler (maskload_optab,
> +                                     mode, mode) != CODE_FOR_nothing)
> +    icode = convert_optab_handler (maskload_optab, mode, mode);
> +
>    if (icode == CODE_FOR_nothing)
>      return NULL_RTX;
> 

I think the maskload/store cases should instead be treated as separate
operations, rather than inferred from the operands.  The current
interface of emit_conditional_move is that OP2 and OP3 can be evaluated
unconditionally.

Thanks,
Richard

> @@ -5217,6 +5227,16 @@ emit_conditional_move_1 (rtx target, rtx comparison,
> 
>    icode = direct_optab_handler (movcc_optab, mode);
> 
> +  if (may_trap_or_fault_p (target) && MEM_P (target)
> +      && convert_optab_handler (maskstore_optab, mode,
> +                                mode) != CODE_FOR_nothing)
> +    icode = convert_optab_handler (maskstore_optab, mode, mode);
> +  else if ((may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
> +           && (MEM_P (op2) || MEM_P (op3))
> +           && convert_optab_handler (maskload_optab,
> +                                     mode, mode) != CODE_FOR_nothing)
> +    icode = convert_optab_handler (maskload_optab, mode, mode);
> +
>    if (icode == CODE_FOR_nothing)
>      return NULL_RTX;
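(The same comment applies to the emit_conditional_move_1 copy quoted
above.)  To illustrate what I mean by separate operations: the names
below are purely hypothetical, nothing like them exists today, but the
idea would be for ifcvt to call dedicated entry points whose contract is
that the memory operand is only accessed when the condition holds, e.g.:

  /* Hypothetical interfaces, for illustration only.  */
  rtx emit_conditional_mask_load (rtx target, struct rtx_comparison comp,
                                  rtx mem, rtx else_value,
                                  machine_mode mode);
  bool emit_conditional_mask_store (rtx mem, struct rtx_comparison comp,
                                    rtx value, machine_mode mode);

rather than changing the behaviour of emit_conditional_move, whose OP2
and OP3 may be evaluated unconditionally.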