Hi! Thanks for your helpful comments/sugguestions!
Richard Biener <rguent...@suse.de> writes: > On Mon, 14 Nov 2022, Jiufu Guo wrote: > >> >> Hi! >> Thanks so much for your review! >> >> Richard Biener <rguent...@suse.de> writes: >> >> > On Fri, 11 Nov 2022, Jiufu Guo wrote: >> > >> >> Hi, >> >> >> >> When assigning a struct parameter to another variable, or loading a >> >> memory block to a struct var (especially for return value), >> >> Now, "block move" would be used during expand the assignment. And >> >> the "block move" may use a type/mode different from the mode which >> >> is accessing the var. e.g. on ppc64le, V2DI would be used to move >> >> the block of 16bytes. >> >> >> >> And then, this "block move" would prevent optimization passes from >> >> leaping/crossing over the assignment. PR65421 reflects this issue. >> >> >> >> As the example code in PR65421. >> >> >> >> typedef struct { double a[4]; } A; >> >> A foo (const A *a) { return *a; } >> >> >> >> On ppc64le, the below instructions are used for the "block move": >> >> 7: r122:V2DI=[r121:DI] >> >> 8: r124:V2DI=[r121:DI+r123:DI] >> >> 9: [r112:DI]=r122:V2DI >> >> 10: [r112:DI+0x10]=r124:V2DI >> >> >> >> For this issue, a few comments/suggestions are mentioned via RFC: >> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html >> >> I drafted a patch which is updating the behavior of block_move for >> >> struct type. This patch is simple to work with, a few ideas in the >> >> comments are not put into this patch. I would submit this >> >> patch first. >> >> >> >> The idea is trying to use sub-modes(scalar) for the "block move". >> >> And the sub-modes would align with the access patterns of the >> >> struct members and usages on parameter/return value. >> >> The major benefits of this change would be raising more >> >> opportunities for other optimization passes(cse/dse/xprop). >> >> >> >> The suitable mode would be target specified and relates to ABI, >> >> this patch introduces a target hook. And in this patch, the hook >> >> is implemented on rs6000. >> >> >> >> In this patch, the hook would be just using heuristic modes for all >> >> struct block moving. And the hook would not check if the "block move" >> >> is about parameters or return value or other uses. >> >> >> >> For the rs6000 implementation of this hook, it is able to use >> >> DF/DI/TD/.. modes for the struct block movement. The sub-modes >> >> would be the same as the mode when the struct type is on parameter or >> >> return value. >> >> >> >> Bootstrapped and regtested on ppc64/ppc64le. >> >> Is this ok for trunk? >> >> >> >> >> >> BR, >> >> Jeff(Jiufu) >> >> >> >> >> >> gcc/ChangeLog: >> >> >> >> * config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define. >> >> (submode_for_struct_block_move): New function. Called from >> >> rs600_block_move_for_struct. >> >> (rs600_block_move_for_struct): New function. >> >> * doc/tm.texi: Regenerate. >> >> * doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New. >> >> * expr.cc (store_expr): Call block_move_for_struct. >> >> * target.def (block_move_for_struct): New hook. >> >> * targhooks.cc (default_block_move_for_struct): New function. >> >> * targhooks.h (default_block_move_for_struct): New Prototype. >> >> >> >> --- >> >> gcc/config/rs6000/rs6000.cc | 44 +++++++++++++++++++++++++++++++++++++ >> >> gcc/doc/tm.texi | 6 +++++ >> >> gcc/doc/tm.texi.in | 2 ++ >> >> gcc/expr.cc | 14 +++++++++--- >> >> gcc/target.def | 10 +++++++++ >> >> gcc/targhooks.cc | 7 ++++++ >> >> gcc/targhooks.h | 1 + >> >> 7 files changed, 81 insertions(+), 3 deletions(-) >> >> >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> >> index a85d7630b41..e14cecba0ef 100644 >> >> --- a/gcc/config/rs6000/rs6000.cc >> >> +++ b/gcc/config/rs6000/rs6000.cc >> >> @@ -1758,6 +1758,9 @@ static const struct attribute_spec >> >> rs6000_attribute_table[] = >> >> #undef TARGET_NEED_IPA_FN_TARGET_INFO >> >> #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info >> >> >> >> +#undef TARGET_BLOCK_MOVE_FOR_STRUCT >> >> +#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct >> >> + >> >> #undef TARGET_UPDATE_IPA_FN_TARGET_INFO >> >> #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info >> >> >> >> @@ -23672,6 +23675,47 @@ rs6000_function_value (const_tree valtype, >> >> return gen_rtx_REG (mode, regno); >> >> } >> >> >> >> +/* Subroutine of rs600_block_move_for_struct, to get the internal mode >> >> which >> >> + would be used to move the struct. */ >> >> +static machine_mode >> >> +submode_for_struct_block_move (tree type) >> >> +{ >> >> + gcc_assert (TREE_CODE (type) == RECORD_TYPE); >> >> + >> >> + /* The sub mode may not be the field's type of the struct. >> >> + It would be fine to use the mode as if the type is used as a >> >> function >> >> + parameter or return value. For example: DF for "{double a[4];}", >> >> and >> >> + DI for "{doubel a[3]; long l;}". >> >> + Here, using the mode as if it is function return type. */ >> >> + rtx val = rs6000_function_value (type, NULL, 0); >> >> + return (GET_CODE (val) == PARALLEL) ? GET_MODE (XEXP (XVECEXP (val, 0, >> >> 0), 0)) >> >> + : word_mode; >> >> +} >> >> + >> >> +/* Implement the TARGET_BLOCK_MOVE_FOR_STRUCT hook. */ >> >> +static void >> >> +rs600_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT >> >> method) >> >> +{ >> >> + machine_mode mode = submode_for_struct_block_move (TREE_TYPE (exp)); >> >> + int mode_size = GET_MODE_SIZE (mode); >> >> + int size = UINTVAL (expr_size (exp)); >> >> + if (size < mode_size || (size % mode_size) != 0 || size > 64) >> >> + { >> >> + default_block_move_for_struct (x, y, exp, method); >> >> + return; >> >> + } >> >> + >> >> + int len = size / mode_size; >> >> + for (int i = 0; i < len; i++) >> >> + { >> >> + rtx temp = gen_reg_rtx (mode); >> >> + rtx src = adjust_address (y, mode, mode_size * i); >> >> + rtx dest = adjust_address (x, mode, mode_size * i); >> >> + emit_move_insn (temp, src); >> >> + emit_move_insn (dest, temp); >> >> + } >> >> +} >> >> + >> >> /* Define how to find the value returned by a library function >> >> assuming the value has mode MODE. */ >> >> rtx >> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi >> >> index 8572313b308..c8a1c1f30cf 100644 >> >> --- a/gcc/doc/tm.texi >> >> +++ b/gcc/doc/tm.texi >> >> @@ -1380,6 +1380,12 @@ retain the field's mode. >> >> Normally, this is not needed. >> >> @end deftypefn >> >> >> >> +@deftypefn {Target Hook} void TARGET_BLOCK_MOVE_FOR_STRUCT (rtx @var{x}, >> >> rtx @var{y}, tree @var{exp}, HOST_WIDE_INT @var{method}) >> >> +Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure >> >> +type. @var{method} is the method if this function invokes block_move. >> >> +The default definition invokes block_move. >> >> +@end deftypefn >> >> + >> >> @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified}) >> >> Define this macro as an expression for the alignment of a type (given >> >> by @var{type} as a tree node) if the alignment computed in the usual >> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in >> >> index 986e8f0da09..f0f525a2008 100644 >> >> --- a/gcc/doc/tm.texi.in >> >> +++ b/gcc/doc/tm.texi.in >> >> @@ -1226,6 +1226,8 @@ to aligning a bit-field within the structure. >> >> >> >> @hook TARGET_MEMBER_TYPE_FORCES_BLK >> >> >> >> +@hook TARGET_BLOCK_MOVE_FOR_STRUCT >> >> + >> >> @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified}) >> >> Define this macro as an expression for the alignment of a type (given >> >> by @var{type} as a tree node) if the alignment computed in the usual >> >> diff --git a/gcc/expr.cc b/gcc/expr.cc >> >> index c6917fbf7bd..734dc07a76b 100644 >> >> --- a/gcc/expr.cc >> >> +++ b/gcc/expr.cc >> >> @@ -6504,9 +6504,17 @@ store_expr (tree exp, rtx target, int call_param_p, >> >> emit_group_store (target, temp, TREE_TYPE (exp), >> >> int_size_in_bytes (TREE_TYPE (exp))); >> >> else if (GET_MODE (temp) == BLKmode) >> >> - emit_block_move (target, temp, expr_size (exp), >> >> - (call_param_p >> >> - ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL)); >> >> + { >> >> + if (TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE) >> >> + targetm.block_move_for_struct (target, temp, exp, >> >> + call_param_p ? BLOCK_OP_CALL_PARM >> >> + : BLOCK_OP_NORMAL); >> > >> > I think it's only sensible to do something about the call_param_p case, >> > all other cases are GIGO. And I don't see a good reason to add a new >> > target hook since function parameter passing info is readily >> > available. >> This patch enhances the normal assignments on struct variables ( >> expand_assignment-->store_expr, where the call_param_p is 0). Because >> the "struct block move/copy" is generated for this case too. >> >> For example, the struct block_move is generated. >> # .MEM_3 = VDEF <.MEM_1(D)> >> D.3956 = *a_2(D); >> For this stmt, "D.3956" is a struct variable, and "a_2(D)" points to a >> block of the struct type. >> >> If we only care about parameter/arg/retval(s), we would not need a >> hook, as you said, since parm/retval contains the information about >> the target/ABI. And for more as you suggested in the previous emails, >> we could do SRA-like analyze to check if an assignment relates to >> parm/retval. > > But the issue with applying a stmt-local heuristic is that it's going > to be as many times wrong as it is right so there inevitably will be > regressions. Understant. :-) In the patch, the heurisitic is aplied to all assignment on struct (no matter parm/return related). The generated binary would be changed definitly: mem move(via vector loads/stores) for struct type becomes through sub-modes scalar loads/stores. The binary change may cause some difference(e.g. performance change). Even I had spec2017 tests on some machines(P9 and P10), it is sitll possible has regressions. > > Also copies not directly involving parameters or returns can be optimized > on the GIMPLE level just fine (see what SRA does), the exception is > cpymem optab implementations but those you short-circuit as well here. Yes, agree. So, enhancing the stmt/assignment about param/retval would fix the issue (with low regression risks). > >> The current patch is simple relatively, it uses 'scalar-modes' to move >> blocks for all struct-type assignments. I feel this would be enough, >> and I'm wondering if we need to check parm/retval. > > It's only for the param/retval that RTL expansion has more information > than GIMPLE and in that specific case the code generation is very > difficult to untie and thus a change like this looks appropriate. Understant, I will update patch to handle more about param/retval. Thanks again! BR, Jeff (Jiufu) > > Thanks, > Richard. > >> >> > >> > The question is whether this is best handled up in the call chain where >> > the parameter setup is done or below here (or in emit_block_move). >> As above discussed, this patch touches 'block move' for struct type, >> including copy from 'parameter stack block' to a variable block. >> Oh, assigning from 'parameter regs' to 'stack block' is not changed >> here, it is done through "assign_parms-...->emit_group_xx". >> >> If any misunderstanding, or any comments/concerns, please point them >> out. Thanks a lot! >> >> BR, >> Jeff (Jiufu) >> >> > >> > Richard. >> > >> >> + else >> >> + emit_block_move (target, temp, expr_size (exp), >> >> + (call_param_p ? BLOCK_OP_CALL_PARM >> >> + : BLOCK_OP_NORMAL)); >> >> + } >> >> + >> >> /* If we emit a nontemporal store, there is nothing else to do. */ >> >> else if (nontemporal && emit_storent_insn (target, temp)) >> >> ; >> >> diff --git a/gcc/target.def b/gcc/target.def >> >> index 25f94c19fa7..e141f72a8a3 100644 >> >> --- a/gcc/target.def >> >> +++ b/gcc/target.def >> >> @@ -5584,6 +5584,16 @@ Normally, this is not needed.", >> >> bool, (const_tree field, machine_mode mode), >> >> default_member_type_forces_blk) >> >> >> >> + >> >> +/* Move block for structure type. */ >> >> +DEFHOOK >> >> +(block_move_for_struct, >> >> + "Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in >> >> structure\n\ >> >> +type. @var{method} is the method if this function invokes block_move.\n\ >> >> +The default definition invokes block_move.", >> >> + void, (rtx x, rtx y, tree exp, HOST_WIDE_INT method), >> >> + default_block_move_for_struct) >> >> + >> >> /* See tree-ssa-math-opts.cc:divmod_candidate_p for conditions >> >> that gate the divod transform. */ >> >> DEFHOOK >> >> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc >> >> index 12a58456b39..2b96c348419 100644 >> >> --- a/gcc/targhooks.cc >> >> +++ b/gcc/targhooks.cc >> >> @@ -2330,6 +2330,13 @@ default_member_type_forces_blk (const_tree, >> >> machine_mode) >> >> return false; >> >> } >> >> >> >> +/* Default version of block_move_for_struct. */ >> >> +void >> >> +default_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT >> >> method) >> >> +{ >> >> + emit_block_move (x, y, expr_size (exp), (enum block_op_methods)method); >> >> +} >> >> + >> >> /* Default version of canonicalize_comparison. */ >> >> >> >> void >> >> diff --git a/gcc/targhooks.h b/gcc/targhooks.h >> >> index a6a423c1abb..c284a35ee28 100644 >> >> --- a/gcc/targhooks.h >> >> +++ b/gcc/targhooks.h >> >> @@ -264,6 +264,7 @@ extern void default_asm_output_ident_directive (const >> >> char*); >> >> >> >> extern scalar_int_mode default_cstore_mode (enum insn_code); >> >> extern bool default_member_type_forces_blk (const_tree, machine_mode); >> >> +extern void default_block_move_for_struct (rtx, rtx, tree, >> >> HOST_WIDE_INT); >> >> extern void default_atomic_assign_expand_fenv (tree *, tree *, tree *); >> >> extern tree build_va_arg_indirect_ref (tree); >> >> extern tree std_gimplify_va_arg_expr (tree, tree, gimple_seq *, >> >> gimple_seq *); >> >> >>