Hi Segher, Thanks a lot for your review!
Segher Boessenkool <seg...@kernel.crashing.org> writes: > Hi! > > On Wed, Dec 07, 2022 at 08:00:08PM +0800, Jiufu Guo wrote: >> When assigning a parameter to a variable, or assigning a variable to >> return value with struct type, "block move" are used to expand >> the assignment. It would be better to use the register mode according >> to the target/ABI to move the blocks if the parameter/return is passed >> through registers. And then this would raise more opportunities for >> other optimization passes(cse/dse/xprop). >> >> As the example code (like code in PR65421): >> >> typedef struct SA {double a[3];} A; >> A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s) >> A ret_arg (A a) {return a;} // just empty fun body >> void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s) > > What is this like if you use [5] instead? Or use an ABI without > homogeneous aggregates? Thanks for this question! I also tested the cases on different array types or different sizes, or mixed field types. If it is out of the number of registers for passing the param or return, it is treated as a mem block. For parameter, it is partially passed via registers, and partially passing via stack. For return, it is returned via a pointer (with one invisible pointer parameter). And the <retval> of the function is not with parallel code. This patch does not cover these cases. > >> +static void >> +move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode, bool >> nontemporal) >> +{ >> + HOST_WIDE_INT size, sub_size; >> + int len; >> + >> + gcc_assert (MEM_P (to_rtx)); >> + >> + size = MEM_SIZE (to_rtx).to_constant (); >> + sub_size = GET_MODE_SIZE (sub_mode).to_constant (); >> + len = size / sub_size; > > Unrelated, but a pet peeve: it is much more modern (and imo much better > taste) to not put all declarations at the start; just declare at first > use: > > gcc_assert (MEM_P (to_rtx)); > > HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant (); > HOST_WIDE_INT sub_size = GET_MODE_SIZE (sub_mode).to_constant (); > int len = size / sub_size; > Thanks for catch this! >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c >> @@ -0,0 +1,15 @@ >> +/* PR target/65421 */ >> +/* { dg-options "-O2" } */ >> +/* { dg-require-effective-target has_arch_ppc64 } */ >> + >> +typedef struct SA >> +{ >> + double a[2]; >> + long l; >> +} A; >> + >> +/* std 3 param regs to return slot */ >> +A ret_arg (A a) {return a;} >> +/* { dg-final { scan-assembler-times {\mstd 4,0\(3\)\s} 1 } } */ >> +/* { dg-final { scan-assembler-times {\mstd 5,8\(3\)\s} 1 } } * >> +/* { dg-final { scan-assembler-times {\mstd 6,16\(3\)\s} 1 } } */ > > This is only correct on certain ABIs, probably only ELFv2 even. Thanks for point out this! This is only correct if the ABI allows this struct to be passed through integer registers, and return through the mem block. In the previous version, I added a requirement on ELFv2. As tested on BE environments, this case also pass. So, I deleted the requirement. (While on BE environments, there is another issue: some unnecessary memory stores are not deleted.) But with more reading of the code 'rs6000_function_arg', as you said, I'm not sure if this behavior meets other ABIs (at least, it seems, this is not correct on darwin64). So, as you said, we may add a requirement on ELFv2; Or leave this case there, and add "! target" when hitting failure? > > > We certainly can improve the homogeneous aggregates stuff, but please > make sure you don't degrade all other stuff? Older, as well as when > things are not an homogeneous aggregate, for example too big. Can you > please add tests for such cases? Sure, thanks! I encounter one issue in this kind of case (large struct) on a previous version path. Thanks again for your comments and suggestions! BR, Jeff (Jiufu) > > > Segher