Hi! On Wed, Dec 07, 2022 at 08:00:08PM +0800, Jiufu Guo wrote: > When assigning a parameter to a variable, or assigning a variable to > return value with struct type, "block move" are used to expand > the assignment. It would be better to use the register mode according > to the target/ABI to move the blocks if the parameter/return is passed > through registers. And then this would raise more opportunities for > other optimization passes(cse/dse/xprop). > > As the example code (like code in PR65421): > > typedef struct SA {double a[3];} A; > A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s) > A ret_arg (A a) {return a;} // just empty fun body > void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)
What is this like if you use [5] instead? Or use an ABI without homogeneous aggregates? > +static void > +move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode, bool > nontemporal) > +{ > + HOST_WIDE_INT size, sub_size; > + int len; > + > + gcc_assert (MEM_P (to_rtx)); > + > + size = MEM_SIZE (to_rtx).to_constant (); > + sub_size = GET_MODE_SIZE (sub_mode).to_constant (); > + len = size / sub_size; Unrelated, but a pet peeve: it is much more modern (and imo much better taste) to not put all declarations at the start; just declare at first use: gcc_assert (MEM_P (to_rtx)); HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant (); HOST_WIDE_INT sub_size = GET_MODE_SIZE (sub_mode).to_constant (); int len = size / sub_size; > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c > @@ -0,0 +1,15 @@ > +/* PR target/65421 */ > +/* { dg-options "-O2" } */ > +/* { dg-require-effective-target has_arch_ppc64 } */ > + > +typedef struct SA > +{ > + double a[2]; > + long l; > +} A; > + > +/* std 3 param regs to return slot */ > +A ret_arg (A a) {return a;} > +/* { dg-final { scan-assembler-times {\mstd 4,0\(3\)\s} 1 } } */ > +/* { dg-final { scan-assembler-times {\mstd 5,8\(3\)\s} 1 } } * > +/* { dg-final { scan-assembler-times {\mstd 6,16\(3\)\s} 1 } } */ This is only correct on certain ABIs, probably only ELFv2 even. We certainly can improve the homogeneous aggregates stuff, but please make sure you don't degrade all other stuff? Older, as well as when things are not an homogeneous aggregate, for example too big. Can you please add tests for such cases? Segher