Hi Segher, Thanks a lot for your helpful comments!
Segher Boessenkool <seg...@kernel.crashing.org> writes: > On Thu, Dec 08, 2022 at 09:17:38PM +0800, Jiufu Guo wrote: >> Segher Boessenkool <seg...@kernel.crashing.org> writes: >> > On Wed, Dec 07, 2022 at 08:00:08PM +0800, Jiufu Guo wrote: >> >> typedef struct SA {double a[3];} A; >> >> A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s) >> >> A ret_arg (A a) {return a;} // just empty fun body >> >> void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s) >> > >> > What is this like if you use [5] instead? Or use an ABI without >> > homogeneous aggregates? >> Thanks for this question! I also tested the cases on different array >> types or different sizes, or mixed field types. >> >> If it is out of the number of registers for passing the param >> or return, it is treated as a mem block. >> For parameter, it is partially passed via registers, and partially >> passing via stack. >> For return, it is returned via a pointer (with one invisible pointer >> parameter). And the <retval> of the function is not with parallel code. >> >> This patch does not cover these cases. > > Understood, sure; but my point is, can it degrade code quality in such > cases? I don't see anything in the patch that precludes that. No, the behavior of such cases is not affected in this patch. The preclude code is in "assign_parm_setup_block". This patch only shows the different parts, the context is not shown. In assign_parm_setup_block, this patch marks "DECL_STACK_REGS_P" only for "REG_P (entry_parm) || GET_CODE (entry_parm) == PARALLEL" which indicates the registers are enough to pass the param. > >> >> --- /dev/null >> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c >> >> @@ -0,0 +1,15 @@ >> >> +/* PR target/65421 */ >> >> +/* { dg-options "-O2" } */ >> >> +/* { dg-require-effective-target has_arch_ppc64 } */ >> >> + >> >> +typedef struct SA >> >> +{ >> >> + double a[2]; >> >> + long l; >> >> +} A; >> >> + >> >> +/* std 3 param regs to return slot */ >> >> +A ret_arg (A a) {return a;} >> >> +/* { dg-final { scan-assembler-times {\mstd 4,0\(3\)\s} 1 } } */ >> >> +/* { dg-final { scan-assembler-times {\mstd 5,8\(3\)\s} 1 } } * >> >> +/* { dg-final { scan-assembler-times {\mstd 6,16\(3\)\s} 1 } } */ >> > >> > This is only correct on certain ABIs, probably only ELFv2 even. >> Thanks for point out this! >> This is only correct if the ABI allows this struct to be passed >> through integer registers, and return through the mem block. > > And it needs to be in those specific registers / at those specific > offsets as well. Yes. > > Btw, please leave out the \s? Thanks! > >> In the previous version, I added a requirement on ELFv2. As tested on >> BE environments, this case also pass. So, I deleted the requirement. > > BE for ELFv2 also exists, fwiw. Yeap! We have -mabi=elfv2. > >> (While on BE environments, there is another issue: some unnecessary >> memory stores are not deleted.) > > Huh. Does that happen with the current compiler as well? Do you have > an example? We can use the test case (pr65421-1.c) as the example -:) typedef struct SA {double a[2]; long l; } A; A ret_arg (A a) {return a;} For this case, without the patch, below is generated: std 4,56(1) std 5,64(1) li 10,56 std 6,72(1) std 6,16(3) lxvd2x 0,1,10 stxvd2x 0,0,3 With the patch, below is generated: std 4,56(1) std 5,64(1) std 6,72(1) std 4,0(3) std 5,8(3) std 6,16(3) The first 3 std insns are reductant. This is an unrelated issue. With -mabi=elfv2, code can be optimized, and those 3 insns are deleted. I think it would be fine to just test this case on powerpc_elfv2. I would merge pr65421-1.c into pr65421.c (with dg-require elfv2). > >> But with more reading of the code 'rs6000_function_arg', as you said, >> I'm not sure if this behavior meets other ABIs (at least, it seems, >> this is not correct on darwin64). >> So, as you said, we may add a requirement on ELFv2; Or leave this >> case there, and add "! target" when hitting failure? > > If you do !target the testcase won't test much at all anymore ;-) Right. we could use this method to exclude the sub-targets which are not using r4,5,6 for the param for this case. > >> > We certainly can improve the homogeneous aggregates stuff, but please >> > make sure you don't degrade all other stuff? Older, as well as when >> > things are not an homogeneous aggregate, for example too big. Can you >> > please add tests for such cases? >> Sure, thanks! I encounter one issue in this kind of case (large struct) >> on a previous version path. > > Perhaps it would be better to have a hook so that every target (and > subtarget) can fine tune exactly when this is done. Then again, perhaps > I worry too much. Understand, I also thought about using a hook for targets to tune! The good news is that: a few target hooks are used by generic code for arg&ret, and the target-related info (e.g. if a struct param is passed via registers) are available (indirectly). And I also feel that I may need to add a hook when the required target info is not accessible(or not suitable) for generic code. Thanks again for your review! BR, Jeff (Jiufu) > > > Segher