On Wed, Jul 9, 2014 at 6:02 PM, Ulrich Weigand <uweig...@de.ibm.com> wrote: > Hello, > > the implementation of homogenous float aggregates for the ELFv2 ABI has > unfortunately shown to have a bug in a corner case. > > The problem is that because such aggregates are packed in the argument > save area, but each (4-byte) float occupies one of just 13 registers > on its own, we may run out of registers while we're still within the > first 64 bytes of the argument save area. > > Usually, any argument that doesn't fit into register should go in > memory. But that rule doesn't apply within the first 64 bytes, where > such arguments need to go into GPRs first. This is important since > the ABI guarantees that the first 64 bytes of the save area are free, > e.g. to store GPRs into. If an argument is actually passed within > those first 64 bytes, some code (e.g. libffi assembler stubs) may > clobber its contents. > > Now, the existing rs6000_function_arg code will handle this case > correctly if the extra floats come in a *new* argument. For example, > in the following test case > > struct float2 { float x[2]; }; > struct float6 { float x[6]; }; > struct float8 { float x[8]; }; > > float func (struct float8 a, struct float6 b, struct float2 c); > > both parts of "c" are correctly expected in r10. > > However, the code handles incorrectly the case where a *single* > aggregate argument is split between FPRs and "extra" floats. > For example, "b.x[5]" is expected in memory, although it ought > to reside in r9. > > The appended patch fixes the implementation to comply with the ABI. > > This is an ABI change for the affected corner cases of the ELFv2 > ABI. However, those cases should be extremely rare; the full > compat.exe and struct-layout-1.exp ABI compatibility test suite > passed, with the exception of two tests specifically intended > to test multiple homogeneous float aggregates. > > Tested on powerpc64le-linux. > > OK for mainline? > [ The patch should then also go into the 4.8 and 4.9 branches for > consistency. ]
Can you add -Wpsabi warnings for all the issues you found? That way a world re-build will at least show if anything important is affected. Thanks, Richard. > Bye, > Ulrich > > > ChangeLog: > > * config/rs6000/rs6000.c (rs6000_function_arg): If a float argument > does not fit fully into floating-point registers, and there is still > space in the register parameter area, use GPRs to pass those parts > of the argument. > (rs6000_arg_partial_bytes): Likewise. > > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c (revision 212100) > +++ gcc/config/rs6000/rs6000.c (working copy) > @@ -10227,6 +10227,7 @@ > rtx r, off; > int i, k = 0; > unsigned long n_fpreg = (GET_MODE_SIZE (elt_mode) + 7) >> 3; > + int fpr_words; > > /* Do we also need to pass this argument in the parameter > save area? */ > @@ -10255,6 +10256,37 @@ > rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off); > } > > + /* If there were not enough FPRs to hold the argument, the rest > + usually goes into memory. However, if the current position > + is still within the register parameter area, a portion may > + actually have to go into GPRs. > + > + Note that it may happen that the portion of the argument > + passed in the first "half" of the first GPR was already > + passed in the last FPR as well. > + > + For unnamed arguments, we already set up GPRs to cover the > + whole argument in rs6000_psave_function_arg, so there is > + nothing further to do at this point. */ > + fpr_words = (i * GET_MODE_SIZE (elt_mode)) / (TARGET_32BIT ? 4 : 8); > + if (i < n_elts && align_words + fpr_words < GP_ARG_NUM_REG > + && cum->nargs_prototype > 0) > + { > + enum machine_mode rmode = TARGET_32BIT ? SImode : DImode; > + int n_words = rs6000_arg_size (mode, type); > + > + align_words += fpr_words; > + n_words -= fpr_words; > + > + do > + { > + r = gen_rtx_REG (rmode, GP_ARG_MIN_REG + align_words); > + off = GEN_INT (fpr_words++ * GET_MODE_SIZE (rmode)); > + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off); > + } > + while (++align_words < GP_ARG_NUM_REG && --n_words != 0); > + } > + > return rs6000_finish_function_arg (mode, rvec, k); > } > else if (align_words < GP_ARG_NUM_REG) > @@ -10330,8 +10362,23 @@ > /* Otherwise, we pass in FPRs only. Check for partial copies. */ > passed_in_gprs = false; > if (cum->fregno + n_elts * n_fpreg > FP_ARG_MAX_REG + 1) > - ret = ((FP_ARG_MAX_REG + 1 - cum->fregno) > - * MIN (8, GET_MODE_SIZE (elt_mode))); > + { > + /* Compute number of bytes / words passed in FPRs. If there > + is still space available in the register parameter area > + *after* that amount, a part of the argument will be passed > + in GPRs. In that case, the total amount passed in any > + registers is equal to the amount that would have been passed > + in GPRs if everything were passed there, so we fall back to > + the GPR code below to compute the appropriate value. */ > + int fpr = ((FP_ARG_MAX_REG + 1 - cum->fregno) > + * MIN (8, GET_MODE_SIZE (elt_mode))); > + int fpr_words = fpr / (TARGET_32BIT ? 4 : 8); > + > + if (align_words + fpr_words < GP_ARG_NUM_REG) > + passed_in_gprs = true; > + else > + ret = fpr; > + } > } > > if (passed_in_gprs > -- > Dr. Ulrich Weigand > GNU/Linux compilers and toolchain > ulrich.weig...@de.ibm.com >