On Wed, Jul 9, 2014 at 6:02 PM, Ulrich Weigand <uweig...@de.ibm.com> wrote:
> Hello,
>
> the implementation of homogenous float aggregates for the ELFv2 ABI has
> unfortunately shown to have a bug in a corner case.
>
> The problem is that because such aggregates are packed in the argument
> save area, but each (4-byte) float occupies one of just 13 registers
> on its own, we may run out of registers while we're still within the
> first 64 bytes of the argument save area.
>
> Usually, any argument that doesn't fit into register should go in
> memory.  But that rule doesn't apply within the first 64 bytes, where
> such arguments need to go into GPRs first.  This is important since
> the ABI guarantees that the first 64 bytes of the save area are free,
> e.g. to store GPRs into.  If an argument is actually passed within
> those first 64 bytes, some code (e.g. libffi assembler stubs) may
> clobber its contents.
>
> Now, the existing rs6000_function_arg code will handle this case
> correctly if the extra floats come in a *new* argument.  For example,
> in the following test case
>
> struct float2 { float x[2]; };
> struct float6 { float x[6]; };
> struct float8 { float x[8]; };
>
> float func (struct float8 a, struct float6 b, struct float2 c);
>
> both parts of "c" are correctly expected in r10.
>
> However, the code handles incorrectly the case where a *single*
> aggregate argument is split between FPRs and "extra" floats.
> For example, "b.x[5]" is expected in memory, although it ought
> to reside in r9.
>
> The appended patch fixes the implementation to comply with the ABI.
>
> This is an ABI change for the affected corner cases of the ELFv2
> ABI.  However, those cases should be extremely rare; the full
> compat.exe and struct-layout-1.exp ABI compatibility test suite
> passed, with the exception of two tests specifically intended
> to test multiple homogeneous float aggregates.
>
> Tested on powerpc64le-linux.
>
> OK for mainline?
> [ The patch should then also go into the 4.8 and 4.9 branches for
> consistency. ]

Can you add -Wpsabi warnings for all the issues you found?  That way
a world re-build will at least show if anything important is affected.

Thanks,
Richard.

> Bye,
> Ulrich
>
>
> ChangeLog:
>
>         * config/rs6000/rs6000.c (rs6000_function_arg): If a float argument
>         does not fit fully into floating-point registers, and there is still
>         space in the register parameter area, use GPRs to pass those parts
>         of the argument.
>         (rs6000_arg_partial_bytes): Likewise.
>
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  (revision 212100)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -10227,6 +10227,7 @@
>           rtx r, off;
>           int i, k = 0;
>           unsigned long n_fpreg = (GET_MODE_SIZE (elt_mode) + 7) >> 3;
> +         int fpr_words;
>
>           /* Do we also need to pass this argument in the parameter
>              save area?  */
> @@ -10255,6 +10256,37 @@
>               rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off);
>             }
>
> +         /* If there were not enough FPRs to hold the argument, the rest
> +            usually goes into memory.  However, if the current position
> +            is still within the register parameter area, a portion may
> +            actually have to go into GPRs.
> +
> +            Note that it may happen that the portion of the argument
> +            passed in the first "half" of the first GPR was already
> +            passed in the last FPR as well.
> +
> +            For unnamed arguments, we already set up GPRs to cover the
> +            whole argument in rs6000_psave_function_arg, so there is
> +            nothing further to do at this point.  */
> +         fpr_words = (i * GET_MODE_SIZE (elt_mode)) / (TARGET_32BIT ? 4 : 8);
> +         if (i < n_elts && align_words + fpr_words < GP_ARG_NUM_REG
> +             && cum->nargs_prototype > 0)
> +            {
> +             enum machine_mode rmode = TARGET_32BIT ? SImode : DImode;
> +             int n_words = rs6000_arg_size (mode, type);
> +
> +             align_words += fpr_words;
> +             n_words -= fpr_words;
> +
> +             do
> +               {
> +                 r = gen_rtx_REG (rmode, GP_ARG_MIN_REG + align_words);
> +                 off = GEN_INT (fpr_words++ * GET_MODE_SIZE (rmode));
> +                 rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off);
> +               }
> +             while (++align_words < GP_ARG_NUM_REG && --n_words != 0);
> +           }
> +
>           return rs6000_finish_function_arg (mode, rvec, k);
>         }
>        else if (align_words < GP_ARG_NUM_REG)
> @@ -10330,8 +10362,23 @@
>        /* Otherwise, we pass in FPRs only.  Check for partial copies.  */
>        passed_in_gprs = false;
>        if (cum->fregno + n_elts * n_fpreg > FP_ARG_MAX_REG + 1)
> -       ret = ((FP_ARG_MAX_REG + 1 - cum->fregno)
> -              * MIN (8, GET_MODE_SIZE (elt_mode)));
> +       {
> +         /* Compute number of bytes / words passed in FPRs.  If there
> +            is still space available in the register parameter area
> +            *after* that amount, a part of the argument will be passed
> +            in GPRs.  In that case, the total amount passed in any
> +            registers is equal to the amount that would have been passed
> +            in GPRs if everything were passed there, so we fall back to
> +            the GPR code below to compute the appropriate value.  */
> +         int fpr = ((FP_ARG_MAX_REG + 1 - cum->fregno)
> +                    * MIN (8, GET_MODE_SIZE (elt_mode)));
> +         int fpr_words = fpr / (TARGET_32BIT ? 4 : 8);
> +
> +         if (align_words + fpr_words < GP_ARG_NUM_REG)
> +           passed_in_gprs = true;
> +         else
> +           ret = fpr;
> +       }
>      }
>
>    if (passed_in_gprs
> --
>   Dr. Ulrich Weigand
>   GNU/Linux compilers and toolchain
>   ulrich.weig...@de.ibm.com
>

Reply via email to