Tamar Christina <tamar.christ...@arm.com> writes:
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 97da60762390db81df9cffaf316b909cd1609130..9cc8da338125afa01bc9fb645f4112d2d7ef548c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11279,6 +11279,14 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
>    if (VECTOR_MODE_P (mode))
>      mode = GET_MODE_INNER (mode);
> 
> +  /* The by-element versions of the instruction have the same cost as the
> +     normal 3-vector version.  So don't add the cost of the duplicate into
> +     the cost of the multiply.  */
> +  if (GET_CODE (op0) == VEC_DUPLICATE)
> +    op0 = XEXP (op0, 0);
> +  else if (GET_CODE (op1) == VEC_DUPLICATE)
> +    op1 = XEXP (op1, 0);
> +
>    /* Integer multiply/fma.  */
>    if (GET_MODE_CLASS (mode) == MODE_INT)
>      {

SVE doesn't have duplicating forms, so I think we should put this code
under the “if (VECTOR_MODE_P (mode))” condition, before changing “mode”,
and then restrict it to VEC_ADVSIMD modes.

(SVE FMUL does have an indexed form, but the index is relative to the
start of the associated quadword, so it isn't a VEC_DUPLICATE.)
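
Something like the following (untested, and assuming
aarch64_classify_vector_mode is usable at this point) is roughly what
I had in mind:

  if (VECTOR_MODE_P (mode))
    {
      unsigned int vec_flags = aarch64_classify_vector_mode (mode);
      if (vec_flags & VEC_ADVSIMD)
        {
          /* The by-element versions of the instruction have the same cost as
             the normal 3-vector version.  So don't add the cost of the
             duplicate into the cost of the multiply.  */
          if (GET_CODE (op0) == VEC_DUPLICATE)
            op0 = XEXP (op0, 0);
          else if (GET_CODE (op1) == VEC_DUPLICATE)
            op1 = XEXP (op1, 0);
        }
      mode = GET_MODE_INNER (mode);
    }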

I guess there's a danger that this could underestimate the cost for
integer modes, if the scalar integer input needs to be moved from GPRs.
In that case the cost of a MULT + VEC_DUPLICATE is probably more
accurate, even though it's still one instruction before RA.

But I guess there's no perfect answer there.  The new code will be
right for integer modes in some cases and not in others.  Same if
we leave things as they are.  But maybe it'd be worth having a comment
to say that we're assuming the best case, i.e. that the duplicated
value is naturally in FPRs?
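Something along these lines, say:

  /* The by-element versions of the instruction have the same cost as the
     normal 3-vector version.  We make an assumption that the input to the
     VEC_DUPLICATE is already in an FP/SIMD register and so the duplicate
     itself is free; that won't hold if the value has to be moved across
     from a GPR.  */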

Thanks,
Richard
