Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> SVE's INDEX instruction can be used to populate vectors with values starting
> from "base" and incremented by "step" for each subsequent element. We can take
> advantage of it to generate vector constants if TARGET_SVE is available and
> the base and step values are within [-16, 15].
>
> For example, with the following function:
>
> typedef int v4si __attribute__ ((vector_size (16)));
> v4si
> f_v4si (void)
> {
>   return (v4si){ 0, 1, 2, 3 };
> }
>
> GCC currently generates:
>
> f_v4si:
>       adrp    x0, .LC4
>       ldr     q0, [x0, #:lo12:.LC4]
>       ret
>
> .LC4:
>       .word   0
>       .word   1
>       .word   2
>       .word   3
>
> With this patch, we generate an INDEX instruction instead if TARGET_SVE is
> available.
>
> f_v4si:
>       index   z0.s, #0, #1
>       ret
>
> [...]
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9e12bd9711c..01bfb8c52e4 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22960,8 +22960,7 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
>    if (CONST_VECTOR_P (op)
>        && CONST_VECTOR_DUPLICATE_P (op))
>      n_elts = CONST_VECTOR_NPATTERNS (op);
> -  else if ((vec_flags & VEC_SVE_DATA)
> -        && const_vec_series_p (op, &base, &step))
> +  else if (TARGET_SVE && const_vec_series_p (op, &base, &step))

I think we need to check which == AARCH64_CHECK_MOV too.  (Previously that
wasn't necessary, because native SVE only uses this routine for moves.)

FTR: I was initially a bit nervous about testing TARGET_SVE without looking
at vec_flags at all.  But looking at the previous handling of predicates
and structures, I agree it looks like the correct thing to do.

>      {
>        gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
>        if (!aarch64_sve_index_immediate_p (base)
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> index 216699b0536..3d6a0160f95 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> @@ -10,7 +10,6 @@ dupq (int x)
>    return svdupq_s32 (x, 1, 2, 3);
>  }
>  
> -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #1, #2} } } */
>  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
>  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
> -/* { dg-final { scan-assembler {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } */

This seems to be a regression of sorts.  Previously we had:

        adrp    x1, .LC0
        ldr     q0, [x1, #:lo12:.LC0]
        ins     v0.s[0], w0
        dup     z0.q, z0.q[0]

whereas now we have:

        movi    v0.2s, 0x2
        index   z31.s, #1, #2
        ins     v0.s[0], w0
        zip1    v0.4s, v0.4s, v31.4s
        dup     z0.q, z0.q[0]

I think we should try to aim for:

        index   z0.s, #0, #1
        ins     v0.s[0], w0
        dup     z0.q, z0.q[0]

instead.

> [...]
> +/*
> +** g_v4si:
> +**   index   z0\.s, #3, #\-4

The backslash looks redundant here.

Thanks,
Richard

> +**   ret
> +*/
> +v4si
> +g_v4si (void)
> +{
> +  return (v4si){ 3, -1, -5, -9 };
> +}
