Re: [PATCH v2 6/16]middle-end Add Complex Addition with rotation detection

Richard Sandiford Tue, 29 Sep 2020 03:02:49 -0700

Tamar Christina <tamar.christ...@arm.com> writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6132,6 +6132,17 @@ floating-point mode.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> +@item @samp{cadd@var{m}@var{n}3}
> +Perform a vector addition of complex numbers in operand 1 with operand 2
> +rotated by @var{m} degrees around the argand plane and storing the result in
> +operand 0.  The instruction must perform the operation on data loaded
> +contiguously into the vectors.


Nitpicking, sorry, but I think it would be better to describe the
layout directly rather than in terms of loads, since the preceding
operation might not be a load.
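
To illustrate what a layout-based description would pin down, here is a
rough scalar reference for the two rotations.  It assumes that even GCC
lanes hold real parts and odd lanes hold imaginary parts, which is only
one of the possible conventions.  The names ref_cadd90 and ref_cadd270
are made up for this sketch and are not part of the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference, not taken from the patch.
   Assumed layout: lane 2*k holds the real part and lane 2*k + 1 the
   imaginary part of complex element k.  RES must not alias A or B.  */

/* res = a + (b rotated by 90 degrees) = a + b * i:
   (ar + ai*i) + i * (br + bi*i) = (ar - bi) + (ai + br)*i.  */
void
ref_cadd90 (float *res, const float *a, const float *b, size_t nlanes)
{
  assert (nlanes % 2 == 0);
  for (size_t k = 0; k < nlanes; k += 2)
    {
      res[k] = a[k] - b[k + 1];
      res[k + 1] = a[k + 1] + b[k];
    }
}

/* res = a + (b rotated by 270 degrees) = a - b * i:
   (ar + ai*i) - i * (br + bi*i) = (ar + bi) + (ai - br)*i.  */
void
ref_cadd270 (float *res, const float *a, const float *b, size_t nlanes)
{
  assert (nlanes % 2 == 0);
  for (size_t k = 0; k < nlanes; k += 2)
    {
      res[k] = a[k] + b[k + 1];
      res[k + 1] = a[k + 1] - b[k];
    }
}
```

Writing the semantics per lane like this makes no reference to how the
data got into the vector.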

I guess the main question is: what representation do we expect for
big-endian?  A normal Advanced SIMD LDR would give this (for floats):

             MEMORY
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   array numbering
   +-----+-----+-----+-----+
      V     V     V     V      Advanced SIMD LDR
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   GCC lane numbering
   +-----+-----+-----+-----+
   |  3  |  2  |  1  |  0  |   Arm lane numbering
   +-----+-----+-----+-----+
  MSB       REGISTER      LSB

but the FC* instructions put the imaginary parts in the more
significant lane, so the pairs of elements above would need
to be reversed:

             MEMORY
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   array numbering
   +-----+-----+-----+-----+
       \   /       \   /
        \ /         \ /
         X           X         Load and permute
        / \         / \
       /   \       /   \
   +-----+-----+-----+-----+
   | i0  | r0  | i1  | r1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   GCC lane numbering
   +-----+-----+-----+-----+
   |  3  |  2  |  1  |  0  |   Arm lane numbering
   +-----+-----+-----+-----+
  MSB       REGISTER      LSB

(Or the whole vector could be reversed.)

We might decide that it just isn't worth doing this for Advanced SIMD.
But should the semantics of the optab be that:

(1) GCC lane number 0 holds a real part, or
(2) the least significant lane holds a real part?

With (1), it would be up to the target to hide the permute above.
With (2), the vectoriser would need to introduce the permute itself.
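
Either way, the permute in question is just a swap within each pair of
lanes.  A minimal scalar sketch of it follows; swap_complex_pairs is a
hypothetical helper used only for illustration, not something in the
patch or in GCC:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration of the "load and permute" step in the
   second diagram: exchange the two lanes of each complex pair, so
   that { r0, i0, r1, i1 } becomes { i0, r0, i1, r1 }.
   DST must not alias SRC.  */
void
swap_complex_pairs (float *dst, const float *src, size_t nlanes)
{
  assert (nlanes % 2 == 0);
  for (size_t k = 0; k < nlanes; ++k)
    /* k ^ 1 maps 0 <-> 1, 2 <-> 3, and so on.  */
    dst[k] = src[k ^ 1];
}
```

The open question is only who emits this step: the target when
expanding the optab, or the vectoriser before calling it.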

I'm not sure there's a perfect answer even for Arm targets.  (2) matches
the Advanced SIMD semantics.  But for SVE, the register layout follows
LD1 rather than LDR, so the GCC and architectural lane numbering match up.
(1) would therefore be the better fit for SVE, since no permute would be
needed there for either endianness.

> +The operation is only supported for vector modes @var{n} and with
> +rotations @var{m} of 90 or 270.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{ffs@var{m}2} instruction pattern
>  @item @samp{ffs@var{m}2}
>  Store into operand 0 one plus the index of the least significant 1-bit
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
>  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
>  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
>  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
>  
>  /* FP scales.  */
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
>  OPTAB_D (atanh_optab, "atanh$a2")
>  OPTAB_D (copysign_optab, "copysign$F$a3")
>  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> +OPTAB_D (cadd90_optab, "cadd90$a3")
> +OPTAB_D (cadd270_optab, "cadd270$a3")
>  OPTAB_D (cos_optab, "cos$a2")
>  OPTAB_D (cosh_optab, "cosh$a2")
>  OPTAB_D (exp10_optab, "exp10$a2")
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -663,12 +663,94 @@ graceful_exit:
>      }
>  };
>  
> +class ComplexAddPattern : public ComplexPattern

Another nitpick, sorry, but type names should be lower case rather than
CamelCase.

Thanks,
Richard
