Tamar Christina <tamar.christ...@arm.com> writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6132,6 +6132,17 @@ floating-point mode.
>
> This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> +@item @samp{cadd@var{m}@var{n}3}
> +Perform a vector addition of complex numbers in operand 1 with operand 2
> +rotated by @var{m} degrees around the argand plane and storing the result in
> +operand 0. The instruction must perform the operation on data loaded
> +contiguously into the vectors.

Nitpicking, sorry, but I think it would be better to describe the
layout directly rather than in terms of loads, since the preceding
operation might not be a load.

I guess the main question is: what representation do we expect for
big-endian?  A normal Advanced SIMD LDR would give this (for floats):

             MEMORY
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   array numbering
   +-----+-----+-----+-----+
      V     V     V     V      Advanced SIMD LDR
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   GCC lane numbering
   +-----+-----+-----+-----+
   |  3  |  2  |  1  |  0  |   Arm lane numbering
   +-----+-----+-----+-----+
  MSB       REGISTER      LSB

but the FC* instructions put the imaginary parts in the more
significant lane, so the pairs of elements above would need
to be reversed:

             MEMORY
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   array numbering
   +-----+-----+-----+-----+
       \   /       \   /
        \ /         \ /
         X           X         Load and permute
        / \         / \
       /   \       /   \
   +-----+-----+-----+-----+
   | i0  | r0  | i1  | r1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   GCC lane numbering
   +-----+-----+-----+-----+
   |  3  |  2  |  1  |  0  |   Arm lane numbering
   +-----+-----+-----+-----+
  MSB       REGISTER      LSB

(Or the whole vector could be reversed.)

We might decide that it just isn't worth doing this for Advanced SIMD.
But should the semantics of the optab be that:

(1) GCC lane number 0 holds a real part, or
(2) the least significant lane holds a real part?

With (1), it would be up to the target to hide the permute above.
With (2), the vectoriser would need to introduce the permute itself.

I'm not sure there's a perfect answer even for Arm targets.  (2) matches
the Advanced SIMD semantics.  But for SVE, the register layout follows
LD1 rather than LDR, and the GCC and architectural lane numbering match up.
(1) would therefore be better than (2) for SVE (and so no permute would
be needed for either endianness on SVE).

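To make the question concrete, here is a rough scalar model of what the
two optabs would compute under interpretation (1), i.e. assuming GCC lane
2*k holds the real part of complex element k and lane 2*k + 1 holds its
imaginary part.  It is only a sketch: the names cadd90_ref, cadd270_ref
and swap_pairs are made up for illustration and are not part of the patch
or of GCC.  swap_pairs models the per-pair permute from the second diagram
above.

```c
#include <stddef.h>

/* Hypothetical reference model, assuming interpretation (1):
   lane 2*k is the real part and lane 2*k + 1 the imaginary part.  */

/* res = a + (b rotated by 90 degrees), i.e. a + b * i.
   (br + bi*i) * i = -bi + br*i.  */
static void
cadd90_ref (float *res, const float *a, const float *b, size_t nlanes)
{
  for (size_t k = 0; k + 1 < nlanes; k += 2)
    {
      float re = a[k] - b[k + 1];
      float im = a[k + 1] + b[k];
      res[k] = re;
      res[k + 1] = im;
    }
}

/* res = a + (b rotated by 270 degrees), i.e. a - b * i.
   (br + bi*i) * -i = bi - br*i.  */
static void
cadd270_ref (float *res, const float *a, const float *b, size_t nlanes)
{
  for (size_t k = 0; k + 1 < nlanes; k += 2)
    {
      float re = a[k] + b[k + 1];
      float im = a[k + 1] - b[k];
      res[k] = re;
      res[k + 1] = im;
    }
}

/* Swap the two lanes of each complex pair: { r0, i0, r1, i1 }
   becomes { i0, r0, i1, r1 }.  */
static void
swap_pairs (float *res, const float *v, size_t nlanes)
{
  for (size_t k = 0; k + 1 < nlanes; k += 2)
    {
      float lo = v[k];
      float hi = v[k + 1];
      res[k] = hi;
      res[k + 1] = lo;
    }
}
```

With (1), a target whose instruction expects the swapped layout would
have to apply something like swap_pairs to the inputs and the result
itself; with (2), the vectoriser would be the one emitting that permute.
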
> +The operation is only supported for vector modes @var{n} and with
> +rotations @var{m} of 90 or 270.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> @cindex @code{ffs@var{m}2} instruction pattern
> @item @samp{ffs@var{m}2}
> Store into operand 0 one plus the index of the least significant 1-bit
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
> DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
>
> /* FP scales. */
> DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
> OPTAB_D (atanh_optab, "atanh$a2")
> OPTAB_D (copysign_optab, "copysign$F$a3")
> OPTAB_D (xorsign_optab, "xorsign$F$a3")
> +OPTAB_D (cadd90_optab, "cadd90$a3")
> +OPTAB_D (cadd270_optab, "cadd270$a3")
> OPTAB_D (cos_optab, "cos$a2")
> OPTAB_D (cosh_optab, "cosh$a2")
> OPTAB_D (exp10_optab, "exp10$a2")
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -663,12 +663,94 @@ graceful_exit:
> }
> };
>
> +class ComplexAddPattern : public ComplexPattern

Another nitpick, sorry, but type names should be lower case rather than
CamelCase.

Thanks,
Richard