Hi,
We are trying to update our patches on complex numbers to take into
account what has been discussed.
The main change from our previous patches consists of replacing
vectors of complex types with classical vectors of real types (e.g.
V4SF instead of V2SC), associated with the existing complex opcodes
(like .COMPLEX_MUL) when vectorizing. Non-vector complex modes are
also replaced by vectors of two reals at the end of the middle-end
(e.g. SC to V2SF), so that already existing patterns can be reused.
Indeed, operations that are not complex-specific, like an addition,
do not require a dedicated pattern anymore, while already implemented
patterns like cmul, cmul_conj, cadd90, ... can be used for the
complex-specific ones.
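To make the distinction concrete, here is a small illustration
written with GCC's generic vector extensions (only a sketch, not the
GIMPLE the passes actually produce): an addition on a complex value
seen as two reals is a plain element-wise operation, whereas a
multiplication mixes the two lanes and therefore keeps a dedicated
opcode until a target pattern implements it.

typedef float v2sf __attribute__ ((vector_size (8)));

/* Complex addition on an SC value seen as {re, im}: a plain
   element-wise add, no complex-specific pattern needed.  */
static v2sf
cadd (v2sf a, v2sf b)
{
  return a + b;
}

/* Complex multiplication mixes the real and imaginary lanes, so it
   stays behind a dedicated opcode (.COMPLEX_MUL) until a target
   pattern like cmul can implement it.  */
static v2sf
cmul (v2sf a, v2sf b)
{
  return (v2sf) { a[0] * b[0] - a[1] * b[1],
                  a[0] * b[1] + a[1] * b[0] };
}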
To do so, the cplxlower pass has been cut into two passes:
- The first one replaces complex-specific operations with dedicated
opcodes (like .COMPLEX_MUL replacing MUL_EXPR with SC mode), but
complex modes are kept at this point. Unsupported native operations
are also lowered here, because we assume that it is better to lower
early and hope for standard optimizations in the middle-end than to
try to vectorize with a near-zero chance of success and only lower
afterwards.
- The second one almost only remaps non-vector complex modes into
vectors of two reals (like SC to V2SF), as illustrated just below.
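The remapping done by the second pass only relies on the fact that a
scalar complex value and a vector of two reals share the same layout.
Roughly (again just an illustration, not what the pass literally
emits):

typedef float v2sf __attribute__ ((vector_size (8)));

/* An SC value {real, imag} occupies the same 8 bytes as a V2SF, so
   remapping SC to V2SF keeps values bit-identical and lets the rest
   of the middle-end and the backends only deal with real vector
   modes.  */
union sc_v2sf
{
  _Complex float sc; /* SC mode */
  v2sf v;            /* V2SF mode: v[0] = real, v[1] = imaginary */
};

static v2sf
sc_to_v2sf (_Complex float x)
{
  union sc_v2sf u = { .sc = x };
  return u.v;
}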
So the vectorizer takes complex modes as input but vectorizes with
vectors of real modes (e.g. a V4SF vector mode for SC). Because the
complex-specific opcodes have been set beforehand, no confusion with
real operations is possible. Vectors of two reals may also be used
as inputs, but vectorizing small vector modes into bigger ones (like
V2SF to V4SF) is not possible.
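To make the V4SF case concrete, here is roughly what a vectorized
.COMPLEX_MUL computes on two complex floats packed into one V4SF
(only a sketch of the semantics with GCC's vector extensions, not
the code the vectorizer emits). On aarch64 the cmul pattern
implements this with a pair of fcmla instructions, which is what
shows up in the example further down.

typedef float v4sf __attribute__ ((vector_size (16)));

/* Two complex floats packed in one V4SF, interleaved as
   {re0, im0, re1, im1}.  A vectorized .COMPLEX_MUL multiplies the
   pairs independently.  */
static v4sf
complex_mul_v4sf (v4sf a, v4sf b)
{
  v4sf r;
  for (int i = 0; i < 4; i += 2)
    {
      r[i]     = a[i] * b[i]     - a[i + 1] * b[i + 1]; /* real */
      r[i + 1] = a[i] * b[i + 1] + a[i + 1] * b[i];     /* imaginary */
    }
  return r;
}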
Here are some advantages of this new approach:
- No more vectors of complex modes
- The vectorization of complex operations is improved, because split
and unified vectorized statements can easily be mixed, as they use
the same vector type. We can also imagine trying multiple options
(first: native vectorized, second: split vectorized, third: unified
scalar, ...).
- It reuses the patterns for vectors of two reals for operations that
are not complex-specific, as well as already existing complex
patterns like cmul implemented on aarch64, which could mean almost
free performance gains on many targets.
On the performance side, we can still exploit the full potential of
the complex instructions on KVX. To illustrate the gains on aarch64
without rewriting any patterns (except a mov), here is the assembly
generated for a vectorized complex mul-mul-add with -O2
-mcpu=neoverse-v1 (and without -ffast-math, unlike with SLP):
#define N 32 /* assumed value; matches the 256-byte loop bound below */

void vfmma (_Complex float a[restrict N], _Complex float b[restrict N],
            _Complex float c[restrict N], _Complex float d[restrict N])
{
  for (int i = 0; i < N; i++)
    c[i] += a[i] * b[i] * d[i];
}
vfmma:
        movi    v3.4s, 0
        mov     x4, 0
        .align  5
.L2:
        ldr     q2, [x1, x4]
        mov     v1.16b, v3.16b
        ldr     q0, [x0, x4]
        fcmla   v1.4s, v0.4s, v2.4s, #0
        fcmla   v1.4s, v0.4s, v2.4s, #90
        ldr     q0, [x2, x4]
        ldr     q2, [x3, x4]
        fcmla   v0.4s, v2.4s, v1.4s, #0
        fcmla   v0.4s, v2.4s, v1.4s, #90
        str     q0, [x2, x4]
        add     x4, x4, 16
        cmp     x4, 256
        bne     .L2
        ret
We have only done some experimentation with this approach so far. If
you think it could be interesting, we will try to develop it further.
Thanks,
Sylvain