Re: Complex numbers support: discussions summary

Toon Moene Tue, 17 Oct 2023 13:37:32 -0700

Sylvain,

Is this on a branch in your github repository


        https://github.com/kalray/gcc

somewhere ?

That would make it easier to test it for me (and probably others).

See for instance my mail here (d.d. Thu Oct 5 14:45:05 GMT 2023):

https://gcc.gnu.org/pipermail/gcc/2023-October/242643.html

Thanks in advance.

Kind regards,

Toon Moene.

On 10/16/23 11:14, Sylvain Noiry via Gcc wrote:

Hi,
We are trying to update our patches on complex numbers to take into account what has been discussed.
The main change from our previous patches is that, when vectorizing, vectors of complex types are replaced with classical vectors of real types (e.g. V4SF instead of V2SC) associated with existing complex opcodes (like .COMPLEX_MUL). Non-vectorized complex modes are also replaced by vectors of two reals at the end of the middle-end (e.g. SC to V2SF), so that already existing patterns can be reused. Indeed, operations that are not complex-specific, like an addition, no longer require a specific pattern, and already implemented patterns like cmul, cmul_conj, cadd90, ... can be used.
To do so, the cplxlower pass has been split into two passes:
- The first one replaces complex-specific operations with dedicated opcodes (like .COMPLEX_MUL replacing MUL_EXPR with SC mode), but complex modes are kept at this point. Operations that are not natively supported are also lowered, because we assume that it is better to lower and hope for standard optimizations in the middle-end than to try to vectorize with a near-zero chance of success and only lower afterwards.
- The second one does little more than remap non-vectorized complex modes into vectors of two reals (like SC to V2SF).
So the vectorizer takes complex modes as input but vectorizes with vectors of real modes (e.g. V4SF vector mode for SC). Because complex-specific opcodes have been set beforehand, no confusion with real operations is possible. We could also use vectors of two reals as inputs, but vectorizing small vector modes into bigger ones (like V2SF to V4SF) is not possible.
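As a rough illustration of why an SC value can be treated as two SF lanes (this is not taken from the patches, and cadd_as_reals is a hypothetical name): C gives a complex type the same representation as an array of two elements of the corresponding real type, so an operation that is not complex-specific, such as an addition, can be carried out lane by lane on plain floats. A minimal sketch under that assumption:

```c
#include <assert.h>   /* static_assert (C11) */
#include <complex.h>
#include <string.h>

/* A complex float is laid out as {real, imag}, i.e. two floats. */
static_assert(sizeof(float _Complex) == 2 * sizeof(float),
              "one SC value occupies two SF lanes");

/* Hypothetical helper: add n complex floats using only real additions
   on the underlying float pairs, the way a plain V2SF/V4SF add would. */
void cadd_as_reals(float _Complex *dst, const float _Complex *a,
                   const float _Complex *b, int n)
{
    for (int i = 0; i < n; i++) {
        float x[2], y[2], r[2];
        memcpy(x, &a[i], sizeof x);   /* view a[i] as two real lanes */
        memcpy(y, &b[i], sizeof y);
        r[0] = x[0] + y[0];           /* real lane */
        r[1] = x[1] + y[1];           /* imaginary lane */
        memcpy(&dst[i], r, sizeof r);
    }
}
```

A multiplication, by contrast, mixes the two lanes, which is why it still needs a complex-specific opcode such as .COMPLEX_MUL.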
Here are some advantages of this new approach:
- No more vectors of complex modes.
- The vectorization of complex operations is improved, because split and unified vectorized statements can easily be mixed, as they use the same vector type. We can also imagine testing multiple options (first: native vectorized, second: split vectorized, third: unified scalar, ...).
- It reuses patterns for vectors of two reals for operations that are not complex-specific, as well as already existing complex patterns like cmul implemented on aarch64, which could mean almost free performance gains on many targets.
On the performance side, we can still exploit the full potential of complex instructions on KVX. To illustrate the gains on aarch64 without rewriting any patterns (except a mov), here is the assembly generated for a vector complex mul mul add with -O2 -mcpu=neoverse-v1 (and without -ffast-math like with SLP):
void vfmma (_Complex float a[restrict N], _Complex float b[restrict N],
            _Complex float c[restrict N], _Complex float d[restrict N])
{
   for (int i = 0; i < N; i++)
     c[i] += a[i] * b[i] * d[i];
}


vfmma:
         movi    v3.4s, 0
         mov     x4, 0
         .align  5
.L2:
         ldr     q2, [x1, x4]
         mov     v1.16b, v3.16b
         ldr     q0, [x0, x4]
         fcmla   v1.4s, v0.4s, v2.4s, #0
         fcmla   v1.4s, v0.4s, v2.4s, #90
         ldr     q0, [x2, x4]
         ldr     q2, [x3, x4]
         fcmla   v0.4s, v2.4s, v1.4s, #0
         fcmla   v0.4s, v2.4s, v1.4s, #90
         str     q0, [x2, x4]
         add     x4, x4, 16
         cmp     x4, 256
         bne     .L2
         ret
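
For readers less familiar with fcmla, here is a scalar C model of what one complex lane of the loop body above computes. It is only a sketch: the function names are hypothetical, the rotation semantics are written from the Arm description of FCMLA (#0 uses the real part of the first source, #90 its imaginary part), and the fused rounding of the real instruction is ignored.

```c
/* One complex lane is {real, imag}. */

/* FCMLA ..., #0: the real part of n scales both parts of m. */
static void fcmla_rot0(float acc[2], const float n[2], const float m[2])
{
    acc[0] += n[0] * m[0];
    acc[1] += n[0] * m[1];
}

/* FCMLA ..., #90: the imaginary part of n scales m rotated by 90 degrees. */
static void fcmla_rot90(float acc[2], const float n[2], const float m[2])
{
    acc[0] -= n[1] * m[1];
    acc[1] += n[1] * m[0];
}

/* Hypothetical model of one lane of the loop: c += a * b * d. */
void cmul_mul_add_lane(float c[2], const float a[2],
                       const float b[2], const float d[2])
{
    float t[2] = {0.0f, 0.0f};   /* zeroed accumulator (v1 copied from v3) */
    fcmla_rot0(t, a, b);
    fcmla_rot90(t, a, b);        /* t = a * b */
    fcmla_rot0(c, d, t);
    fcmla_rot90(c, d, t);        /* c += d * t */
}
```

Each #0/#90 pair accumulates one full complex product, so the four fcmla instructions in the loop cover the two multiplications and the final addition.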
We have only done some experimentation with this approach. If you think that it could be interesting we will try to develop it more.
Thanks,

Sylvain


--
Toon Moene - e-mail: [email protected] - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
