| Issue |
172807
|
| Summary |
Improve vectorization of complex multiplication using shuffle-based ADDSUB pattern
|
| Labels |
new issue
|
| Assignees |
|
| Reporter |
ParkHanbum
|
I would like to propose an optimization for complex number multiplication patterns. Currently, the vectorizer (or scalar optimizer) often leaves complex multiplication as a sequence of extractelement, scalar arithmetic, and insertelement.
I suggest transforming this pattern into a vector-friendly form using splats and shuffles, which maps efficiently to hardware instructions like x86 vaddsubpd.
Current Behavior (Scalar/Naive)
The mathematical representation of complex multiplication (a0 + a1*i) * (b0 + b1*i) is currently represented as:
```
Real: a0*b0 - a1*b1
Imag: a0*b1 + a1*b0
Result: { Real, Imag }
```
In LLVM IR, this often results in a chain of scalar operations requiring costly element extraction and insertion.
Proposed Behavior (Vectorized)
We can vectorize this operation by preparing two intermediate vectors (sa and sb) and combining them using a subtraction/addition mix.
Given input vectors a = {a0, a1} and b = {b0, b1}:
Generate sa (Real Splat * Original):
Splat a0 (Real part of A) -> {a0, a0}
Multiply by b -> {b0, b1}
sa = {a0*b0, a0*b1}
Generate sb (Imag Splat * Swapped):
Splat a1 (Imag part of A) -> {a1, a1}
Swap b -> {b1, b0}
Multiply them
sb = {a1*b1, a1*b0}
Combine (ADDSUB semantics):
Perform subtraction on the even index (Real) and addition on the odd index (Imag).
Result = { sa[0] - sb[0], sa[1] + sb[1] }
Summary of Logic
```
Previous:
{ a0*b0 - a1*b1 , a0*b1 + a1*b0 }
Suggest:
sa = {a0*b0, a0*b1}
sb = {a1*b1, a1*b0}
Result = { sa[0] - sb[0], sa[1] + sb[1] }
```
example:
https://rust.godbolt.org/z/PTo8T1EjG
extra:
performance analyse
Suggest :
```
Iterations: 100
Instructions: 700
Total Cycles: 209
Total uOps: 800
Dispatch Width: 6
uOps Per Cycle: 3.83
IPC: 3.35
Block RThroughput: 1.5
```
before LLVM13:
```
Iterations: 100
Instructions: 700
Total Cycles: 804
Total uOps: 800
Dispatch Width: 6
uOps Per Cycle: 1.00
IPC: 0.87
Block RThroughput: 1.5
```
associate : [#58139](https://github.com/llvm/llvm-project/issues/58139)
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs