| Field | Value |
| --- | --- |
| Issue | 181514 |
| Summary | [AArch64] manual deinterleaving `ld2` not recognized |
| Labels | new issue |
| Assignees | |
| Reporter | folkertdev |

A manual 16-bit `ld4` (a normal load followed by a deinterleaving shuffle) is recognized and lowered to `ld4`. For some reason the same is not true for `ld2`: the manual version is lowered to `ldr` + `ext` + `uzp1` + `uzp2` instead of a single `ld2`.
https://godbolt.org/z/danjGfMb9
```asm
manual2:
        ldr     q1, [x0]
        ext     v2.16b, v1.16b, v1.16b, #8
        uzp1    v0.4h, v1.4h, v2.4h
        uzp2    v1.4h, v1.4h, v2.4h
        ret
intrin2:
        ld2     { v0.4h, v1.4h }, [x0]
        ret
manual4:
        ld4     { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
        stp     d0, d1, [x8]
        stp     d2, d3, [x8, #16]
        ret
intrin4:
        ld4     { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
        stp     d0, d1, [x8]
        stp     d2, d3, [x8, #16]
        ret
```
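To make the comparison concrete, here is a small Python model of what `manual2` and `intrin2` compute on eight 16-bit lanes. This is only a sketch, not LLVM code: the helper names (`ld2_4h`, `ext8`, `uzp1_4h`, `uzp2_4h`, `manual2`) are made up for illustration, and the lane semantics are my reading of the AArch64 `ext`/`uzp1`/`uzp2` instructions. Under those assumptions both sequences yield the same two vectors, so the difference is purely in instruction count.

```python
def ld2_4h(mem):
    """Model of `ld2 {v0.4h, v1.4h}, [x0]`: split eight 16-bit elements
    into even-indexed and odd-indexed lanes."""
    return mem[0::2], mem[1::2]

def ext8(vn, vm):
    """Model of `ext vd.16b, vn.16b, vm.16b, #8` viewed as 16-bit lanes:
    the upper four lanes of vn followed by the lower four lanes of vm."""
    return vn[4:] + vm[:4]

def uzp1_4h(vn, vm):
    """Model of `uzp1 vd.4h, vn.4h, vm.4h`: even-indexed lanes of the
    low halves of vn and vm, concatenated."""
    return (vn[:4] + vm[:4])[0::2]

def uzp2_4h(vn, vm):
    """Model of `uzp2 vd.4h, vn.4h, vm.4h`: odd-indexed lanes of the
    low halves of vn and vm, concatenated."""
    return (vn[:4] + vm[:4])[1::2]

def manual2(mem):
    v1 = list(mem)        # ldr  q1, [x0]
    v2 = ext8(v1, v1)     # ext  v2.16b, v1.16b, v1.16b, #8
    v0 = uzp1_4h(v1, v2)  # uzp1 v0.4h, v1.4h, v2.4h
    v1 = uzp2_4h(v1, v2)  # uzp2 v1.4h, v1.4h, v2.4h
    return v0, v1

# Interleaved input: pairs (10, 11), (20, 21), (30, 31), (40, 41).
mem = [10, 11, 20, 21, 30, 31, 40, 41]
assert manual2(mem) == ld2_4h(mem)
```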
The issue is that the `VectorCombinePass` turns
```llvm
%0 = shufflevector <8 x i16> %tmp.sroa.0.0.copyload.i, <8 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%1 = shufflevector <8 x i16> %tmp.sroa.0.0.copyload.i, <8 x i16> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
%2 = bitcast <4 x i16> %0 to <8 x i8>
%3 = bitcast <4 x i16> %1 to <8 x i8>
```
into
```llvm
%0 = bitcast <8 x i16> %tmp.sroa.0.0.copyload.i to <16 x i8>
%1 = shufflevector <16 x i8> %0, <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 4, i32 5, i32 8, i32 9, i32 12, i32 13>
%2 = bitcast <8 x i16> %tmp.sroa.0.0.copyload.i to <16 x i8>
%3 = shufflevector <16 x i8> %2, <16 x i8> poison, <8 x i32> <i32 2, i32 3, i32 6, i32 7, i32 10, i32 11, i32 14, i32 15>
```
which presumably breaks the `ld2` pattern recognition.
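
The two IR snippets are related by a simple index scaling: each 16-bit lane index `i` in the original mask becomes the byte pair `2*i, 2*i + 1` in the rewritten mask. The Python sketch below reproduces the two byte masks from the 16-bit masks and checks that "shuffle, then bitcast" and "bitcast, then shuffle" give the same bytes. The helper names (`widen_mask`, `shuffle`, `bitcast_i16_to_i8`) are hypothetical, little-endian lane layout is assumed, and this says nothing about how `VectorCombinePass` is actually implemented.

```python
import struct

def widen_mask(mask, scale=2):
    """Rewrite a mask over wide lanes as a mask over `scale` narrow lanes each."""
    return [scale * i + j for i in mask for j in range(scale)]

def shuffle(vec, mask):
    """Single-input shufflevector: pick vec[i] for every mask index i."""
    return [vec[i] for i in mask]

def bitcast_i16_to_i8(lanes):
    """Reinterpret 16-bit lanes as bytes, assuming little-endian layout."""
    return list(struct.pack(f"<{len(lanes)}H", *lanes))

even, odd = [0, 2, 4, 6], [1, 3, 5, 7]

# These match the <8 x i32> masks in the rewritten IR.
assert widen_mask(even) == [0, 1, 4, 5, 8, 9, 12, 13]
assert widen_mask(odd) == [2, 3, 6, 7, 10, 11, 14, 15]

lanes = [0x0100, 0x0302, 0x0504, 0x0706, 0x0908, 0x0B0A, 0x0D0C, 0x0F0E]
for mask in (even, odd):
    before = bitcast_i16_to_i8(shuffle(lanes, mask))            # original IR
    after = shuffle(bitcast_i16_to_i8(lanes), widen_mask(mask))  # rewritten IR
    assert before == after
```

So the rewrite preserves the result, but the deinterleave is now expressed on `<16 x i8>` with pairs of adjacent byte indices rather than as a stride-2 pick over `<8 x i16>`.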