Issue | 137040
Summary | AArch64 SVE: Multiple ptrue instructions not merged
Labels | backend:AArch64
Assignees |
Reporter | MatzeB

SVE code that operates on several vector types tends to produce multiple `ptrue` instructions.
For example, I got this from an internal user:
```
#include <arm_sve.h>
svuint32_t getSveVec(const uint32_t* inputPtr) {
  svuint64_t vec = svld1uw_u64(svptrue_b64(), inputPtr);
  svuint32_t clzV1 = svclz_u32_x(svptrue_b32(), svreinterpret_u32_u64(vec));
  return clzV1;
}
```
This produces something like the following:
```
$ clang++ -target aarch64-redhat-linux-gnu -march=armv9-a+sve2+fp16 -O3 -S -o - dup2.cpp
...
ptrue p0.d
ld1w { z0.d }, p0/z, [x0]
ptrue p0.s
clz z0.s, p0/m, z0.s
ret
```
My understanding is that a single `ptrue p0.b` would suffice here, since it sets every predicate bit and is therefore all-true for any element size; in fact, GCC produces that code.
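
To illustrate why one byte-granular predicate can serve both instructions, here is a small stand-alone model of SVE predicate bits. It is only a sketch under stated assumptions: it assumes a 128-bit vector (so 16 predicate bits, one per vector byte), and the names `Pred`, `ptrue`, and `allActive` are hypothetical helpers made up for this example, not LLVM or ACLE APIs. It does not show how the backend should implement the merge.

```cpp
#include <bitset>
#include <cstddef>

// Assumption for illustration: a 128-bit SVE vector, so the predicate
// register holds 16 bits, one per vector byte.
constexpr std::size_t kPredBits = 16;
using Pred = std::bitset<kPredBits>;

// Hypothetical model of `ptrue pN.<T>` (pattern ALL): for elements of
// elemBytes bytes, set the lowest bit of each element's group of
// predicate bits and leave the remaining bits clear.
Pred ptrue(std::size_t elemBytes) {
    Pred p;
    for (std::size_t i = 0; i < kPredBits; i += elemBytes) {
        p.set(i);
    }
    return p;
}

// Hypothetical check: an instruction working on elements of elemBytes
// bytes only reads the lowest predicate bit of each element, so the
// predicate is "all true" for it when every such bit is set.
bool allActive(const Pred& p, std::size_t elemBytes) {
    for (std::size_t i = 0; i < kPredBits; i += elemBytes) {
        if (!p.test(i)) {
            return false;
        }
    }
    return true;
}
```

Under this model, `allActive(ptrue(1), 4)` and `allActive(ptrue(1), 8)` both hold, so a `.b` predicate covers the `.s` `clz` and the `.d` `ld1w` alike. By contrast, `allActive(ptrue(8), 4)` does not hold, which is why the `.d` predicate from the load cannot simply be reused for the `clz`.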