| Issue | 101624 |
| --- | --- |
| Summary | [LoongArch][Clang] Miss optimization for code mixed with replgr2vr and insgr2vr |
| Labels | clang |
| Assignees | |
| Reporter | 24bit-xjkp |

## Information
### clang
clang version 20.0.0git (https://github.com/llvm/llvm-project.git 73c72f2c6505d5bc8b47bb0420f6cba5b24270fe)
Target: x86_64-unknown-windows-gnu
Thread model: posix
### gcc
Using built-in specs.
COLLECT_GCC=D:\Tools\loongarch64-linux-gnu\bin\loongarch64-linux-gnu-g++.exe
COLLECT_LTO_WRAPPER=D:/Tools/loongarch64-linux-gnu/bin/../libexec/gcc/loongarch64-linux-gnu/15.0.0/lto-wrapper.exe
Target: loongarch64-linux-gnu
Configured with: ../configure --disable-werror --prefix=/home/luo/x86_64-w64-mingw32-host-loongarch64-linux-gnu-target-gcc15 --host=x86_64-w64-mingw32 --target=loongarch64-linux-gnu --disable-multilib --enable-languages=c,c++
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.0.0 20240714 (experimental) (GCC)
## Code
```c++
#include <cstdint>
#include <lasxintrin.h>

auto f(::std::uint64_t a, ::std::uint64_t b) noexcept
{
    auto v{__lasx_xvreplgr2vr_d(a)};
    v = __lasx_xvinsgr2vr_d(v, b, 2);
    return v;
}
```
## Assembly
```asm
# clang++ --target=loongarch64-linux-gnu -O3 -march=la664 -S
_Z1fmm:
    xvinsgr2vr.d $xr0, $a1, 0
    xvinsgr2vr.d $xr0, $a1, 1
    xvinsgr2vr.d $xr0, $a2, 2
    xvinsgr2vr.d $xr0, $a1, 3
    xvst $xr0, $a0, 0
    ret

# loongarch64-linux-gnu-g++ -O3 -march=la664 -S
_Z1fmm:
    xvreplgr2vr.d $xr0,$r5
    xvinsgr2vr.d $xr0,$r6,2
    xvst $xr0,$r4,0
    jr $r1
```
Broadcasting the general register `a` to the vector register `v` and then inserting the other general register `b` into `v`, as GCC does, is more efficient than Clang's four separate inserts: it takes fewer instructions and has lower latency.
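
To make the equivalence of the two instruction sequences concrete, here is a minimal scalar sketch that runs on any host. It is only a model, not LASX code: the type `V4` and the helpers `model_replgr2vr_d`, `model_insgr2vr_d`, `broadcast_then_insert`, `insert_each_lane`, and `same_lanes` are hypothetical names introduced for illustration, and the model assumes the vector register can be treated as four independent 64-bit lanes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 256-bit vector register as four 64-bit lanes.
struct V4 {
    std::uint64_t lane[4];
};

// Models xvreplgr2vr.d: copy one general register into every lane.
constexpr V4 model_replgr2vr_d(std::uint64_t x) {
    return V4{{x, x, x, x}};
}

// Models xvinsgr2vr.d: overwrite one lane and leave the others unchanged.
constexpr V4 model_insgr2vr_d(V4 v, std::uint64_t x, std::size_t idx) {
    v.lane[idx] = x;
    return v;
}

// Shape of the GCC output: one broadcast followed by one insert.
constexpr V4 broadcast_then_insert(std::uint64_t a, std::uint64_t b) {
    return model_insgr2vr_d(model_replgr2vr_d(a), b, 2);
}

// Shape of the Clang output: one insert per lane, four in total.
constexpr V4 insert_each_lane(std::uint64_t a, std::uint64_t b) {
    V4 v{{0, 0, 0, 0}};  // starting value is irrelevant: every lane is overwritten
    v = model_insgr2vr_d(v, a, 0);
    v = model_insgr2vr_d(v, a, 1);
    v = model_insgr2vr_d(v, b, 2);
    v = model_insgr2vr_d(v, a, 3);
    return v;
}

// Lane-by-lane comparison of two modelled registers.
constexpr bool same_lanes(const V4& x, const V4& y) {
    return x.lane[0] == y.lane[0] && x.lane[1] == y.lane[1] &&
           x.lane[2] == y.lane[2] && x.lane[3] == y.lane[3];
}

// Both sequences produce {a, a, b, a}; only the instruction count differs.
static_assert(same_lanes(broadcast_then_insert(7, 9), insert_each_lane(7, 9)),
              "the two sequences must build the same vector");
```

Since both sequences yield the same lanes, the two-instruction form is a valid replacement for the four-insert form in this model. The sketch says nothing about where in the compiler the transformation should happen.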