Issue 101624
Summary [LoongArch][Clang] Miss optimization for code mixed with replgr2vr and insgr2vr
Labels clang
Assignees
Reporter 24bit-xjkp
## Information

### clang

clang version 20.0.0git (https://github.com/llvm/llvm-project.git 73c72f2c6505d5bc8b47bb0420f6cba5b24270fe)
Target: x86_64-unknown-windows-gnu
Thread model: posix

### gcc

Using built-in specs.
COLLECT_GCC=D:\Tools\loongarch64-linux-gnu\bin\loongarch64-linux-gnu-g++.exe
COLLECT_LTO_WRAPPER=D:/Tools/loongarch64-linux-gnu/bin/../libexec/gcc/loongarch64-linux-gnu/15.0.0/lto-wrapper.exe
Target: loongarch64-linux-gnu
Configured with: ../configure --disable-werror --prefix=/home/luo/x86_64-w64-mingw32-host-loongarch64-linux-gnu-target-gcc15 --host=x86_64-w64-mingw32 --target=loongarch64-linux-gnu --disable-multilib --enable-languages=c,c++
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.0.0 20240714 (experimental) (GCC)

## Code

```c++
#include <cstdint>
#include <lasxintrin.h>
auto f(::std::uint64_t a, ::std::uint64_t b) noexcept
{
    auto v{__lasx_xvreplgr2vr_d(a)};
    v = __lasx_xvinsgr2vr_d(v, b, 2);
    return v;
}
```

## Assembly

```asm
# clang++ --target=loongarch64-linux-gnu -O3 -march=la664 -S
_Z1fmm:
	xvinsgr2vr.d	$xr0, $a1, 0
	xvinsgr2vr.d	$xr0, $a1, 1
	xvinsgr2vr.d	$xr0, $a2, 2
	xvinsgr2vr.d	$xr0, $a1, 3
	xvst	$xr0, $a0, 0
	ret
# loongarch64-linux-gnu-g++ -O3 -march=la664 -S
_Z1fmm:
	xvreplgr2vr.d	$xr0,$r5
	xvinsgr2vr.d	$xr0,$r6,2
	xvst	$xr0,$r4,0
	jr	$r1
```

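Broadcasting the general-purpose register `a` into the vector register `v` and then inserting the other general-purpose register `b` into lane 2 of `v`, as GCC does, is more efficient: it takes two vector instructions instead of the four inserts Clang emits, and it has lower latency.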
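As a rough illustration of why the two sequences are interchangeable, here is a portable scalar sketch of the lane semantics. The names `Vec4`, `model_replgr2vr_d`, `model_insgr2vr_d`, `broadcast_then_insert`, and `four_inserts` are hypothetical helpers made up for this example; they are not part of `lasxintrin.h`, and the model assumes a 256-bit vector is simply four independent 64-bit lanes.

```c++
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a 256-bit LASX vector of four 64-bit lanes.
using Vec4 = std::array<std::uint64_t, 4>;

// Assumed model of xvreplgr2vr.d: copy one scalar into every lane.
inline Vec4 model_replgr2vr_d(std::uint64_t x) noexcept
{
    return Vec4{x, x, x, x};
}

// Assumed model of xvinsgr2vr.d: overwrite a single lane, keep the others.
inline Vec4 model_insgr2vr_d(Vec4 v, std::uint64_t x, std::size_t lane) noexcept
{
    v[lane] = x;
    return v;
}

// Shape of the GCC output: one broadcast followed by one insert.
inline Vec4 broadcast_then_insert(std::uint64_t a, std::uint64_t b) noexcept
{
    return model_insgr2vr_d(model_replgr2vr_d(a), b, 2);
}

// Shape of the Clang output: four separate inserts, one per lane.
inline Vec4 four_inserts(std::uint64_t a, std::uint64_t b) noexcept
{
    Vec4 v{};
    v = model_insgr2vr_d(v, a, 0);
    v = model_insgr2vr_d(v, a, 1);
    v = model_insgr2vr_d(v, b, 2);
    v = model_insgr2vr_d(v, a, 3);
    return v;
}
```

Under this model both functions return the same four lanes for any `a` and `b`, so replacing the four-insert form with the broadcast-plus-insert form should not change the result.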