Hi! I try to investigate on this problem, and modify the testcase to compile and run on aarch64 for reference, but I get some strange result (comment shows the info that I see by stepping through by using gdb):
typedef double __attribute__((vector_size(16))) v2df; void use1(double d) {} __attribute__((noipa)) v2df use(double d) { //reg v8's value: {1, 2} register v2df x asm("v8") = {5, 9}; //reg v8's value: {5, 9} __asm__("" : "+w" (x)); return x; } void test(void) { register v2df x asm("v8") = {1, 2}; __asm__("" : "+w" (x)); //reg v8's value: {1, 2} use(x[0]); //reg v8's value: {1, 0} use1(x[1]); } int main(int argc, char **argv) { test(); return 0; } The compile command is: gcc -march=armv8-a -Og -g 1.c (gcc 8.3.0+binutils 2.31) Disassembly of test() and use(): 0000000000400558 <use>: 400558: fc1f0fe8 str d8, [sp, #-16]! 40055c: 90000000 adrp x0, 400000 <_init-0x3e0> 400560: 3dc19c08 ldr q8, [x0, #1648] 400564: 4ea81d00 mov v0.16b, v8.16b 400568: fc4107e8 ldr d8, [sp], #16 40056c: d65f03c0 ret 0000000000400570 <test>: 400570: a9be7bfd stp x29, x30, [sp, #-32]! 400574: 910003fd mov x29, sp 400578: fd000be8 str d8, [sp, #16] 40057c: 90000000 adrp x0, 400000 <_init-0x3e0> 400580: 3dc1a008 ldr q8, [x0, #1664] 400584: 5e080500 mov d0, v8.d[0] 400588: 97fffff4 bl 400558 <use> 40058c: fd400be8 ldr d8, [sp, #16] 400590: a8c27bfd ldp x29, x30, [sp], #32 400594: d65f03c0 ret As the register value in the comments, The compiling output on aarch64 also clobbers the high parts of vector register. I googled for some documents and I find this: https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64 -bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard /Parameters-in-NEON-and-floating-point-registers Seems ARMv8-A only guarantees to preserve low 64-bit value of NEON/floating-point register value. I'm not sure that I modify the testcase in the right way and maybe we need more investigations. Any ideas or suggestion? On Wed, 2023-08-16 at 11:27 +0800, Xi Ruoyao wrote: > The implementation fails to handle this test case properly: > > typedef double __attribute__((vector_size(32))) v4df; > > void use1(double); > > __attribute__((noipa)) double use(double) > { > register double x asm("f24") = 114.514; > __asm__("" : "+f" (x)); > return x; > } > > void test(void) > { > register v4df x asm("f24") = {1, 2, 3, 4}; > __asm__("" : "+f" (x)); > use(x[1]); > use1(x[3]); > } > > Here use() attempts to save and restore f24, but it uses fst.d/fld.d, > clobbering the high 192 bits of xr24. Now test() passes a wrong > value > of x[3] to use1(). > > Note that saving and restoring f24 with xvst/xvld in use() won't > really > fix the issue because in real life use() can be in another > translation > unit (or even a shared library) compiled with -mno-lsx. So it seems > we > need to tell the compiler "a function call may clobber the high bits > of > a vector register even if the corresponding floating-point register > is > saved". I'm not sure how to accomplish this... > > On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote: > > This is an update of: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html > > > > This version of patch set only introduces some small simplications > > of > > implementation. Because I missed the size limitation of mail size, > > the > > huge testsuite patches of v2 and v3 are not shown in the mail list. > > So, > > testsuite patches are splited from this patch set again and will be > > submitted > > independently in the future. > > > > Binutils-gdb introduced LSX/LASX support since 2.41 release: > > https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00009.html > > > > Brief history of patch set version: > > v1 -> v2: > > - Reduce usage of "unspec" in RTL template. > > - Append Support of ADDR_REG_REG in LSX and LASX. > > - Constraint docs are appended in gcc/doc/md.texi and ccomment > > block. > > - Codes related to vecarg are removed. > > - Testsuite of LSX and LASX is added in v2. (Because of the size > > limitation of > > mail list, these patches are not shown) > > - Adjust the loongarch_expand_vector_init() function to reduce > > instruction > > output amount. > > - Some minor implementation changes of RTL templates. > > > > v2 -> v3: > > - Revert vabsd/xvabsd RTL templates to unspec impl. > > - Resolve warning in gcc/config/loongarch/loongarch.cc when > > bootstrapping > > with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model - > > mlasx". > > - Remove redundant definitions in lasxintrin.h. > > - Refine commit info. > > > > Lulu Cheng (6): > > LoongArch: Add Loongson SX vector directive compilation > > framework. > > LoongArch: Add Loongson SX base instruction support. > > LoongArch: Add Loongson SX directive builtin function support. > > LoongArch: Add Loongson ASX vector directive compilation > > framework. > > LoongArch: Add Loongson ASX base instruction support. > > LoongArch: Add Loongson ASX directive builtin function support. > > > > gcc/config.gcc | 2 +- > > gcc/config/loongarch/constraints.md | 131 +- > > .../loongarch/genopts/loongarch-strings | 4 + > > gcc/config/loongarch/genopts/loongarch.opt.in | 12 +- > > gcc/config/loongarch/lasx.md | 5122 > > ++++++++++++++++ > > gcc/config/loongarch/lasxintrin.h | 5338 > > +++++++++++++++++ > > gcc/config/loongarch/loongarch-builtins.cc | 2686 ++++++++- > > gcc/config/loongarch/loongarch-c.cc | 18 + > > gcc/config/loongarch/loongarch-def.c | 6 + > > gcc/config/loongarch/loongarch-def.h | 9 +- > > gcc/config/loongarch/loongarch-driver.cc | 10 + > > gcc/config/loongarch/loongarch-driver.h | 2 + > > gcc/config/loongarch/loongarch-ftypes.def | 666 +- > > gcc/config/loongarch/loongarch-modes.def | 39 + > > gcc/config/loongarch/loongarch-opts.cc | 89 +- > > gcc/config/loongarch/loongarch-opts.h | 3 + > > gcc/config/loongarch/loongarch-protos.h | 35 + > > gcc/config/loongarch/loongarch-str.h | 3 + > > gcc/config/loongarch/loongarch.cc | 4586 > > +++++++++++++- > > gcc/config/loongarch/loongarch.h | 117 +- > > gcc/config/loongarch/loongarch.md | 56 +- > > gcc/config/loongarch/loongarch.opt | 12 +- > > gcc/config/loongarch/lsx.md | 4481 > > ++++++++++++++ > > gcc/config/loongarch/lsxintrin.h | 5181 > > ++++++++++++++++ > > gcc/config/loongarch/predicates.md | 333 +- > > gcc/doc/md.texi | 11 + > > 26 files changed, 28668 insertions(+), 284 deletions(-) > > create mode 100644 gcc/config/loongarch/lasx.md > > create mode 100644 gcc/config/loongarch/lasxintrin.h > > create mode 100644 gcc/config/loongarch/lsx.md > > create mode 100644 gcc/config/loongarch/lsxintrin.h > > >