Hi! I try to investigate on this problem, and modify the testcase to
compile and run on aarch64 for reference, but I get some strange result
(comment shows the info that I see by stepping through by using gdb):

typedef double __attribute__((vector_size(16))) v2df;

void use1(double d) {}

__attribute__((noipa)) v2df use(double d)
{
  //reg v8's value: {1, 2}
  register v2df x asm("v8") = {5, 9};
  //reg v8's value: {5, 9}
  __asm__("" : "+w" (x));
  return x;
}

void test(void)
{
  register v2df x asm("v8") = {1, 2};
  __asm__("" : "+w" (x));
  //reg v8's value: {1, 2}
  use(x[0]);
  //reg v8's value: {1, 0}
  use1(x[1]);
}

int main(int argc, char **argv)
{
  test();
  return 0;
}

The compile command is: gcc -march=armv8-a -Og -g 1.c (gcc
8.3.0+binutils 2.31)

Disassembly of test() and use():
0000000000400558 <use>:                                               
  400558:       fc1f0fe8        str     d8, [sp, #-16]!               
  40055c:       90000000        adrp    x0, 400000 <_init-0x3e0>      
  400560:       3dc19c08        ldr     q8, [x0, #1648]               
  400564:       4ea81d00        mov     v0.16b, v8.16b                
  400568:       fc4107e8        ldr     d8, [sp], #16                 
  40056c:       d65f03c0        ret    

0000000000400570 <test>:                                              
  400570:       a9be7bfd        stp     x29, x30, [sp, #-32]!         
  400574:       910003fd        mov     x29, sp                       
  400578:       fd000be8        str     d8, [sp, #16]                 
  40057c:       90000000        adrp    x0, 400000 <_init-0x3e0>      
  400580:       3dc1a008        ldr     q8, [x0, #1664]               
  400584:       5e080500        mov     d0, v8.d[0]                   
  400588:       97fffff4        bl      400558 <use>                  
  40058c:       fd400be8        ldr     d8, [sp, #16]                 
  400590:       a8c27bfd        ldp     x29, x30, [sp], #32           
  400594:       d65f03c0        ret     

As the register value in the comments, The compiling output on aarch64
also clobbers the high parts of vector register. I googled for some
documents and I find this:
https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64
-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard
/Parameters-in-NEON-and-floating-point-registers 

Seems ARMv8-A only guarantees to preserve low 64-bit value of
NEON/floating-point register value. I'm not sure that I modify the
testcase in the right way and maybe we need more investigations. Any
ideas or suggestion?

On Wed, 2023-08-16 at 11:27 +0800, Xi Ruoyao wrote:
> The implementation fails to handle this test case properly:
> 
> typedef double __attribute__((vector_size(32))) v4df;
> 
> void use1(double);
> 
> __attribute__((noipa)) double use(double)
> {
>       register double x asm("f24") = 114.514;
>       __asm__("" : "+f" (x));
>       return x;
> }
> 
> void test(void)
> {
>       register v4df x asm("f24") = {1, 2, 3, 4};
>       __asm__("" : "+f" (x));
>       use(x[1]);
>       use1(x[3]);
> }
> 
> Here use() attempts to save and restore f24, but it uses fst.d/fld.d,
> clobbering the high 192 bits of xr24.  Now test() passes a wrong
> value
> of x[3] to use1().
> 
> Note that saving and restoring f24 with xvst/xvld in use() won't
> really
> fix the issue because in real life use() can be in another
> translation
> unit (or even a shared library) compiled with -mno-lsx.  So it seems
> we
> need to tell the compiler "a function call may clobber the high bits
> of
> a vector register even if the corresponding floating-point register
> is
> saved".  I'm not sure how to accomplish this...
> 
> On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote:
> > This is an update of:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html
> > 
> > This version of patch set only introduces some small simplications
> > of
> > implementation. Because I missed the size limitation of mail size,
> > the
> > huge testsuite patches of v2 and v3 are not shown in the mail list.
> > So,
> > testsuite patches are splited from this patch set again and will be
> > submitted 
> > independently in the future.
> > 
> > Binutils-gdb introduced LSX/LASX support since 2.41 release:
> > https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00009.html
> > 
> > Brief history of patch set version:
> > v1 -> v2:
> > - Reduce usage of "unspec" in RTL template.
> > - Append Support of ADDR_REG_REG in LSX and LASX.
> > - Constraint docs are appended in gcc/doc/md.texi and ccomment
> > block.
> > - Codes related to vecarg are removed.
> > - Testsuite of LSX and LASX is added in v2. (Because of the size
> > limitation of
> >   mail list, these patches are not shown)
> > - Adjust the loongarch_expand_vector_init() function to reduce
> > instruction 
> >   output amount.
> > - Some minor implementation changes of RTL templates.
> > 
> > v2 -> v3:
> > - Revert vabsd/xvabsd RTL templates to unspec impl.
> > - Resolve warning in gcc/config/loongarch/loongarch.cc when
> > bootstrapping 
> >   with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -
> > mlasx".
> > - Remove redundant definitions in lasxintrin.h.
> > - Refine commit info.
> > 
> > Lulu Cheng (6):
> >   LoongArch: Add Loongson SX vector directive compilation
> > framework.
> >   LoongArch: Add Loongson SX base instruction support.
> >   LoongArch: Add Loongson SX directive builtin function support.
> >   LoongArch: Add Loongson ASX vector directive compilation
> > framework.
> >   LoongArch: Add Loongson ASX base instruction support.
> >   LoongArch: Add Loongson ASX directive builtin function support.
> > 
> >  gcc/config.gcc                                |    2 +-
> >  gcc/config/loongarch/constraints.md           |  131 +-
> >  .../loongarch/genopts/loongarch-strings       |    4 +
> >  gcc/config/loongarch/genopts/loongarch.opt.in |   12 +-
> >  gcc/config/loongarch/lasx.md                  | 5122
> > ++++++++++++++++
> >  gcc/config/loongarch/lasxintrin.h             | 5338
> > +++++++++++++++++
> >  gcc/config/loongarch/loongarch-builtins.cc    | 2686 ++++++++-
> >  gcc/config/loongarch/loongarch-c.cc           |   18 +
> >  gcc/config/loongarch/loongarch-def.c          |    6 +
> >  gcc/config/loongarch/loongarch-def.h          |    9 +-
> >  gcc/config/loongarch/loongarch-driver.cc      |   10 +
> >  gcc/config/loongarch/loongarch-driver.h       |    2 +
> >  gcc/config/loongarch/loongarch-ftypes.def     |  666 +-
> >  gcc/config/loongarch/loongarch-modes.def      |   39 +
> >  gcc/config/loongarch/loongarch-opts.cc        |   89 +-
> >  gcc/config/loongarch/loongarch-opts.h         |    3 +
> >  gcc/config/loongarch/loongarch-protos.h       |   35 +
> >  gcc/config/loongarch/loongarch-str.h          |    3 +
> >  gcc/config/loongarch/loongarch.cc             | 4586
> > +++++++++++++-
> >  gcc/config/loongarch/loongarch.h              |  117 +-
> >  gcc/config/loongarch/loongarch.md             |   56 +-
> >  gcc/config/loongarch/loongarch.opt            |   12 +-
> >  gcc/config/loongarch/lsx.md                   | 4481
> > ++++++++++++++
> >  gcc/config/loongarch/lsxintrin.h              | 5181
> > ++++++++++++++++
> >  gcc/config/loongarch/predicates.md            |  333 +-
> >  gcc/doc/md.texi                               |   11 +
> >  26 files changed, 28668 insertions(+), 284 deletions(-)
> >  create mode 100644 gcc/config/loongarch/lasx.md
> >  create mode 100644 gcc/config/loongarch/lasxintrin.h
> >  create mode 100644 gcc/config/loongarch/lsx.md
> >  create mode 100644 gcc/config/loongarch/lsxintrin.h
> > 
> 

Reply via email to