Thanks Jeff and Sam, updated v2 for -fno-strict-aliasing.

Pan

-----Original Message-----
From: Jeff Law <jeffreya...@gmail.com> 
Sent: Sunday, January 26, 2025 1:06 AM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp....@gmail.com; 
vine...@rivosinc.com; richard.sandif...@arm.com
Subject: Re: [PATCH v1] RISC-V: Make FRM as global register [PR118103] 
[PR118646]



On 1/25/25 1:07 AM, pan2...@intel.com wrote:
> From: Pan Li <pan2...@intel.com>
> 
> After we enabled the labe-combine pass after the mode-switching pass, it
> will try to combine below insn patterns into op.  Aka:
> 
> (insn 40 5 41 2 (set (reg:SI 11 a1 [151])
>    (reg:SI 69 frm)) "pr118103-simple.c":67:15 2712 {frrmsi}
>    (nil))
> (insn 41 40 7 2 (set (reg:SI 69 frm)
>    (const_int 2 [0x2])) "pr118103-simple.c":69:8 2710 {fsrmsi_restore}
>    (nil))
> (insn 42 10 11 2 (set (reg:SI 69 frm)
>    (reg:SI 11 a1 [151])) "pr118103-simple.c":70:8 2710 {fsrmsi_restore}
>      (nil))
> 
> trying to combine definition of r11 in:
> 40: a1:SI=frm:SI
>      into:
> 42: frm:SI=a1:SI
>      instruction becomes a no-op:
> (set (reg:SI 69 frm)
> (reg:SI 69 frm))
> original cost = 4 + 4 (weighted: 8.000000), replacement cost =
> 2147483647; keeping replacement
> rescanning insn with uid = 42.
> updating insn 42 in-place
> verify found no changes in insn with uid = 42.
> deleting insn 40
> 
> For example we have code as blow:
>     9   │ int test_exampe () {
>    10   │   test ();
>    11   │
>    12   │   size_t vl = 4;
>    13   │   vfloat16m1_t va = __riscv_vle16_v_f16m1(a, vl);
>    14   │   va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
>    15   │   va = __riscv_vfmsac_vv_f16m1(va, va, va, vl);
>    16   │
>    17   │   __riscv_vse16_v_f16m1(b, va, vl);
>    18   │
>    19   │   return 0;
>    20   │ }
> 
> it will be compiled to:
>    53   │ main:
>    54   │     addi    sp,sp,-16
>    55   │     sd  ra,8(sp)
>    56   │     call    initialize
>    57   │     lui a6,%hi(b)
>    58   │     lui a2,%hi(a)
>    59   │     addi    a3,a6,%lo(b)
>    60   │     addi    a2,a2,%lo(a)
>    61   │     li  a4,4
>    62   │ .L8:
>    63   │     fsrmi   2
>    64   │     vsetvli a5,a4,e16,m1,ta,ma
>    65   │     vle16.v v1,0(a2)
>    66   │     slli    a1,a5,1
>    67   │     subw    a4,a4,a5
>    68   │     add a2,a2,a1
>    69   │     vfnmadd.vv  v1,v1,v1
>    >> The fsrm a0 insn is deleted by late-combine <<
>    70   │     vfmsub.vv   v1,v1,v1
>    71   │     vse16.v v1,0(a3)
>    72   │     add a3,a3,a1
>    73   │     bgt a4,zero,.L8
>    74   │     lh  a4,%lo(b)(a6)
>    75   │     li  a5,-20480
>    76   │     addi    a5,a5,-1382
>    77   │     bne a4,a5,.L14
>    78   │     ld  ra,8(sp)
>    79   │     li  a0,0
>    80   │     addi    sp,sp,16
>    81   │     jr  ra
> 
> This patch would like to add the FRM register to the global_regs as it
> is a cooperatively-managed global register.  And then the fsrm insn will
> not be eliminated by late-combine.  The related spec17 cam4 failure may
> also caused by this issue too.
> 
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> 
>       PR target/118103
> 
> gcc/ChangeLog:
> 
>       * config/riscv/riscv.cc (riscv_conditional_register_usage): Add
>       the FRM as the global_regs.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/riscv/rvv/base/pr118103-1.c: New test.
>       * gcc.target/riscv/rvv/base/pr118103-run-1.c: New test.
A bit surprised to hear that we've got vector code that explicitly sets 
the rounding mode.  We should keep that in mind as I wouldn't be at all 
surprised to find designs that flush pipes on the rounding mode change 
(much like we've seen with fxrm).  Point being that may be so expensive 
that other code generation approaches might produce much better performance.

ACK for the change to add FRM to the global registers.  Someone had a 
concern about strict aliasing for the test.  So let's get that sorted 
out before committing.

jeff

Reply via email to