Vineet Gupta <vine...@rivosinc.com> writes:
> +CC gcc-patches
>
> On 5/30/25 14:04, Vineet Gupta wrote:
>> Hi Jeff, Richard
>>
>> As part of RISC-V FRM mode switching improvements, I'm running into a
>> behavior in late_combine2 where it is eliminating FRM save/restores when
>> it is desired to keep them.
>>
>> I'm pasting snippet of RTL dumps, could you please see if there is
>> anything jumping out from this limited info.
>> In RISC-V backend, FRM is specified as a global_reg and inline asm is the
>> only way for users to achieve the global fesetround() like semantics.
>>
>> src
>>
>> /* -march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize */
>> #pragma riscv intrinsic "vector"
>> typedef long unsigned int size_t;
>>
>> static void
>> set_frm (int frm)
>> {
>>   __asm__ volatile ( "fsrm %0" : :"r"(frm) : "frm");
>> }
>>
>> vfloat32m1_t __attribute__ ((noinline))
>> test_float_point_frm_run_1 (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl)
>> {
>>   vfloat32m1_t result;
>>   /* global mode set */
>>   set_frm (0);
>>   /* intrinsic for set mode 1 should be local and restored back to
>>      global 0 upon return */
>>   result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
>>   return result;
>> }
>>
>>
>> RTL dump
>>
>> mode-sw
>>
>> (insn 9 8 18 2 (parallel [
>>             (asm_operands/v ("fsrm %0") ("") 0 [
>>                     (reg:SI 139)
>>                 ]
>>                  [
>>                     (asm_input:SI ("r") frm-run-1.c:33)
>>                 ]
>>                  [] frm-run-1.c:33)
>>             (clobber (reg:V4096QI 69 frm))
>>         ]) "frm-run-1.c":33:3 -1
>>      (expr_list:REG_DEAD (reg:SI 139)
>>         (nil)))
>>
>> (insn 27 10 28 2 (set (reg:SI 144)
>>         (reg:SI 69 frm)) "frm-run-1.c":43:1 -1
>>      (nil))
>> (insn 28 27 14 2 (set (reg:SI 69 frm)
>>         (const_int 1 [0x1])) "frm-run-1.c":43:1 -1
>>      (nil))
It looks like insn 14 has been snipped.  What was it?  Did it use FRM?
If not, then...

>> (insn 29 14 24 2 (set (reg:SI 69 frm)
>>         (reg:SI 144)) -1
>>      (nil))

...insns 27, 28, and 29 as given above collectively have no effect,
assuming that reg 144 dies in insn 29.  The sequence can be removed
without changing the RTL semantics.

The dump you give here:

>>
>> late-combine2
>>
>> trying to combine definition of r15 in:
>>    27: a5:SI=frm:SI
>> into:
>>    29: frm:SI=a5:SI
>> instruction becomes a no-op:
>> (set (reg:SI 69 frm)
>>     (reg:SI 69 frm))
>> original cost = 4 + 4 (weighted: 8.000000), replacement cost = nop; keeping replacement
>> rescanning insn with uid = 29.
>> updating insn 29 in-place
>> verify found no changes in insn with uid = 29.
>> deleting insn 27
>> deleting insn with uid = 27
>>
>>
>> If I comment out the inline asm - it can no longer combine, elimination
>> doesn't happen with expected outcome.
>>
>> trying to combine definition of r15 in:
>>    25: a5:SI=frm:SI
>> into:
>>    27: frm:SI=a5:SI
>> -- cannot satisfy all definitions and uses in insn 27

...is from late-combine2, so after RA has completed, whereas the earlier
dump is from mode switching, so it's hard to tell what late-combine2 is
operating on.  Could you give the RTL as late-combine2 sees it?
(That would normally be the result of pass_postreload_cse.)

Thanks,
Richard
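
To illustrate the point about insns 27, 28, and 29, here is a rough sketch
in plain C.  It is only a toy model, not GCC code: the variables frm and
r144 stand in for hard register 69 and pseudo 144, and the helper names are
made up for this example.  It assumes that reg 144 is dead after insn 29
and, in the first function, that nothing between insns 28 and 29 reads FRM.

```c
#include <assert.h>

/* Toy model of insns 27, 28 and 29 with no reader of frm in between.
   Returns the final value of frm.  */
static int
frm_after_sequence (int initial_frm)
{
  int frm = initial_frm;
  int r144 = frm;   /* insn 27: r144 = frm */
  frm = 1;          /* insn 28: frm = 1 */
  /* insn 14 would sit here; assumed not to read frm.  */
  frm = r144;       /* insn 29: frm = r144 */
  return frm;
}

/* Nonzero if the sequence leaves frm exactly as it found it.  */
static int
frm_unchanged (int initial_frm)
{
  return frm_after_sequence (initial_frm) == initial_frm;
}

/* Same sequence, but with a hypothetical insn between 28 and 29 that
   reads frm.  Returns the value that insn would observe.  */
static int
frm_seen_by_middle_insn (int initial_frm)
{
  int frm = initial_frm;
  int r144 = frm;   /* insn 27 */
  frm = 1;          /* insn 28 */
  int seen = frm;   /* hypothetical reader of frm */
  frm = r144;       /* insn 29 */
  (void) frm;
  return seen;
}

/* Check the five static rounding-mode encodings, 0 through 4.  */
static void
check_all_modes (void)
{
  for (int mode = 0; mode <= 4; mode++)
    assert (frm_unchanged (mode));
}
```

In this model the save/set/restore only matters if something in the middle
reads FRM, which is why the contents of insn 14 are the interesting part.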