On 2/24/2025 4:34 PM, Jeff Law wrote:


On 2/24/25 5:07 PM, Edwin Lu wrote:
See [1] thread for original patch which spawned this one.

We are currently seeing the following code where we perform a vsetvl
before a branching instruction against the avl.

         vsetvli a5,a1,e32,m1,tu,ma
         vle32.v v2,0(a0)
         sub     a1,a1,a5 <-- a1 potentially set to 0
         sh2add  a0,a5,a0
         vfmacc.vv       v1,v2,v2
         vsetvli a5,a1,e32,m1,tu,ma <-- incompatible vinfo. update vl to 0
         beq     a1,zero,.L12 <-- check if avl is 0

Since we are branching off of the avl, we don't need to update vl until
after the branch is taken. Search the ready queue for vsetvls scheduled
before branching instructions that branch off of the same regno and
promote the branches to execute first. This can improve performancy by
potentially avoiding setting VL=0 which may be expensive on some uarches.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675622.html

    PR/117974

gcc/ChangeLog:

    * config/riscv/riscv.cc (vsetvl_avl_regno): New helper function.
    (insn_increases_zeroness_p): Ditto.
    (riscv_promote_ready): Ditto.
    (riscv_sched_reorder): Implement hook.
    (TARGET_SCHED_REORDER): Define Hook.
    * config/riscv/riscv.opt: New flag.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/vsetvl/pr117974.c: New test.
So I preferred the earlier approach of disabling speculation of the vsetvls, though I'm guessing you're looking at this approach because that was insufficient?
I don't think that it was because it was insufficient, but that it might be too constraining. I'm not exactly certain if there is the possibility where speculatively issuing some vsetvls in some situations could improve perf. I think with this patch, it targets the specific issue we're facing directly which is updating vl=0 when it's unnecessary. I don't know which patch would be better so I'm putting this out there as a potential alternative.
Regardless, like the disabling of speculation I tend to think this should be controlled by an entry in the tuning structure rather than flags. Flags are fine for internal testing, but I don't see a compelling need for a flag to control this in general.

Jeff

That makes sense to me. I completely forgot that we had talked about putting it into the tuning structure when writing this. I'll update the tunes respectively in a later version of the patches depending on which one moves forward.

Edwin


Reply via email to