On 7/19/24 2:55 AM, demin.han wrote:
Currently, some binops of vector vs. double scalar under RV32 can't be
translated to vf; vfmv+vxx.vv is generated instead.

The cause is that vec_duplicate is also expanded to broadcast for double mode
under RV32. late-combine can't process the expanded broadcast.

gcc/ChangeLog:

        * config/riscv/vector.md: Add !FLOAT_MODE_P constraint.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test
        * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto
        * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto
        * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto
        * gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto

It looks like vadd-rv32gcv-nofm still isn't quite right according to the pre-commit testing:

> https://github.com/ewlu/gcc-precommit-ci/issues/1931#issuecomment-2238752679


OK once that's fixed.  No need to wait for another review cycle.

And a note. We need to be careful as some uarchs may pay a penalty when the vector unit needs to get an operand from the GP or FP register files. So there could well be cases where using .vf or .vx forms is slower. Consider these two scenarios.

First, we broadcast from the GP/FP register file across a vector register outside a loop, then use a .vv form in the loop.

Second, we use a .vf or .vx form in the loop instead, without any broadcast.

In the first case we only pay the penalty for crossing register files once. In the second case we'd pay it on every iteration of the loop.

Given this is going to be uarch sensitive, I don't mind biasing towards the .vx/.vf forms right now, but we may need to add some costing models to this in the future as we can test on a wider variety of uarchs.
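
To make the trade-off between the two scenarios concrete, here is a rough back-of-the-envelope sketch in C. It is not GCC's cost model and the numbers are not measurements from any real uarch; the struct, its fields, and both function names are hypothetical and exist only for illustration. It assumes the crossing penalty is a fixed number of cycles charged each time an operand moves from the GP/FP register file to the vector unit.

```c
/* Hypothetical per-operation costs in cycles. Illustrative only;
   none of these fields correspond to anything in GCC. */
struct uarch_cost {
    unsigned long vv_op;         /* one vector-vector op, e.g. vfadd.vv */
    unsigned long vf_op;         /* one vector-scalar op, e.g. vfadd.vf,
                                    excluding the register-file crossing */
    unsigned long broadcast_op;  /* one broadcast, e.g. vfmv.v.f */
    unsigned long cross_penalty; /* assumed cost of fetching one operand
                                    from the GP/FP register file */
};

/* Scenario 1: broadcast once outside the loop, .vv form inside it.
   The crossing penalty is paid a single time. */
static unsigned long
cost_broadcast_then_vv(const struct uarch_cost *c, unsigned long iters)
{
    return c->cross_penalty + c->broadcast_op + iters * c->vv_op;
}

/* Scenario 2: .vf/.vx form inside the loop, no broadcast.
   The crossing penalty is paid on every iteration. */
static unsigned long
cost_vf_in_loop(const struct uarch_cost *c, unsigned long iters)
{
    return iters * (c->vf_op + c->cross_penalty);
}
```

With a zero crossing penalty and unit costs elsewhere, the .vf form wins by the cost of the broadcast (and, in practice, frees a vector register). With an assumed penalty of 2 cycles and 100 iterations, the first scenario costs 103 and the second 300 under this model, which is the kind of gap a future costing hook would need to account for.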

jeff
