On 7/19/24 2:55 AM, demin.han wrote:
Currently, some binops of vector vs double scalar under RV32 can't be
translated to vf and are emitted as vfmv+vxx.vv instead.
The cause is that vec_duplicate is also expanded to broadcast for double mode
under RV32, and late-combine can't process the expanded broadcast.
gcc/ChangeLog:
* config/riscv/vector.md: Add !FLOAT_MODE_P constraint
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto
It looks like vadd-rv32gcv-nofm still isn't quite right according to the
pre-commit testing:
https://github.com/ewlu/gcc-precommit-ci/issues/1931#issuecomment-2238752679
OK once that's fixed. No need to wait for another review cycle.
And a note. We need to be careful as some uarchs may pay a penalty when
the vector unit needs to get an operand from the GP or FP register
files. So there could well be cases where using .vf or .vx forms is
slower. Consider these two scenarios.
First, we broadcast from the GP/FP register file across a vector register
outside a loop, then use a .vv form in the loop.
Second, we use a .vf or .vx form in the loop instead, without any broadcast.
In the first case we only pay the penalty for crossing register files
once. In the second case we'd pay it for every iteration of the loop.
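To make the trade-off concrete, here is a rough sketch of how the two
scenarios might be compared. Everything in it is an assumption made for
illustration: the struct, the field names and the cost units are
hypothetical and do not correspond to any existing GCC costing hook, and
the model ignores register pressure and scheduling entirely.

```c
/* Hypothetical per-uarch costs in arbitrary units (illustration only,
   not taken from any real tuning table).  */
struct xfer_costs
{
  unsigned long cross_file_penalty; /* GP/FP register file -> vector unit.  */
  unsigned long broadcast_op;       /* The broadcast instruction itself.  */
  unsigned long vector_op;          /* One vector binop, .vv or .vf/.vx.  */
};

/* Scenario 1: broadcast once outside the loop, then use .vv in the loop.
   The cross-file penalty is paid a single time.  */
static unsigned long
cost_broadcast_then_vv (struct xfer_costs c, unsigned long iters)
{
  return c.cross_file_penalty + c.broadcast_op + iters * c.vector_op;
}

/* Scenario 2: use .vf/.vx directly in the loop, no broadcast.
   The cross-file penalty is paid on every iteration.  */
static unsigned long
cost_vf_in_loop (struct xfer_costs c, unsigned long iters)
{
  return iters * (c.vector_op + c.cross_file_penalty);
}
```

With a zero penalty the .vf/.vx form always comes out ahead in this model,
since it saves the broadcast. With a nonzero penalty the broadcast form
starts to win once the loop runs for more than a few iterations, which is
why the answer ends up being uarch sensitive.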
Given this is going to be uarch sensitive, I don't mind biasing towards
the .vx/.vf forms right now, but we may need to add some costing models
to this in the future as we can test on a wider variety of uarchs.
jeff