On 4/12/25 12:41 AM, Alexandre Oliva wrote:
> pr118182-2.c fails on gcc-14 because it lacks the late_combine passes,
> particularly the one that runs after register allocation.
>
> Even in the trunk, the predicate broadcast for the add reduction is
> expanded and register-allocated as _zvfh, taking up an unneeded scalar
> register to hold the constant to be vec_duplicated.
>
> It is the late combine pass after register allocation that substitutes
> this unneeded scalar register into the vec_duplicate, resolving to the
> _zero or _imm insns.
>
> It's easy enough and more efficient to expand pred_broadcast to the
> insns that take the already-duplicated vector constant, when the
> operands satisfy the predicates of the _zero or _imm insns.
>
> Regression-tested with gcc-14 x86_64-linux-gnu-hosted crosses to
> riscv64-elf and riscv32-elf. Also smoke-tested on trunk, still passing
> the pr118182-2.c test with a cross to riscv64-elf. Ok to install?
>
> for gcc/ChangeLog
>
> PR target/118182
> * config/riscv/vector.md (@pred_broadcast<mode>): Expand to
> _zero and _imm variants without vec_duplicate.

I'd say this should probably wait for gcc-16, since it doesn't fix a
regression.
I will note that what you've found is relatively common in the RISC-V
port; we've generally been tackling problems with combiner patterns
rather than looking at whether or not we should be generating better
code earlier (say at expand time). My intern and I are working through
these issues with the basic logical ops now. This is a blocker to
removing mvconst_internal.
So just keep it in mind as you're poking around -- what you're finding
likely will show up elsewhere and I'm supportive of moving this stuff to
expansion time.
jeff