On 4/12/25 12:41 AM, Alexandre Oliva wrote:

pr118182-2.c fails on gcc-14 because it lacks the late_combine passes,
particularly the one that runs after register allocation.

Even in the trunk, the predicate broadcast for the add reduction is
expanded and register-allocated as _zvfh, taking up an unneeded scalar
register to hold the constant to be vec_duplicated.

It is the late combine pass after register allocation that substitutes
this unneeded scalar register into the vec_duplicate, resolving to the
_zero or _imm insns.

It's easy enough and more efficient to expand pred_broadcast to the
insns that take the already-duplicated vector constant, when the
operands satisfy the predicates of the _zero or _imm insns.
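
As a rough illustration of the selection described above, here is a toy
C model, not GCC code.  The enum, the function names and the simm5 range
check are assumptions made for this sketch; the real conditions are the
operand predicates of the _zero and _imm insns in vector.md, which may
differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; these are not GCC internals.  */
enum broadcast_variant
{
  BROADCAST_ZERO,   /* Constant zero: no scalar register needed.  */
  BROADCAST_IMM,    /* Small constant: encoded as an immediate.  */
  BROADCAST_SCALAR  /* Anything else: value lives in a scalar register.  */
};

/* Assumption: the immediate form accepts a 5-bit signed value, as
   vmv.v.i does, i.e. the range -16..15.  */
static bool
fits_simm5 (int64_t value)
{
  return value >= -16 && value <= 15;
}

/* Pick the variant at expansion time instead of always emitting the
   scalar-register form and relying on a later pass to simplify it.  */
enum broadcast_variant
choose_broadcast_variant (bool is_constant, int64_t value)
{
  if (is_constant && value == 0)
    return BROADCAST_ZERO;
  if (is_constant && fits_simm5 (value))
    return BROADCAST_IMM;
  return BROADCAST_SCALAR;
}
```

The point of the sketch is only that the choice can be made up front, so
no scalar register is allocated for constants that the _zero or _imm
forms can take directly.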

Regression-tested with gcc-14 x86_64-linux-gnu-hosted crosses to
riscv64-elf and riscv32-elf.  Also smoke-tested on trunk, still passing
the pr118182-2.c test with a cross to riscv64-elf.  Ok to install?


for  gcc/ChangeLog

        PR target/118182
        * config/riscv/vector.md (@pred_broadcast<mode>): Expand to
        _zero and _imm variants without vec_duplicate.

I'd say this should wait for gcc-16, since it doesn't fix a regression.

I will note that what you've found is relatively common in the RISC-V port; we've generally been tackling problems with combiner patterns rather than looking at whether or not we should be generating better code earlier (say at expand time). My intern and I are working through these issues with the basic logical ops now. This is a blocker to removing mvconst_internal.

So just keep it in mind as you're poking around -- what you're finding likely will show up elsewhere and I'm supportive of moving this stuff to expansion time.

jeff

