On 7/22/25 3:03 AM, Robin Dapp wrote:
Hi,

During the last weeks it became clear that our current broadcast
handling needs an overhaul in order to improve maintainability.
PR121073 showed that my intermediate fix wasn't enough and caused
regressions.

This patch now goes a first step towards untangling broadcast
(vmv.v.x), "set first" (vmv.s.x), and zero-strided load (vlse).
Also can_be_broadcast_p is rewritten and strided_broadcast_p is
introduced to make the distinction clear directly in the predicates.

Due to the pervasiveness of the patterns I needed to touch a lot
of places and tried to clear up some things while at it.  The patch
therefore also introduces new helpers expand_broadcast for vmv.v.x
that dispatches to regular as well as strided broadcast and
expand_set_first that does the same thing for vmv.s.x.

The non-strided fallbacks are now implemented as splitters of the
strided variants.  This makes it easier to see where and when things
happen.

The test cases I touched appeared wrong to me so this patch sets a new
baseline for some of the scalar_move tests.

There is still work to be done but IMHO that can be deferred: It would
be clearer if the three broadcast-like variants differed not just in
name but also in RTL pattern so matching is not as confusing.  Right now
vmv.v.x and vmv.s.x only differ in the mask and are interchangeable by
just changing it from "all ones" to a "single one".


As last time, I regtested on rv64 and rv32 with strided_broadcast turned
on and off.  Note there are regressions cond_fma_fnma-[78].c.  Those are
due to the patch exposing more fwprop/late-combine opportunities.  For
fma/fnma we don't yet have proper costing for vv/vx in place but I'll
expect that to be addressed soon and figured we can live with those for
the time being.

Regards
Robin

     PR target/121073

gcc/ChangeLog:

     * config/riscv/autovec-opt.md: Use new helpers.
     * config/riscv/autovec.md: Ditto.
     * config/riscv/predicates.md (strided_broadcast_mask_operand):
     New predicate.
     (strided_broadcast_operand): Ditto.
     (any_broadcast_operand): Ditto.
     * config/riscv/riscv-protos.h (expand_broadcast): Declare.
     (expand_set_first): Ditto.
     (expand_set_first_tu): Ditto.
     (strided_broadcast_p): Ditto.
     * config/riscv/riscv-string.cc (expand_vec_setmem): Use new
     helpers.
     * config/riscv/riscv-v.cc (expand_broadcast): New functionk.
     (expand_set_first): Ditto.
     (expand_set_first_tu): Ditto.
     (expand_const_vec_duplicate): Use new helpers.
     (expand_const_vector_duplicate_repeating): Ditto.
     (expand_const_vector_duplicate_default): Ditto.
     (sew64_scalar_helper): Ditto.
     (expand_vector_init_merge_repeating_sequence): Ditto.
     (expand_reduction): Ditto.
     (strided_broadcast_p): New function.
     (whole_reg_to_reg_move_p): Use new helpers.
     * config/riscv/riscv-vector-builtins-bases.cc: Use either
     broadcast or strided broadcast.
    * config/riscv/riscv-vector-builtins.cc (function_expander::use_ternop_insn):
     Ditto.
     (function_expander::use_widen_ternop_insn): Ditto.
     (function_expander::use_scalar_broadcast_insn): Ditto.
     * config/riscv/riscv-vector-builtins.h: Declare scalar
     broadcast.
     * config/riscv/vector.md (*pred_broadcast<mode>): Split into
     regular and strided broadcast.
     (*pred_broadcast<mode>_zvfh): Split.
     (pred_broadcast<mode>_zvfh): Ditto.
     (*pred_broadcast<mode>_zvfhmin): Ditto.
     (@pred_strided_broadcast<mode>): Ditto.
     (*pred_strided_broadcast<mode>): Ditto.
     (*pred_strided_broadcast<mode>_zvfhmin): Ditto.

gcc/testsuite/ChangeLog:

     * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-6.c: Adjust test
     expectation.
     * gcc.target/riscv/rvv/base/scalar_move-5.c: Ditto.
     * gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
     * gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
     * gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
     * gcc.target/riscv/rvv/base/scalar_move-9.c: Ditto.
     * gcc.target/riscv/rvv/pr121073.c: New test.
---

When I read the cover, I expected this to be far uglier. It actually looks like a lot of nice cleanups/refactoring.

WRT splitting up he vmv.v.x and vmv.s.x as they only differ in their mask predicate -- your call on if/when to do that. My only worry would be if we perhaps have cases one day where we start with a vmv.v.x form, but then due to RTL optimization that a vmv.s.x would work too (and would likely perform better on larger vector designs). But I doubt that happens in practice, so it's probably not worth worrying about.

Once committed I'll reset the baselines on my tester. I wouldn't be terribly surprised if we see some other testing fallout due to different multilib settings and such. But we can deal with that.

Note that your pr121073 test fails :-) So you'll need to adjust something there. OK with pr121073.c fixed...

Jeff

Reply via email to