On Tue, 28 Nov 2023, Jeff Law wrote:

> FWIW, I was looking at a regression with our internal tests after your
> changes.  It was quite nice to see how well twiddling -mbranch-cost
> correlated to how many instructions we would allow in a conditional move
> sequence.

 I'm a bit concerned though that our interpretation of `-mbranch-cost=0'
is different from the middle end's, such as in `emit_store_flag':

  /* If we reached here, we can't do this with a scc insn, however there
     are some comparisons that can be done in other ways.  Don't do any
     of these cases if branches are very cheap.  */
  if (BRANCH_COST (optimize_insn_for_speed_p (), false) == 0)
    return 0;

> The downside is it highlighted the gimple vs RTL use issue.  I'm confident
> that we would like to see a higher branch cost in the RTL phases for our
> uarch, but I'm much less comfortable with how that's going to change the
> decisions made in trees/gimple.  We'll have to investigate that at some depth.

 Ack.

> > I've looked at it already and it's the middle end that ends up with the
> > zero-extension, specifically `convert_move' invoked from `emit_cstore'
> > down the call to `noce_try_store_flag_mask', to widen the output from
> > `cstoredi4', so I don't think we can do anything in the backend to prevent
> > it from happening.  And neither I think we can do anything useful about
> > `cstoredi4' having a SImode output, as it's a pattern matched by name
> > rather than RTX, so we can't provide variants having a SImode and a DImode
> > output each both at a time, as that would cause a name clash.
> We're actually tracking some of these extraneous extensions.  Do you happen to
> know if the zero-extended object happens to be (subreg:SI (reg:DI)) kind of
> construct?  That's the kind of thing we're chasing down right now from various
> points.  Vineet has already fixed one class of them.  Jivan and I are looking
> at others.

 Under GDB it's a plain move from (reg:SI 140) to (reg:DI 139), as in the
FROM and TO arguments to `convert_move' respectively.  This makes it call
`convert_mode_scalar', which then chooses between `zext_optab' and
`sext_optab' as appropriate, under:

  /* If the target has a converter from FROM_MODE to TO_MODE, use it.  */

to produce:

(set (reg:DI 139)
    (zero_extend:DI (reg:SI 140)))

ending up with this complete sequence:

(insn 27 0 28 (set (reg:SI 140)
        (eq:SI (reg/v:DI 137 [ c ])
            (const_int 0 [0]))) -1
     (nil))

(insn 28 27 29 (set (reg:DI 139)
        (zero_extend:DI (reg:SI 140))) -1
     (nil))

(insn 29 28 30 (set (reg:DI 141)
        (neg:DI (reg:DI 139))) -1
     (nil))

(insn 30 29 0 (set (reg/v:DI 134 [ <retval> ])
        (and:DI (reg/v:DI 135 [ a ])
            (reg:DI 141))) -1
     (nil))

passed to `targetm.noce_conversion_profitable_p' right away.

 Maybe you can teach `emit_cstore' or `convert_move' to use a subreg when
it is known for the particular target that the value produced by the
conditional-set machine instruction emitted by `cstoreMODE4' is valid
unchanged in both modes.

 You can fiddle with it by trying:

$ gcc -march=rv64gc -mbranch-cost=3 -O2 -S gcc/testsuite/gcc.target/riscv/pr105314.c

 Set a breakpoint at `noce_try_store_flag_mask' and then single-step to
see how things proceed.

  Maciej
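
 For readers following the RTL, the four-insn sequence quoted above can be
modelled in plain C.  This is only an illustrative sketch, not GCC code: the
function names `mask_with_zext', `mask_without_ext' and `same_result' are
made up for the example, and it assumes the comparison result is always 0
or 1.  Under that assumption the SImode-to-DImode zero-extension (insn 28)
does not change the value, which is why reusing the flag unchanged in the
wider mode would give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Models insns 27-30: retval = a & -(zero_extend (c == 0)).  */
int64_t mask_with_zext(int64_t a, int64_t c)
{
  int32_t flag_si = (c == 0);              /* insn 27: (eq:SI c 0) */
  uint64_t flag_di = (uint32_t) flag_si;   /* insn 28: zero_extend:DI */
  uint64_t mask = -flag_di;                /* insn 29: neg:DI, 0 or all-ones */
  return (int64_t) ((uint64_t) a & mask);  /* insn 30: and:DI */
}

/* Hypothetical variant: the 0/1 flag is used directly as a 64-bit
   value, with no separate extension step (assumes the flag is already
   valid in the wider mode).  */
int64_t mask_without_ext(int64_t a, int64_t c)
{
  uint64_t flag_di = (c == 0);
  uint64_t mask = -flag_di;
  return (int64_t) ((uint64_t) a & mask);
}

/* Returns 1 when both models agree for the given inputs.  */
int same_result(int64_t a, int64_t c)
{
  return mask_with_zext(a, c) == mask_without_ext(a, c);
}
```

 Both functions return `a' when `c' is zero and 0 otherwise, so the
extension step is redundant whenever the flag is known to be 0 or 1.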