On Tue, 28 Nov 2023, Jeff Law wrote:

> FWIW, I was looking at a regression with our internal tests after your
> changes.   It was quite nice to see how well twiddling -mbranch-cost
> correlated to how many instructions we would allow in a conditional move
> sequence.

 I'm a bit concerned though that our interpretation of `-mbranch-cost=0' 
is different from the middle end's, such as in `emit_store_flag':

  /* If we reached here, we can't do this with a scc insn, however there
     are some comparisons that can be done in other ways.  Don't do any
     of these cases if branches are very cheap.  */
  if (BRANCH_COST (optimize_insn_for_speed_p (), false) == 0)
    return 0;

> The downside is it highlighted the gimple vs RTL use issue.  I'm confident
> that we would like to see a higher branch cost in the RTL phases for our
> uarch, but I'm much less comfortable with how that's going to change the
> decisions made in trees/gimple.  We'll have to investigate that at some depth.

 Ack.

> >  I've looked at it already and it's the middle end that ends up with the
> > zero-extension, specifically `convert_move' invoked from `emit_cstore'
> > down the call to `noce_try_store_flag_mask', to widen the output from
> > `cstoredi4', so I don't think we can do anything in the backend to prevent
> > it from happening.  Nor do I think we can do anything useful about
> > `cstoredi4' having a SImode output, as it's a pattern matched by name
> > rather than RTX, so we can't provide two variants, one with a SImode and
> > one with a DImode output, at the same time, as that would cause a name clash.
> We're actually tracking some of these extraneous extensions.  Do you happen to
> know if the zero-extended object happens to be (subreg:SI (reg:DI)) kind of
> construct?  That's the kind of thing we're chasing down right now from various
> points.  Vineet has already fixed one class of them.  Jivan and I are looking
> at others.

 Under GDB it's a plain move from (reg:SI 140) to (reg:DI 139), as in the 
FROM and TO arguments to `convert_move' respectively.  This makes it call 
`convert_mode_scalar', which then chooses between `zext_optab' and 
`sext_optab' as appropriate, under:

  /* If the target has a converter from FROM_MODE to TO_MODE, use it.  */

to produce:

(set (reg:DI 139)
    (zero_extend:DI (reg:SI 140)))

ending up with this complete sequence:

(insn 27 0 28 (set (reg:SI 140)
        (eq:SI (reg/v:DI 137 [ c ])
            (const_int 0 [0]))) -1
     (nil))
(insn 28 27 29 (set (reg:DI 139)
        (zero_extend:DI (reg:SI 140))) -1
     (nil))
(insn 29 28 30 (set (reg:DI 141)
        (neg:DI (reg:DI 139))) -1
     (nil))
(insn 30 29 0 (set (reg/v:DI 134 [ <retval> ])
        (and:DI (reg/v:DI 135 [ a ])
            (reg:DI 141))) -1
     (nil))

passed to `targetm.noce_conversion_profitable_p' right away.  Maybe you 
can teach `emit_cstore' or `convert_move' to use a subreg when it is known
for the particular target that the value produced by the conditional-set 
machine instruction emitted by `cstoreMODE4' is valid unchanged in both 
modes.
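
 As a rough illustration, insns 27-30 above can be modelled in plain C as 
below.  This is only a sketch: the function names are made up, and the 
branchy source form is an assumption about what the testcase boils down 
to, not a copy of pr105314.c.  Since the flag is only ever 0 or 1, 
extending it does not change its value, which is the case where a subreg 
could stand in for the `zero_extend', provided the target guarantees the 
result of the conditional set is already valid in the wider mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed branchy source form: clear A when C is nonzero.  */
static uint64_t
select_branchy (uint64_t a, uint64_t c)
{
  if (c)
    a = 0;
  return a;
}

/* Insns 27-30 step by step.  */
static uint64_t
select_masked (uint64_t a, uint64_t c)
{
  uint32_t flag = (c == 0);         /* insn 27: (eq:SI c 0), yields 0 or 1 */
  uint64_t wide = (uint64_t) flag;  /* insn 28: (zero_extend:DI flag) */
  uint64_t mask = 0 - wide;         /* insn 29: (neg:DI wide), 0 or all-ones */
  return a & mask;                  /* insn 30: (and:DI a mask) */
}

/* Same computation with no separate extension step: the flag is produced
   directly in the wide type, which models the SImode result being used
   unchanged in DImode.  */
static uint64_t
select_masked_noext (uint64_t a, uint64_t c)
{
  uint64_t flag = (c == 0);
  return a & (0 - flag);
}

/* Return 1 if all three forms agree on a handful of sample inputs,
   0 otherwise.  */
static int
selects_agree (void)
{
  static const uint64_t samples[] =
    { 0, 1, 42, 0x80000000u, 0xffffffff00000000u, UINT64_MAX };
  const size_t n = sizeof samples / sizeof samples[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        uint64_t want = select_branchy (samples[i], samples[j]);
        if (select_masked (samples[i], samples[j]) != want
            || select_masked_noext (samples[i], samples[j]) != want)
          return 0;
      }
  return 1;
}
```

 Calling `selects_agree' from a small test driver is enough to check that 
the three forms match on those inputs; it says nothing about what a given 
target's `cstoreMODE4' actually leaves in the upper bits.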

 You can fiddle with it by trying:

$ gcc -march=rv64gc -mbranch-cost=3 -O2 -S \
    gcc/testsuite/gcc.target/riscv/pr105314.c

Set a breakpoint at `noce_try_store_flag_mask' and then single-step to see 
how things proceed.

  Maciej
