On Thu, 2024-09-05 at 11:59 -0700, Palmer Dabbelt wrote:
> On Thu, 05 Sep 2024 11:52:57 PDT (-0700), Palmer Dabbelt wrote:
> > We have cheap logical ops, so let's just move this back to the default
> > to take advantage of the standard branch/op heuristics.
> > 
> > gcc/ChangeLog:
> > 
> >     PR target/116615
> >     * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> > ---
> > There's a bunch more discussion in the bug, but it's starting to smell
> > like this was just a holdover from MIPS (where maybe it also shouldn't
> > be set).  I haven't tested this, but I figured I'd send the patch to get
> > a little more visibility.
> > 
> > I guess we should also kick off something like a SPEC run to make sure
> > there's no regressions?
> 
> Sorry I missed it in the bug, but Ruoyao points to dddafe94823 
> ("LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT") where 
> short-circuiting the FP comparisons helps on LoongArch.
> 
> Not sure if I'm also missing something here, but it kind of feels like
> that should be handled by a more generic optimization decision than just 
> globally "should we short circuit logical ops" -- assuming it really is 
> the FP comparisons that are causing the cost, as opposed to the actual
> logical ops themselves.

IIUC there are some contributing factors here:

1. On LoongArch FP comparison is slow (costing 5 cycles).
2. On LoongArch the FP comparison result is stored into FCC registers,
and to do logical operations on two comparison results they need to be
moved into GPR first.  The move costs one or two cycles (depending on
the uarch).

and maybe

3. The FP comparison results in the SPEC tests are somewhat predictable.
IIRC when I tested dddafe94823 I made a test program where the FP
comparison results are "randomized" (so the branch predictor is
defeated), then the branch-less code generated with -Ofast --param
logical-op-non-short-circuit=1 was actually faster than the code
generated with -Ofast --param logical-op-non-short-circuit=0.

AFAIK 2 isn't an issue for RISC-V (where the FP comparison result is
written directly to a GPR), but 1 and 3 may still need to be considered.
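
For illustration, a test along the lines of point 3 might look roughly
like the sketch below.  This is not the program I actually used; the
function names, the 0.5 threshold and the use of rand() are all
assumptions made up for this example.  It only shows the two code shapes
being compared: one where the second FP comparison is skipped when the
first fails, and one where both are always evaluated and combined with a
logical op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Short-circuit form: the second comparison is only needed when the
   first one is true, so the compiler may emit a conditional branch.  */
size_t
count_short_circuit (const double *a, const double *b, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] > 0.5 && b[i] > 0.5)
      hits++;
  return hits;
}

/* Non-short-circuit form: both comparisons are always evaluated and
   their results are combined with a bitwise AND.  */
size_t
count_branchless (const double *a, const double *b, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    hits += (size_t) ((a[i] > 0.5) & (b[i] > 0.5));
  return hits;
}

/* Fill buf with pseudo-random values in [0, 1] so that each comparison
   against 0.5 is hard for a branch predictor to guess.  */
void
fill_random (double *buf, size_t n, unsigned seed)
{
  srand (seed);
  for (size_t i = 0; i < n; i++)
    buf[i] = (double) rand () / RAND_MAX;
}

/* Sanity check: both forms must count the same number of hits.  */
void
check_variants_agree (const double *a, const double *b, size_t n)
{
  assert (count_short_circuit (a, b, n) == count_branchless (a, b, n));
}
```

Note that the compiler is free to turn either form into the other (that
is what --param logical-op-non-short-circuit influences), so a
comparison like this is only meaningful if the generated assembly is
checked for each setting before timing it.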

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University
