> This is also interesting for the ARC700 processor.
>
> There is also an issue if the flag for the conditionalized instruction
> is set in the immediately preceding instruction, and the result of the
> conditionalized instruction is required in the immediately following
> instruction; and when using a conditional branch with a short offset,
> there is also the opportunity to combine a comparison or bit test with
> the branch.
>
> Moreover, since the ARCompact architecture has a lot more registers
> than x86, if you don't use a frame pointer, there are also realistic
> opportunities to use conditional function returns.
>
> Already back when I was an SH maintainer, I was annoyed that there is
> only one BRANCH_COST. We should really have different ones for
> predictable and unpredictable/mispredicted branches.
>
> Also, it would make sense if the cost could be modified according to
> whether the compiler thinks it will be able to schedule a delay slot
> instruction.
>
> Ideally, alignment could also be taken into account, but that would
> require doing register allocation first, so there appears to be no
> viable pass ordering within the gcc infrastructure to make this work.
>
> For an exact modeling, we should actually have three branch costs,
> distinguishing the cost of having no prediction from that of having a
> wrong prediction. However, in 'hot' code we can assume we have some
> prediction - either right or wrong - and 'cold' code would typically
> not matter, unless you have a humongous program with very poor
> locality.
>
> However, for these reasons I think that COLD_BRANCH_COST is a misnomer,
> and could also prompt port writers to put the wrong value there, since
> it's the mispredicted branches we are interested in.
> MISPREDICTED_BRANCH_COST would be more descriptive.
In the patch, I was using BRANCH_COST for the usual branches, which are
assumed to be badly predictable, and PREDICTABLE_BRANCH_COST for the few
branches we identify as well predictable.

COLD_BRANCH_COST is not for mispredicted branches, but for branches in
regions of the program that are expected to be rarely executed (via
profile feedback, or because they lead to a noreturn call, for
instance), so it is basically what BRANCH_COST would be if we were
optimizing for size. On i386, a larger BRANCH_COST tends to increase
code size because of register pressure, and because, when the cmov
instruction is missing, the expanded conditional move sequences built
out of set-flag or sbb instructions tend to be quite ridiculous. So
COLD_BRANCH_COST is probably best left at 1 or 0, while both the
predictable and unpredictable costs are higher.

In general I would love to bring our cost model to always have HOT and
COLD variants (or optimize-for-speed/optimize-for-size variants) and
drive expansion by it. Ideally -O2 and -Os should differ only in the
default behaviour of the maybe_hot_bb_p and probably_cold_bb_p
predicates. We are losing quite a good amount of code size benefits by
not doing that with profile feedback on. I got stuck on a
maybe_hot_insn_p patch back in GCC 4.2 times; I will try to come up
with something sane for 4.4. With default optimization we are poor at
predicting the coldness of blocks of the program (noreturn,
__builtin_expect, and the hot/cold function attributes are the only
reliable predictors here), but this will change with LTO quite easily.

Perhaps, to avoid confusion, instead of having backends define a single
BRANCH_COST, we can have:

  PREDICTED_BRANCH_COST (hot_p)
  MISPREDICTED_BRANCH_COST (hot_p)

and make them default to BRANCH_COST if they are not defined at all?
Or have BRANCH_COST (hot_p, predictable_p)?

Note that old pgcc used to have TAKEN and NOT_TAKEN_BRANCH_COST as well
as MISPREDICTION_COST (the names may have been different, I don't
remember). That is perhaps a more accurate model, but I don't see how
we can take reasonable advantage of it, since we reorder branches later
anyway.

OK, I guess the patch has gained enough interest, so I will bring it to
mainline ;)

Honza
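
To make the defaulting concrete, here is a rough sketch of how it could
look in defaults.h (just a sketch to illustrate the proposal above, not
a patch; the names follow the proposal and the fallback values are made
up):

  /* Fallback for ports that define nothing at all.  */
  #ifndef BRANCH_COST
  #define BRANCH_COST 1
  #endif

  /* Cost of a branch the compiler believes is well predicted.  HOT_P
     is nonzero in code optimized for speed.  */
  #ifndef PREDICTED_BRANCH_COST
  #define PREDICTED_BRANCH_COST(HOT_P) BRANCH_COST
  #endif

  /* Cost of a branch with no prediction or a wrong prediction.  */
  #ifndef MISPREDICTED_BRANCH_COST
  #define MISPREDICTED_BRANCH_COST(HOT_P) BRANCH_COST
  #endif

A port could then do something like

  #define PREDICTED_BRANCH_COST(HOT_P)    ((HOT_P) ? 2 : 0)
  #define MISPREDICTED_BRANCH_COST(HOT_P) ((HOT_P) ? 10 : 0)

(numbers made up for illustration), keeping the cold costs near zero as
discussed for i386 above.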