> This is also interesting for the ARC700 processor.
>
> There is also an issue if the flag for the conditionalized instruction
> is set in the immediately preceding instruction, and the result of the
> conditionalized instruction is required in the immediately following
> instruction; and when using a conditional branch with a short offset,
> there is also the opportunity to combine a comparison or bit test with
> the branch.
>
> Moreover, since the ARCompact architecture has a lot more registers
> than x86, if you don't use a frame pointer, there are also realistic
> opportunities to use conditional function returns.
>
> Already back when I was an SH maintainer, I was annoyed that there is
> only one BRANCH_COST. We should really have different ones for
> predictable and unpredictable/mispredicted branches.
>
> Also, it would make sense if the cost could be modified according to
> whether the compiler thinks it will be able to schedule a delay slot
> instruction.
>
> Ideally, alignment could also be taken into account, but that would
> require doing register allocation first, so there appears to be no
> viable pass ordering within the gcc infrastructure to make this work.
>
> For an exact modeling, we should actually have three branch costs,
> distinguishing the cost of having no prediction from that of having a
> wrong prediction. However, in 'hot' code we can assume we have some
> prediction - either right or wrong - and 'cold' code would typically
> not matter, unless you have a humongous program with very poor
> locality.
>
> However, for these reasons I think that COLD_BRANCH_COST is a misnomer,
> and could also prompt port writers to put the wrong value there, since
> it's the mispredicted branches we are interested in.
> MISPREDICTED_BRANCH_COST would be more descriptive.
In the patch, I was using BRANCH_COST for the usual branches, which are
assumed to be badly predictable, and PREDICTABLE_BRANCH_COST for the few
branches we identify as well predictable.

COLD_BRANCH_COST is not for mispredicted branches, but for branches in
regions of the program that are expected to be rarely executed (via
profile feedback, or because they lead to a noreturn call, for
instance), so it is basically what BRANCH_COST would be if we were
optimizing for size. On i386, a larger BRANCH_COST tends to increase
code size because of register pressure, and because, when the cmov
instruction is missing, the expanded conditional move sequences built
out of set-flag or sbb instructions tend to be quite ridiculous. So
COLD_BRANCH_COST is probably best left at 1 or 0, while both the
predictable and unpredictable costs are higher.

In general I would love to bring our cost model to always have HOT and
COLD variants (or optimize-for-speed/optimize-for-size variants) and
drive expansion by it. Ideally -O2 and -Os should differ only in the
default behaviour of the maybe_hot_bb_p and probably_cold_bb_p
predicates. We are losing quite a good amount of code size benefits by
not doing that with profile feedback on. I got stuck on a
maybe_hot_insn_p patch back in GCC 4.2 times; I will try to come up
with something sane for 4.4. With default optimization we are poor at
predicting the coldness of blocks of the program (noreturn,
__builtin_expect, and the hot/cold function attributes are the only
reliable predictors here), but this will change with LTO quite easily.

Perhaps, to avoid confusion, instead of having backends define a single
BRANCH_COST, we can have:

  PREDICTED_BRANCH_COST (hot_p)
  MISPREDICTED_BRANCH_COST (hot_p)

and make them default to BRANCH_COST if they are not defined at all?
Or have BRANCH_COST (hot_p, predictable_p)?

Note that old pgcc used to have TAKEN and NOT_TAKEN_BRANCH_COST as well
as MISPREDICTION_COST (the names may have been different, I don't
remember). That is perhaps a more accurate model, but I don't see how
we can take reasonable advantage of it, since we reorder branches later
anyway.

OK, I guess the patch has gained enough interest, so I will bring it to
mainline ;)

Honza
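
To make the defaulting concrete, here is a rough sketch of how it could
look in defaults.h (just a sketch to illustrate the proposal above, not
a patch; the names follow the proposal and the fallback values are made
up):

  /* Fallback for ports that define nothing at all.  */
  #ifndef BRANCH_COST
  #define BRANCH_COST 1
  #endif

  /* Cost of a branch the compiler believes is well predicted.  HOT_P
     is nonzero in code optimized for speed.  */
  #ifndef PREDICTED_BRANCH_COST
  #define PREDICTED_BRANCH_COST(HOT_P) BRANCH_COST
  #endif

  /* Cost of a branch with no prediction or a wrong prediction.  */
  #ifndef MISPREDICTED_BRANCH_COST
  #define MISPREDICTED_BRANCH_COST(HOT_P) BRANCH_COST
  #endif

A port could then do something like

  #define PREDICTED_BRANCH_COST(HOT_P)    ((HOT_P) ? 2 : 0)
  #define MISPREDICTED_BRANCH_COST(HOT_P) ((HOT_P) ? 10 : 0)

(numbers made up for illustration), keeping the cold costs near zero as
discussed for i386 above.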