On Tue, Sep 24, 2013 at 11:25 AM, Teresa Johnson <tejohn...@google.com> wrote: > On Tue, Sep 24, 2013 at 10:57 AM, Jan Hubicka <hubi...@ucw.cz> wrote: >>> >>> I looked at one that failed after 100 as well (20031204-1.c). In this >>> case, it was due to expansion which was creating multiple branches/bbs >>> from a logical OR and guessing incorrectly on how to assign the >>> counts: >>> >>> if (octets == 4 && (*cp == ':' || *cp == '\0')) { >>> >>> The (*cp == ':' || *cp == '\0') part looked like the following going >>> into RTL expansion: >>> >>> [20031204-1.c : 31:33] _29 = _28 == 58; >>> [20031204-1.c : 31:33] _30 = _28 == 0; >>> [20031204-1.c : 31:33] _31 = _29 | _30; >>> [20031204-1.c : 31:18] if (_31 != 0) >>> goto <bb 16>; >>> else >>> goto <bb 19>; >>> >>> where the result of the OR was always true, so bb 16 had a count of >>> 100 and bb 19 a count of 0. When it was expanded, the expanded version >>> of the above turned into 2 bbs with a branch in between. Both >>> comparisons were done in the first bb, but the first bb checked >>> whether the result of the *cp == '\0' compare was true, and if not >>> branched to the check for whether the *cp == ':' compare was true. It >>> gave the branch to the second check against ':' a count of 0, so that >>> bb got a count of 0 and was split out, and put the count of 100 on the >>> fall through assuming the compare with '\0' always evaluated to true. >>> In reality, this OR condition was always true because *cp was ':', not >>> '\0'. Therefore, the count of 0 on the second block with the check for >>> ':' was incorrect, we ended up trying to execute it, and failed. >> >> I see, we produce: >> ;; if (_26 != 0) >> >> (insn 94 93 95 (set (reg:CCZ 17 flags) >> (compare:CCZ (reg:QI 107 [ D.2184 ]) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (insn 95 94 96 (set (reg:QI 122 [ D.2186 ]) >> (eq:QI (reg:CCZ 17 flags) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (insn 96 95 97 (set (reg:CCZ 17 flags) >> (compare:CCZ (reg:QI 122 [ D.2186 ]) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (jump_insn 97 96 98 (set (pc) >> (if_then_else (ne (reg:CCZ 17 flags) >> (const_int 0 [0])) >> (label_ref 100) >> (pc))) a.c:31 -1 >> (expr_list:REG_BR_PROB (const_int 6100 [0x17d4]) >> (nil))) >> >> (insn 98 97 99 (set (reg:CCZ 17 flags) >> (compare:CCZ (reg:QI 108 [ D.2186 ]) >> (const_int 0 [0]))) a.c:31 -1 >> (nil)) >> >> (jump_insn 99 98 100 (set (pc) >> (if_then_else (eq (reg:CCZ 17 flags) >> (const_int 0 [0])) >> (label_ref 0) >> (pc))) a.c:31 -1 >> (expr_list:REG_BR_PROB (const_int 3900 [0xf3c]) >> (nil))) >> >> (code_label 100 99 0 14 "" [0 uses]) >> >> That is because we TER together "_26 = _25 | _24" and "if (_26 != 0)" >> >> First I think the logic of do_jump should really be moved to trees. It is >> not >> doing things that can not be adequately represented by gimple. >> >> I am not that certain we want to move it before profiling though. >>> >>> Presumably we had the correct profile data for both blocks, but the >>> accuracy was reduced when the OR was represented as a logical >>> computation with a single branch. We could change the expansion code >>> to do something different, e.g. treat as a 50-50 branch. But we would >>> still end up with integer truncation issues when there was a single >>> training run. But that could be dealt with conservatively in the >> >> Yep, but it is still better than what we have now - if the test above was >> in hot part of program (i.e. not executed once), we will end up optimizing >> the second conditional for size. >> >> So I think it is do_jump bug to not distribute probabilities across the two >> conditoinals introduced. >>> bbpart code as I suggested for the jump threading issue above. I.e. a >>> cold block with incoming non-cold edges conservatively not marked cold >>> for splitting. >> >> Yep, we can probably do that, but we ought to fix the individual cases >> above at least for resonable number of runs. > > I made this change and it removed a few of the failures. > > I looked at another case that still failed with 1 train run but passed > with 100. It turned out to be another truncation issue exposed by RTL > expansion, where we created some control flow for a memset builtin > which was in a block with an execution count of 1. Some of the blocks > got frequencies less than half the original block, so the count was > rounded down or truncated to 0. I noticed that in this case (as well > as the jump threading case I fixed by looking for non-zero incoming > edges in partitioning) that the bb frequency was non-zero. > > Why not just have probably_never_executed_bb_p return simply return > false bb->frequency is non-zero (right now it does the opposite - > returns true when bb->frequency is 0)? Making this change removed a > bunch of other failures. With this change as well, there are only 3 > cases that still fail with 1 train run that pass with 100. Need to > look at those. > >> >> Will you look into logic of do_jump or shall I try to dive in? > > I can take a look, but probably won't have a chance until late this > week. If you don't get to it before then I will see if I can figure > out why it is applying the branch probabilities this way.
Turned out not to be too tricky to fix the do_jump issue affecting 20031204-1.c. The patch below fixes the issue for this test case (along with the patch I posted a couple days ago that handles profile insanities conservatively in the case where there is 1 training run and we truncate counts): Index: dojump.c =================================================================== --- dojump.c (revision 202947) +++ dojump.c (working copy) @@ -325,15 +325,20 @@ break; case TRUTH_ORIF_EXPR: + /* Spread the probability evenly between the two conditions. So + the first condition has half the total probability of being true, + and therefore has half the probability of being false + (i.e. falls through to the second condition). If we reach the + second condition, it will be true with the original probability. */ if (if_true_label == NULL_RTX) { drop_through_label = gen_label_rtx (); - do_jump (op0, NULL_RTX, drop_through_label, prob); + do_jump (op0, NULL_RTX, drop_through_label, prob / 2); do_jump (op1, if_false_label, NULL_RTX, prob); } else { - do_jump (op0, NULL_RTX, if_true_label, prob); + do_jump (op0, NULL_RTX, if_true_label, prob / 2); do_jump (op1, if_false_label, if_true_label, prob); } break; I am regression testing this now and will post the patch for review in a separate thread. Honza, any comments on the patch I posted a couple days ago that treats the profile insanities conservatively? Thanks, Teresa > > Teresa > >> >> Honza > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413