> > I looked at one that failed after 100 as well (20031204-1.c). In this > case, it was due to expansion which was creating multiple branches/bbs > from a logical OR and guessing incorrectly on how to assign the > counts: > > if (octets == 4 && (*cp == ':' || *cp == '\0')) { > > The (*cp == ':' || *cp == '\0') part looked like the following going > into RTL expansion: > > [20031204-1.c : 31:33] _29 = _28 == 58; > [20031204-1.c : 31:33] _30 = _28 == 0; > [20031204-1.c : 31:33] _31 = _29 | _30; > [20031204-1.c : 31:18] if (_31 != 0) > goto <bb 16>; > else > goto <bb 19>; > > where the result of the OR was always true, so bb 16 had a count of > 100 and bb 19 a count of 0. When it was expanded, the expanded version > of the above turned into 2 bbs with a branch in between. Both > comparisons were done in the first bb, but the first bb checked > whether the result of the *cp == '\0' compare was true, and if not > branched to the check for whether the *cp == ':' compare was true. It > gave the branch to the second check against ':' a count of 0, so that > bb got a count of 0 and was split out, and put the count of 100 on the > fall through assuming the compare with '\0' always evaluated to true. > In reality, this OR condition was always true because *cp was ':', not > '\0'. Therefore, the count of 0 on the second block with the check for > ':' was incorrect, we ended up trying to execute it, and failed.
I see, we produce: ;; if (_26 != 0) (insn 94 93 95 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 107 [ D.2184 ]) (const_int 0 [0]))) a.c:31 -1 (nil)) (insn 95 94 96 (set (reg:QI 122 [ D.2186 ]) (eq:QI (reg:CCZ 17 flags) (const_int 0 [0]))) a.c:31 -1 (nil)) (insn 96 95 97 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 122 [ D.2186 ]) (const_int 0 [0]))) a.c:31 -1 (nil)) (jump_insn 97 96 98 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 100) (pc))) a.c:31 -1 (expr_list:REG_BR_PROB (const_int 6100 [0x17d4]) (nil))) (insn 98 97 99 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 108 [ D.2186 ]) (const_int 0 [0]))) a.c:31 -1 (nil)) (jump_insn 99 98 100 (set (pc) (if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 0) (pc))) a.c:31 -1 (expr_list:REG_BR_PROB (const_int 3900 [0xf3c]) (nil))) (code_label 100 99 0 14 "" [0 uses]) That is because we TER together "_26 = _25 | _24" and "if (_26 != 0)" First I think the logic of do_jump should really be moved to trees. It is not doing things that can not be adequately represented by gimple. I am not that certain we want to move it before profiling though. > > Presumably we had the correct profile data for both blocks, but the > accuracy was reduced when the OR was represented as a logical > computation with a single branch. We could change the expansion code > to do something different, e.g. treat as a 50-50 branch. But we would > still end up with integer truncation issues when there was a single > training run. But that could be dealt with conservatively in the Yep, but it is still better than what we have now - if the test above was in hot part of program (i.e. not executed once), we will end up optimizing the second conditional for size. So I think it is do_jump bug to not distribute probabilities across the two conditoinals introduced. > bbpart code as I suggested for the jump threading issue above. I.e. a > cold block with incoming non-cold edges conservatively not marked cold > for splitting. Yep, we can probably do that, but we ought to fix the individual cases above at least for resonable number of runs. Will you look into logic of do_jump or shall I try to dive in? Honza