Hi Honza,

I am finally getting back to working on this after a few weeks spent on some other priorities.
On Sat, Aug 31, 2013 at 2:46 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
> Hi,
> I ran my script on the execute testsuite and looked into a few testcases.
> The problem I found was roundoff errors - i.e. when expanding a switch we
> assign a 50% chance that an out-of-bounds value is above or below. Both
> get rounded to 0, because the switch is executed once and the value is
> below.
>
> Partly this can be fixed by making probably_never_executed consider
> frequencies when the counts are too coarse:
>
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 202133)
> +++ predict.c   (working copy)
> @@ -232,8 +232,22 @@ bool
>  probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
>  {
>    gcc_checking_assert (fun);
> -  if (profile_info && flag_branch_probabilities)
> -    return ((bb->count + profile_info->runs / 2) / profile_info->runs) == 0;
> +  if (profile_status_for_function (fun) == PROFILE_READ)
> +    {
> +      if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
> +       return false;
> +      if (!bb->frequency)
> +       return true;
> +      if (!ENTRY_BLOCK_PTR->frequency)
> +       return false;
> +      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
> +       {
> +         return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count,
> +                       ENTRY_BLOCK_PTR->frequency)
> +                 < REG_BR_PROB_BASE / 4);
> +       }
> +      return true;
> +    }
>    if ((!profile_info || !flag_branch_probabilities)
>        && (cgraph_get_node (fun->decl)->frequency
>            == NODE_FREQUENCY_UNLIKELY_EXECUTED))

Did you mean to commit the above change? I see that it went in as part of
r202258 but doesn't show up in the ChangeLog entry for that revision.

> In other cases it was mostly loop unrolling in combination with jump
> threading. So I modified my script to separately report when the failure
> happens for a test trained once and for a test trained a hundred times.

Thanks for the linker script. I reproduced your results.

I looked at a couple of cases. The first was one that failed after 1
training run only (20000910-2.c). It was due to jump threading, which you
noted was a problem. For this one I think we can handle it in the
partitioning, since there is an FDO insanity that we could probably treat
more conservatively when splitting.

I also looked at one that failed after 100 training runs (20031204-1.c).
In this case it was due to expansion, which was creating multiple
branches/bbs from a logical OR and guessing incorrectly how to assign the
counts:

  if (octets == 4 && (*cp == ':' || *cp == '\0')) {

The (*cp == ':' || *cp == '\0') part looked like the following going into
RTL expansion:

  [20031204-1.c : 31:33] _29 = _28 == 58;
  [20031204-1.c : 31:33] _30 = _28 == 0;
  [20031204-1.c : 31:33] _31 = _29 | _30;
  [20031204-1.c : 31:18] if (_31 != 0)
    goto <bb 16>;
  else
    goto <bb 19>;

where the result of the OR was always true, so bb 16 had a count of 100 and
bb 19 a count of 0. When it was expanded, the expanded version of the above
turned into 2 bbs with a branch in between. Both comparisons were done in
the first bb, but the first bb checked whether the result of the
*cp == '\0' compare was true, and if not branched to the check for whether
the *cp == ':' compare was true. It gave the branch to the second check
against ':' a count of 0, so that bb got a count of 0 and was split out,
and it put the count of 100 on the fall through, assuming the compare with
'\0' always evaluated to true. In reality, this OR condition was always
true because *cp was ':', not '\0'.
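Schematically, the expanded control flow and the counts that expansion
guessed look roughly like the following (a hand-written C-level sketch just
to illustrate the point; the block labels and temporaries are made up, this
is not an actual dump):

  /* Illustrative sketch only: a C rendering of the expanded code for
     (*cp == ':' || *cp == '\0') and the counts expansion guessed.  */
  static int
  expanded_or_sketch (const char *cp)
  {
    int c_nul   = (*cp == '\0');
    int c_colon = (*cp == ':');

    /* First bb, count 100: both compares are computed here.  */
    if (!c_nul)
      goto bb_second_check;   /* branch to the second check: guessed count 0 */
    goto bb16;                /* fall through: guessed count 100 */

  bb_second_check:
    /* Guessed count 0, so this bb is split out as cold.  But since *cp was
       ':' (never '\0') during training, this is the block that actually
       executes at run time.  */
    if (c_colon)
      goto bb16;
    goto bb19;

  bb16:
    return 1;   /* the OR was true */
  bb19:
    return 0;   /* the OR was false */
  }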
Therefore, the count of 0 on the second block, the one with the check for
':', was incorrect; we ended up trying to execute it and failed. Presumably
we had the correct profile data for both blocks, but the accuracy was
reduced when the OR was represented as a logical computation with a single
branch. We could change the expansion code to do something different, e.g.
treat it as a 50-50 branch, but we would still end up with integer
truncation issues when there was a single training run. That could be dealt
with conservatively in the bbpart code, as I suggested for the jump
threading issue above, i.e. a cold block with incoming non-cold edges would
conservatively not be marked cold for splitting.
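To illustrate what I mean by handling it conservatively in the partitioning
code, something along these lines (an untested sketch assuming the usual
GCC internal headers; the helper name and its exact placement in the
bb-reorder code are just for illustration):

  /* Illustrative sketch only (untested): only treat BB as cold for
     splitting if it is probably never executed AND none of its incoming
     edges comes from a non-cold block.  A zero-count block reached from a
     hot block is likely a profile insanity (integer truncation, jump
     threading, expansion guesses), so keep it in the hot section.  */
  static bool
  conservatively_cold_bb_p (struct function *fun, basic_block bb)
  {
    edge e;
    edge_iterator ei;

    if (!probably_never_executed_bb_p (fun, bb))
      return false;

    FOR_EACH_EDGE (e, ei, bb->preds)
      if (!probably_never_executed_bb_p (fun, e->src))
        /* Incoming edge from a non-cold block: do not split BB out.  */
        return false;

    return true;
  }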
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20000422-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20000910-2.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20020413-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20030903-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060420-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060905-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-2.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120808-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920726-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/981001-1.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/981001-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/990628-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/991216-2.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/991216-2.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/cmpdi-1.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/cmpdi-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/float-floor.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/float-floor.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/pr33870-1.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/pr33870-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/pr33870.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/pr33870.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/pr36093.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/pr37573.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/pr43784.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/pr43784.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/switch-1.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/switch-1.c
> FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/va-arg-22.c
> FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/va-arg-22.c
>
> FAIL1 is a failure after one training run, FAIL is a failure after 100
> training runs. We should take a look at the FAILs and see if there are
> bugs to fix. For FAIL1 I think it is kind of a design problem: while
> implementing counts & frequencies the idea was that small counts do not
> matter, so integer arithmetic is all right.
>
> I wonder if with the current C++ wonderland we can't simply switch count
> to a better representation. Either sreal or fixed point with capping
> (the integer overflow issues are tiring, too).

It also seems like we should be able to detect the profile insanities
caused by integer truncation and handle them conservatively. That being
said, I see some sreal uses already in the profile.c code, so presumably we
could use this for the counts as well if it turns out to be necessary?

BTW, Rong also implemented his runtime patch to do the COMDAT profile
merging. However, that ended up having some issues that, while solvable,
would have caused us to lose all context sensitivity from COMDATs inlined
during the profile-gen build. I am going to go back to solving this in the
profile-use phase, as we discussed in the separate thread on the COMDAT
inlining patch I had been working on.

Thanks,
Teresa

>
> Honza

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413