> > As noted above, the change to avoid the RDIV by runs decreases the > effect of the min_run_ratio (now unlikely_count_fraction) in half. So > we really need to increase this parameter to 32 to compare to what the > old version of the code was doing. > > Indeed, using 32 does fix the same set of issues. However, when > setting the parameter to the old default of 4, there are 5 core dumps > using the patch to make the unlikely code unexecutable with a 100 > training runs (see below for some examples why). This goes down to 2 > failures if I set the parameter to 10, and 0 if I set it to 20. Using > 20 essentially means that the code has to be executed 5% of the runs, > which seems reasonable for something like splitting. What do you > think?
Yes, I think I am fine with pretty much any constant that is not insanely large. 20 or 32 seems fine. > > 1) No context sensitivity for routine that is inlined: > > In > /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr33870.c, > we call merge_pagelist 3 times from sort_pagelist. In the profile-use > compile merge_pagelist is inlined (I believe in all 3 locations). One > of the conditional blocks (line 23) has a profile count that is less > than runs/4. It turns out that from the first merge_pagelist callsite > we always execute this block, but the other two are completely cold. > But the profile info from all three calls is of course merged, and it > looks like it is executed infrequently overall. So the block is split > but then we execute it from the inline instance corresponding to the > first call. > > I'm not sure what we can do in general here. But it is another data > point suggesting to me that the blocks should be very cold to be > split. Yep, these issues from code duplicating optimizations are probably going to be unavoidable. > > 2) RTL expansion: > > Another was in the following code from > /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr43784.c: > > static struct s __attribute__((noinline)) rp(void) > { > return *p; > } > > where p is a global that is assigned as: > > static union u v; > static struct s *p = &v.d.b; > > RTL expansion synthesizes a bunch of tests and branches (alignment > handling?), and guesses at the probabilities of the resulting Yep, this should get better with my memcpy patch. I will try to update it and get it to mainline druing the weekend. > branches. Unfortunately, this is less than accurate and we end up > executing a block that is given a small likelihood, making it below > the threshold with the default unlikely fraction of 4. > > Here is the patch with the current threshold kept as 4. Let me know if > you think we can kick this up to 20, or if we should just increase the > threshold for splitting. > > Passes regression tests, ok for trunk? > > (BTW, getting close. I have a bunch of fixes to the loop unroller that > need some cleanup, and a small fix to ssa tail merging to update the > profiles, but I think that does it for the tests I was looking at. > Will send those soon for review.) > > Thanks, > Teresa > > 2013-10-10 Teresa Johnson <tejohn...@google.com> > > * predict.c (probably_never_executed): Compare frequency-based > count to number of training runs. > * params.def (UNLIKELY_BB_COUNT_FRACTION): New parameter. OK, thanks for working on this! > > Index: predict.c > =================================================================== > --- predict.c (revision 203390) > +++ predict.c (working copy) > @@ -237,17 +237,33 @@ probably_never_executed (struct function *fun, > gcc_checking_assert (fun); > if (profile_status_for_function (fun) == PROFILE_READ) > { > - if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0) > + int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); > + if (count * unlikely_count_fraction >= profile_info->runs) > return false; > if (!frequency) > return true; > if (!ENTRY_BLOCK_PTR->frequency) > return false; > - if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < > REG_BR_PROB_BASE) > + if (ENTRY_BLOCK_PTR->count) > { > - return (RDIV (frequency * ENTRY_BLOCK_PTR->count, > - ENTRY_BLOCK_PTR->frequency) > - < REG_BR_PROB_BASE / 4); > + gcov_type computed_count; > + /* Check for possibility of overflow, in which case entry bb count > + is large enough to do the division first without losing much > + precision. */ > + if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE) > + { > + gcov_type scaled_count > + = frequency * ENTRY_BLOCK_PTR->count * > unlikely_count_fraction; > + computed_count = RDIV (scaled_count, > ENTRY_BLOCK_PTR->frequency); > + } > + else > + { > + computed_count = RDIV (ENTRY_BLOCK_PTR->count, > + ENTRY_BLOCK_PTR->frequency); > + computed_count *= frequency * unlikely_count_fraction; > + } > + if (computed_count >= profile_info->runs) > + return false; > } > return true; > } > Index: params.def > =================================================================== > --- params.def (revision 203390) > +++ params.def (working copy) > @@ -373,6 +373,11 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION, > "Select fraction of the maximal frequency of executions of > basic block in function given basic block needs to have to be > considered hot", > 1000, 0, 0) > > +DEFPARAM(UNLIKELY_BB_COUNT_FRACTION, > + "unlikely-bb-count-fraction", > + "The minimum fraction of profile runs a given basic block > execution count must be not to be considered unlikely", > + 4, 1, 10000) > + > DEFPARAM (PARAM_ALIGN_THRESHOLD, > "align-threshold", > "Select fraction of the maximal frequency of executions of > basic block in function given basic block get alignment", > > > > > > Thanks, > > Teresa > > > >> > >> I will check the second part of patch separately. > >> Thanks! > >> Honza > > > > > > > > -- > > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413