> On 01/23/2018 11:02 AM, Jan Hubicka wrote:
> >> On 01/19/2018 12:57 PM, Martin Liška wrote:
> >>> Yes, there's a huge difference between CPU 2006 and 2017. The former
> >>> has 63% w/ dominant edges, and the latter only 11%. It's caused by
> >>> these 2 benchmarks with a high coverage:
> >>>
> >>
> >> Hi.
> >>
> >> I'm sending details about the 2 edges that influence the statistics
> >> significantly:
> >>
> >>> 500.perlbench_r: regexec.c.065i.profile:
> >>> negative return heuristics of edge 1368->1370: 2.0% exec 2477714850
> >>> hit 2429863555 (98.1%)
> >>
> >> 2477714850: 3484:S_regtry(pTHX_ regmatch_info *reginfo, char **startposp)
> >>          -: 3485:{
> >> 2477714850: 3486:    CHECKPOINT lastcp;
> >> 2477714850: 3487:    REGEXP *const rx = reginfo->prog;
> >> 2477714850: 3488:    regexp *const prog = ReANY(rx);
> >> 2477714850: 3489:    SSize_t result;
> >> [snip]
> >> 2477714850: 8046:    assert(!result || locinput - reginfo->strbeg >= 0);
> >> 2477714850: 8047:    return result ? locinput - reginfo->strbeg : -1;
> >>          -: 8048:}
> >>
> >> As seen, it returns -1 if a regex is not found, which in the case of
> >> perlbench is a very likely branch.
> >>
> >>> and 523.xalancbmk_r:
> >>> build/build_peak_gcc7-m64.0000/NameDatatypeValidator.cpp.065i.profile:
> >>> negative return heuristics of edge 3->4: 2.0% exec 1221735072 hit
> >>> 1221522453 (100.0%)
> >>
> >> 1221735072:   74:int NameDatatypeValidator::compare(const XMLCh* const lValue
> >>          -:   75:                                  , const XMLCh* const rValue
> >>          -:   76:                                  , MemoryManager* const)
> >>          -:   77:{
> >> 1221735072:   78:    return ( XMLString::equals(lValue, rValue)? 0 : -1);
> >>          -:   79:}
> >>          -:   80:
> >>
> >> IP profile dump file:
> >>
> >> xercesc_2_7::NameDatatypeValidator::compare (struct NameDatatypeValidator
> >>     * const this, const XMLCh * const lValue, const XMLCh * const rValue,
> >>     struct MemoryManager * const D.17157)
> >> {
> >>   bool _1;
> >>   int iftmp.0_2;
> >>
> >>   <bb 2> [count: 1221735072]:
> >>   # DEBUG BEGIN_STMT
> >>   _1 = xercesc_2_7::XMLString::equals (lValue_4(D), rValue_5(D));
> >>   if (_1 != 0)
> >>     goto <bb 4>; [0.02%]
> >>   else
> >>     goto <bb 3>; [99.98%]
> >>
> >>   <bb 3> [count: 1221522453]:
> >>
> >>   <bb 4> [count: 1221735072]:
> >>   # iftmp.0_2 = PHI <0(2), -1(3)>
> >>   return iftmp.0_2;
> >> }
> >>
> >> Likewise, XML values are more commonly non-equal.
> >> Ok, so may I also mark negative return with PROB_EVEN to track it?
> >
> > Well, if we start disabling predictors just because we can find a
> > benchmark where they do not perform as expected, we will need to disable
> > them all. It is the nature of heuristics to make mistakes; they just
> > should do something useful for common code. The goal is not to make
> > heuristics that make no mistakes. A heuristic is useful either if it has
> > high precision even if it covers few branches (say noreturn), or if it
> > has large coverage and still some hitrate (say the opcode-, call- or
> > return-based heuristics).
> >
> > To make things a bit more esoteric, in practice this is also weighted by
> > how much damage a bad prediction does. For example, a call heuristic
> > which predicts the non-call path to be taken is likely to do little
> > damage. Even if the call path is common, it is heavy and hard to
> > optimize due to the presence of the call, and thus we don't lose that
> > much in common scenarios.
> >
> > In the second case:
> >
> > // -----------------------------------------------------------------------
> > //  Compare methods
> > // -----------------------------------------------------------------------
> > int NameDatatypeValidator::compare(const XMLCh* const lValue
> >                                  , const XMLCh* const rValue
> >                                  , MemoryManager* const)
> > {
> >     return ( XMLString::equals(lValue, rValue)? 0 : -1);
> > }
> >
> > I would say it is an odd coding style. I do not see why it is not
> > declared bool and does not return true/false.
> >
> > The first case seems a bit non-standard too, as it looks like a usual
> > cache lookup where the cache just frequently fails.
> >
> > I would thus be in favor of keeping the prediction with a non-50%
> > hitrate, unless we run into some performance problems. If there are only
> > two benchmarks out of, say, 20 we tried (in spec2000 and 2k17), it seems
> > fine to me.
> >
> > There is an issue with the way we collect our data, though. Using the
> > arithmetic mean across spec optimizes for the overall runtime of all
> > spec benchmarks combined. We rather want to look for safer values that
> > do well for the average benchmark, independently of how many branches it
> > executes. We could consider making the statistics script also collect
> > geometric means across the different benchmarks. If we see a big
> > difference between the arithmetic mean and the geomean, we will know
> > that a small subset of benchmarks dominates, and we can decide what to
> > do with the probability.
>
> Hello.
>
> I fully agree that the proper way to do the statistics is a weighted
> average across benchmarks of how each predictor affects performance.
> Which is nice, but quite hard to achieve. Let's talk in person about how
> we can change it in the next stage1.
>
> For now, in the case of the 'negative return' predictor, I tend to leave
> it as it is and install the rest of the patch, which uses PROB_EVEN
> probability.
OK, makes sense to me.

Honza

> Works for you Honza?
> Martin
>
> > Honza
> >
> >> Thanks,
> >> Martin
> >>
> >>> Ideas what to do with the predictor for GCC 8 release?
> >>> Martin
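
For readers outside the thread: the "negative return" predictor guesses
that a path on which a function returns a negative constant is the cold
one. A minimal C++ sketch of both the pattern it is tuned for and the
counter-pattern hit by perlbench and xalancbmk above (the function names
here are invented for illustration, not taken from the benchmarks):

    #include <cstring>

    /* Pattern the heuristic is tuned for: the negative constant return is
       a rare error/not-found path, so predicting it not-taken usually
       wins.  */
    int find_index (const int *table, int n, int key)
    {
      for (int i = 0; i < n; i++)
        if (table[i] == key)
          return i;   /* common outcome in typical code */
      return -1;      /* rare outcome; the heuristic predicts this cold */
    }

    /* Counter-pattern shaped like the two SPEC hits above: -1 encodes an
       ordinary, frequent outcome (regex mismatch, unequal values), so the
       negative-return path is in fact taken almost every time and the
       prediction misses.  */
    int compare_values (const char *a, const char *b)
    {
      return std::strcmp (a, b) == 0 ? 0 : -1;  /* hot path returns -1 */
    }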
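The aggregation issue can also be made concrete: pooling all executed
branches lets one huge benchmark dominate the hitrate, while a
per-benchmark geometric mean weights every benchmark equally. A
self-contained sketch with invented numbers (nothing below is measured
data):

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct bench { const char *name; double executed; double hits; };

    int main ()
    {
      /* Invented data for one predictor: two small benchmarks where it
         does well and one dominant benchmark where it does not.  */
      std::vector<bench> runs = {
        { "small-1",    1e6, 0.95e6 },  /* 95% hitrate */
        { "small-2",    2e6, 1.90e6 },  /* 95% hitrate */
        { "dominant", 2.4e9, 4.80e7 },  /*  2% hitrate, huge count */
      };

      double exec_sum = 0, hit_sum = 0, log_sum = 0;
      for (const bench &r : runs)
        {
          exec_sum += r.executed;
          hit_sum += r.hits;
          log_sum += std::log (r.hits / r.executed);
          std::printf ("%-10s %5.1f%%\n", r.name,
                       100 * r.hits / r.executed);
        }

      /* Pooled hitrate (what a branch-count-weighted mean sees: ~2.1%)
         vs. the per-benchmark geometric mean (~26%), exposing the
         dominant benchmark.  */
      std::printf ("pooled hitrate:  %5.1f%%\n",
                   100 * hit_sum / exec_sum);
      std::printf ("geomean hitrate: %5.1f%%\n",
                   100 * std::exp (log_sum / runs.size ()));
      return 0;
    }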
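Finally, for reference, predictor probabilities live in gcc/predict.def,
defined via the DEF_PREDICTOR macro; giving a predictor PROB_EVEN (50%)
keeps its hitrate visible in the profile statistics, as Martin's "to
track it" suggests, while removing its bias on the edge probability. A
hypothetical entry (PRED_EXAMPLE is invented, and the flags field shown
is an assumption; see the actual patch for the real values):

    /* Hypothetical gcc/predict.def entry -- PRED_EXAMPLE does not
       exist, and the 0 flags field is an assumption.  A predictor
       defined with PROB_EVEN is still matched and reported in the
       statistics but no longer skews the branch probability.  */
    DEF_PREDICTOR (PRED_EXAMPLE, "example predictor", PROB_EVEN, 0)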