On 11/28/18 1:51 PM, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Nov 27, 2018 at 05:07:11PM +0100, Ilya Leoshkevich wrote:
>> perf diff -c wdiff:1,1 shows that there is just one function
>> (htab_traverse) that is significantly slower now:
>>
>>      2.98%  11768891764  exe  [.] htab_traverse
>>      1.91%    563949986  exe  [.] compute_dominance_frontiers_1
>>
>> The additional cycles consumed by this function match the overall
>> number of additionally consumed cycles, and the contribution of the
>> runner up (compute_dominance_frontiers_1) is 20 times smaller, so I
>> think it's really just this one function.
>>
>> However, the generated assembly is completely identical in both cases!
>
> Ugh.  We have seen this before :-(
>
> Thanks for investigating.  I don't consider the Power degradation as
> really caused by your patch, then.
>
>> I saw similar situations in the past, so I tried adding a nop to
>> htab_traverse:
>>
>> --- hashtab.c
>> +++ hashtab.c
>> @@ -529,6 +529,8 @@ htab_traverse (htab, callback, info)
>>       htab_trav callback;
>>       PTR info;
>>  {
>> +  __asm__ volatile("nop\n");
>> +
>>    PTR *slot = htab->entries;
>>    PTR *limit = slot + htab->size;
>>
>> and made a 5x re-run.  The new measurements are 227.01s and 227.44s
>> (+0.19%).  With two nops I get 227.25s and 227.29s (+0.02%), which also
>> looks like noise.
>>
>> Can this be explained by some microarchitectural quirk after all?
>
> Two frequent branch targets that get thrown into the same bin for
> prediction.  Results change based on random compiler changes, ASLR
> settings, phase of the moon, how many people in your neighbourhood have
> had porridge for breakfast this morning, etc.

FWIW, I've found the hashtable code particularly vulnerable to this kind
of performance jitter.  I've long suspected it's more related to the data
locations, as I can see the jitter with the same binary running under
valgrind/cachegrind control, with ASLR being the most likely culprit in
my mind.

However, in this case it seems different -- adding a NOP is changing the
instruction stream.  Could be collisions in the branch predictors or
something similar.

Ilya, can you repost the final patch?

Jeff

>
>
> Segher
>