Hi!

On Tue, Nov 27, 2018 at 05:07:11PM +0100, Ilya Leoshkevich wrote:
> perf diff -c wdiff:1,1 shows that there is just one function
> (htab_traverse) that is significantly slower now:
>
>   2.98%  11768891764  exe  [.] htab_traverse
>   1.91%    563949986  exe  [.] compute_dominance_frontiers_1
>
> The additional cycles consumed by this function match the overall
> number of additionally consumed cycles, and the contribution of the
> runner-up (compute_dominance_frontiers_1) is 20 times smaller, so I
> think it's really just this one function.
>
> However, the generated assembly is completely identical in both cases!

Ugh. We have seen this before :-(

Thanks for investigating. I don't consider the Power degradation to be
really caused by your patch, then.

> I saw similar situations in the past, so I tried adding a nop to
> htab_traverse:
>
> --- hashtab.c
> +++ hashtab.c
> @@ -529,6 +529,8 @@ htab_traverse (htab, callback, info)
>       htab_trav callback;
>       PTR info;
>  {
> +  __asm__ volatile("nop\n");
> +
>    PTR *slot = htab->entries;
>    PTR *limit = slot + htab->size;
>
> and made a 5x re-run. The new measurements are 227.01s and 227.44s
> (+0.19%). With two nops I get 227.25s and 227.29s (+0.02%), which also
> looks like noise.
>
> Can this be explained by some microarchitectural quirk after all?

Most likely two frequent branch targets that get thrown into the same bin
for prediction. Results change based on random compiler changes, ASLR
settings, phase of the moon, how many people in your neighbourhood have
had porridge for breakfast this morning, etc.


Segher