On 11/28/18 1:51 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Nov 27, 2018 at 05:07:11PM +0100, Ilya Leoshkevich wrote:
>> perf diff -c wdiff:1,1 shows that there is just one function
>> (htab_traverse) that is significantly slower now:
>>
>>      2.98%     11768891764  exe                [.] htab_traverse
>>      1.91%       563949986  exe                [.] compute_dominance_frontiers_1
>>
>> The additional cycles consumed by this function match the overall
>> number of additionally consumed cycles, and the contribution of the
>> runner-up (compute_dominance_frontiers_1) is 20 times smaller, so I
>> think it's really just this one function.
>>
>> However, the generated assembly is completely identical in both cases!
> 
> Ugh.  We have seen this before :-(
> 
> Thanks for investigating.  I don't consider the Power degradation as really
> caused by your patch, then.
> 
>> I saw similar situations in the past, so I tried adding a nop to
>> htab_traverse:
>>
>> --- hashtab.c
>> +++ hashtab.c
>> @@ -529,6 +529,8 @@ htab_traverse (htab, callback, info)
>>       htab_trav callback;
>>       PTR info;
>>  {
>> +  __asm__ volatile("nop\n");
>> +
>>    PTR *slot = htab->entries;
>>    PTR *limit = slot + htab->size;
>>
>> and made a 5x re-run.  The new measurements are 227.01s and 227.44s
>> (+0.19%).  With two nops I get 227.25s and 227.29s (+0.02%), which also
>> looks like noise.
>>
>> Can this be explained by some microarchitectural quirk after all?
> 
> Two frequent branch targets that get thrown into the same bin for prediction.
> Results change based on random compiler changes, ASLR settings, phase of the
> moon, how many people in your neighbourhood have had porridge for breakfast
> this morning, etc.
FWIW, I've found the hashtable code particularly vulnerable to this kind
of performance jitter.   I've long suspected it's more related to the
data locations as I can see the jitter with the same binary running
under valgrind/cachegrind control.  ASLR being the most likely culprit
in my mind.

However, in this case it seems different -- adding a NOP is changing the
instruction stream.    Could be collisions in the branch predictors or
something similar.

Ilya, can you repost the final patch?
Jeff

> 
> 
> Segher
> 
