On 24/12/16 17:52, Bruno Haible wrote:
> Hi Pádraig,
>
>> Wow that's much better on a 40 core system:
>>
>> Before your patch:
>> =================
>> $ time ./test-lock
>> Starting test_lock ... OK
>> Starting test_rwlock ... OK
>> Starting test_recursive_lock ... OK
>> Starting test_once ... OK
>>
>> real    1m32.547s
>> user    1m32.455s
>> sys     13m21.532s
>>
>> After your patch:
>> =================
>> $ time ./test-lock
>> Starting test_lock ... OK
>> Starting test_rwlock ... OK
>> Starting test_recursive_lock ... OK
>> Starting test_once ... OK
>>
>> real    0m3.364s
>> user    0m3.087s
>> sys     0m25.477s
>
> Wow, a 30x speed increase by using a lock instead of 'volatile'!
>
> Thanks for the testing. I cleaned up the patch to do less
> code duplication and pushed it.
>
> Still, I wonder about the cause of this speed difference.
> It must be the read from the 'volatile' variable that is problematic,
> because the program writes to 'volatile' variable only 6 times in total.
>
> What happens when a program reads from a 'volatile' variable
> at address xy in a multi-processor system? It must do a broadcast
> to all other CPUs "please flush your internal write caches", wait
> for these flushes to be completed, and then do a read at address xy.
> But the same procedure must also happen when taking a lock at
> address xy. So, where does the speed difference come from?
> The 'volatile' handling must be implemented in a terrible way;
> either GCC generates inefficient instructions? or these instructions
> are executed in a horrible way by the hardware?
>
> What is the hardware of your 40-core machine (just for reference)?
Might be NUMA related?

CPU is Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz

Attached is the output from:
  lstopo --no-legend -v -p --of png > test-lock.png

thanks again,
Pádraig
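
A minimal sketch of the pattern being discussed may help: shared "accounts"
that are only ever touched while holding a pthread mutex, where a 'volatile'
qualifier adds nothing for correctness and only forces every access to go
through memory.  The names, thread counts and iteration counts below are
illustrative, not the actual gnulib test-lock.c; build with -DUSE_VOLATILE
to time the volatile variant against the plain one.

/* volatile-sketch.c
   Plain variant:     cc -O2 -pthread volatile-sketch.c
   Volatile variant:  cc -O2 -pthread -DUSE_VOLATILE volatile-sketch.c  */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define ACCOUNT_COUNT 8
#define MUTATOR_COUNT 8
#define ROUNDS 1000000

#ifdef USE_VOLATILE
/* 'volatile' forces every access through memory, even though the mutex
   already provides all the ordering the checker needs.  */
static volatile int account[ACCOUNT_COUNT];
#else
static int account[ACCOUNT_COUNT];
#endif
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int done;

/* Move money between accounts; the total must stay zero.  */
static void *
mutator_thread (void *arg)
{
  for (int i = 0; i < ROUNDS; i++)
    {
      pthread_mutex_lock (&lock);
      int from = i % ACCOUNT_COUNT;
      int to = (i + 1) % ACCOUNT_COUNT;
      account[from] -= 7;
      account[to] += 7;
      pthread_mutex_unlock (&lock);
    }
  return NULL;
}

/* Repeatedly verify the invariant, always under the lock.  */
static void *
checker_thread (void *arg)
{
  int stop = 0;
  while (!stop)
    {
      pthread_mutex_lock (&lock);
      int sum = 0;
      for (int i = 0; i < ACCOUNT_COUNT; i++)
        sum += account[i];
      stop = done;
      pthread_mutex_unlock (&lock);
      if (sum != 0)
        {
          fprintf (stderr, "invariant violated\n");
          abort ();
        }
    }
  return NULL;
}

int
main (void)
{
  pthread_t checker;
  pthread_t mutators[MUTATOR_COUNT];

  pthread_create (&checker, NULL, checker_thread, NULL);
  for (int i = 0; i < MUTATOR_COUNT; i++)
    pthread_create (&mutators[i], NULL, mutator_thread, NULL);
  for (int i = 0; i < MUTATOR_COUNT; i++)
    pthread_join (mutators[i], NULL);

  pthread_mutex_lock (&lock);
  done = 1;
  pthread_mutex_unlock (&lock);
  pthread_join (checker, NULL);

  puts ("OK");
  return 0;
}

On a many-core (and especially NUMA) box the volatile variant should show
more memory traffic in the hot loops; whether it reproduces anything close
to the 30x gap reported above will depend on the hardware.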