Hi Pádraig,

> Wow that's much better on a 40 core system:
>
> Before your patch:
> =================
> $ time ./test-lock
> Starting test_lock ... OK
> Starting test_rwlock ... OK
> Starting test_recursive_lock ... OK
> Starting test_once ... OK
>
> real 1m32.547s
> user 1m32.455s
> sys 13m21.532s
>
> After your patch:
> =================
> $ time ./test-lock
> Starting test_lock ... OK
> Starting test_rwlock ... OK
> Starting test_recursive_lock ... OK
> Starting test_once ... OK
>
> real 0m3.364s
> user 0m3.087s
> sys 0m25.477s
Wow, a 30x speed increase by using a lock instead of 'volatile'! Thanks for the testing. I cleaned up the patch to do less code duplication and pushed it.

Still, I wonder about the cause of this speed difference. It must be the reads from the 'volatile' variable that are problematic, because the program writes to the 'volatile' variable only 6 times in total.

What happens when a program reads from a 'volatile' variable at address xy in a multi-processor system? It must broadcast to all other CPUs "please flush your internal write caches", wait for these flushes to complete, and then do a read at address xy. But the same procedure must also happen when taking a lock at address xy. So, where does the speed difference come from? The 'volatile' handling must be implemented in a terrible way: either GCC generates inefficient instructions, or the hardware executes these instructions in a horrible way.

What is the hardware of your 40-core machine (just for reference)?

Bruno
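To make the comparison concrete, here is a minimal, self-contained sketch of the two waiting patterns being discussed: a checker thread that spins on a 'volatile' completion flag versus one that reads the flag under a mutex. This is not the actual test-lock.c code; it uses plain pthreads rather than the gnulib glthread macros, and identifiers such as done_volatile and done_lock are made up for illustration.

/* Sketch only: contrast a checker thread polling a 'volatile' flag
   with one that reads the flag under a mutex.  */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Variant 1: completion signalled through a 'volatile' variable.
   Every iteration reloads a contended cache line, and the tight loop
   gives the hardware no chance to back off.  */
static volatile int done_volatile = 0;

static void *checker_volatile (void *arg)
{
  while (!done_volatile)
    {
      /* ... verify the data structure under test ... */
    }
  return NULL;
}

/* Variant 2: completion flag read under a lock.  The lock/unlock pair
   serializes the checkers and throttles how often the shared cache
   line is touched.  */
static int done_locked = 0;
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;

static void *checker_locked (void *arg)
{
  for (;;)
    {
      int done;
      pthread_mutex_lock (&done_lock);
      done = done_locked;
      pthread_mutex_unlock (&done_lock);
      if (done)
        break;
      /* ... verify the data structure under test ... */
    }
  return NULL;
}

int main (void)
{
  pthread_t t1, t2;
  pthread_create (&t1, NULL, checker_volatile, NULL);
  pthread_create (&t2, NULL, checker_locked, NULL);
  sleep (1);                 /* let the checkers run for a while */
  pthread_mutex_lock (&done_lock);
  done_locked = 1;           /* the writes are few, as in the test */
  pthread_mutex_unlock (&done_lock);
  done_volatile = 1;
  pthread_join (t1, NULL);
  pthread_join (t2, NULL);
  puts ("OK");
  return 0;
}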
0001-lock-test-Fix-performance-problem-on-multi-core-mach.patch