Re: Call for compiler help/advice: atomic builtins for v3

Peter Dimov Sun, 06 Nov 2005 12:32:37 -0800

Richard Henderson wrote:

To keep all this in perspective, folks should remember that atomic
operations are *slow*.  Very very slow.  Orders of magnitude slower
than function calls.  Seriously.  Taking p4 as the extreme example,
one can expect a null function call in around 10 cycles, but a locked
memory operation to take 1000.  Usually things aren't that bad, but
I believe some poor design decisions were made for p4 here.  But even
on a platform without such problems you can expect a factor of 30
difference.


Apologies in advance if the following is not relevant...

Even on a P4, inlining may enable compiler optimizations. One case is when the compiler can see that the return value of __sync_fetch_and_or (for instance) isn't used. It's possible to use a wait-free "lock or" instead of a "lock cmpxchg" loop (MSVC 8 does this for _InterlockedOr.)
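A minimal sketch of the two situations, for illustration only. The function names set_flag and test_and_set_flag are hypothetical, and the comments describe what an x86 compiler could emit, not what any particular GCC release does:

```c
#include <stdint.h>

/* Result discarded: once the builtin is inlined, the compiler can see
   that the old value is dead and may emit a single "lock or". */
void set_flag(uint32_t *flags, uint32_t mask)
{
    (void)__sync_fetch_and_or(flags, mask);
}

/* Result used: x86 has no instruction that ORs atomically and also
   returns the old value, so this typically becomes a "lock cmpxchg"
   retry loop. */
uint32_t test_and_set_flag(uint32_t *flags, uint32_t mask)
{
    return __sync_fetch_and_or(flags, mask) & mask;
}
```

If the builtin were an opaque out-of-line call, both functions would have to pay for the cmpxchg loop, since the callee cannot know whether its result is used.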

Another case is when inlining results in a sequence of K adjacent __sync_fetch_and_add( &x, 1 ) operations. These can legally be replaced with a single __sync_fetch_and_add.
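As a rough illustration with K = 3 (the function names are hypothetical, and the return values are assumed to be unused), the first function below is what might be left after inlining and the second is the form it could be folded into:

```c
/* What inlining might leave behind: three adjacent atomic increments. */
void bump_three_times(int *x)
{
    __sync_fetch_and_add(x, 1);
    __sync_fetch_and_add(x, 1);
    __sync_fetch_and_add(x, 1);
}

/* The merged form: one atomic add of 3. No other thread is guaranteed
   to observe the intermediate values, so the final state is the same. */
void bump_by_three(int *x)
{
    __sync_fetch_and_add(x, 3);
}
```

The merge is only visible to the compiler when the builtins are expanded inline; three calls into a library routine would stay three locked operations.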

Currently the __sync_* intrinsics seem to be fully locked, but if acquire/release/unordered variants are added, other platforms may also suffer from lack of inlining. On a PowerPC an unordered atomic increment is pretty much the same speed as an ordinary increment (when there is no contention.)
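To make the distinction concrete, here is a sketch that assumes the __atomic_* builtins and memory-order constants found in later GCC releases; they did not exist when this message was written and only stand in for the hypothetical "unordered" variant. The function names are made up for the example:

```c
/* Fully ordered increment, as the __sync_* builtins provide today. */
long inc_full(long *p)
{
    return __sync_fetch_and_add(p, 1);
}

/* Unordered increment: still atomic, but with no ordering imposed on
   surrounding loads and stores. Uses the later __atomic builtins as a
   stand-in for an "unordered" __sync variant. */
long inc_relaxed(long *p)
{
    return __atomic_fetch_add(p, 1, __ATOMIC_RELAXED);
}
```

The cheaper the operation itself, the larger the share of the cost that a non-inlined call would represent.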
