http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
--- Comment #4 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-08-02 14:33:19 UTC ---
One more data point. In a micro-benchmark that uses realistic code from production, the change from __sync_sub_and_fetch(var, constant) to __sync_add_and_fetch(var, -constant) led to a 10% to 27% improvement in performance. The cmpxchg use, with the necessary initial load and I->S cache transition, really kills performance when memory is highly contended.
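For readers who want to see the two forms side by side, here is a minimal sketch of the substitution described in the comment. The helper names and the constant 8 are hypothetical and chosen only for illustration; they are not taken from the benchmark. Both helpers return the same value, so only the generated code should differ. Per the comment, the subtract form was being compiled to a load plus cmpxchg loop, while the add form presumably lets the compiler emit a single locked instruction (such as lock xadd on x86); that is an assumption about code generation, not something this sketch verifies.

```c
#include <assert.h>

/* Hypothetical helper: the form from the report. It atomically subtracts
   a constant and returns the new value. */
static long dec_with_sub(long *var)
{
    return __sync_sub_and_fetch(var, 8);
}

/* Hypothetical helper: the workaround. It atomically adds the negated
   constant, which yields the same arithmetic result. */
static long dec_with_add(long *var)
{
    return __sync_add_and_fetch(var, -8);
}

/* Sanity check that the two forms agree on the returned value. */
static void check_equivalence(void)
{
    long a = 100, b = 100;
    assert(dec_with_sub(&a) == dec_with_add(&b));
    assert(a == b);
}
```

Comparing the output of gcc -O2 -S for the two helpers is one way to check which instruction sequence a given compiler version actually emits.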