On Mon, Apr 22, 2013 at 3:19 PM, Andi Kleen <a...@firstfloor.org> wrote:
> Bill Schmidt <wschm...@linux.vnet.ibm.com> writes:
>>
>> My reason for asking involves a large heavily-threaded application that
>> is improved by feedback-directed optimization on some platforms, but not
>> on others. One theory is that a defective profile is generated due to
>> counter dropouts from contention. I'm somewhat skeptical about this
>> given that some platforms seem to do well with it, but it's possible.
>> I'm hopeful that knowing why the thread-safe profiling patch wasn't
>> implemented will give us more of a clue.
>
> Atomics are slower even single threaded. In any case you'll have a
> gigantic slowdown if there is contention. Better use per thread
> counters.
Actually it depends on the processor. For example, on Octeon2 atomic
addition is faster than non-atomic addition, because the atomic
instructions operate on L2 directly rather than going through L1 and
then to L2. Basically, an atomic addition to a counter does not have to
populate the L1 cache in this case.

Thanks,
Andrew Pinski

>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only
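A minimal C11/pthreads sketch (not GCC's gcov implementation) of the three counter strategies in this thread: a shared counter updated with a separate load and store, which can drop counts under contention much like a plain non-atomic increment; a shared atomic fetch-and-add; and per-thread counters summed at the end. NTHREADS, ITERS, the assumed 64-byte cache line, and all function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4     /* hypothetical thread count */
#define ITERS 1000000L /* hypothetical increments per thread */

/* Shared counter updated with a separate load and store: well-defined in C11,
   but concurrent updates can overwrite each other, so counts may be lost. */
static _Atomic long racy_counter;

/* Shared counter updated with an atomic read-modify-write: never loses counts,
   but every increment contends for the same cache line. */
static _Atomic long atomic_counter;

/* One private counter per thread, summed after all threads finish. Each slot
   is aligned to an assumed 64-byte cache line to avoid false sharing. */
struct padded_counter { _Alignas(64) long value; };
static struct padded_counter per_thread[NTHREADS];

static void *racy_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        long v = atomic_load_explicit(&racy_counter, memory_order_relaxed);
        atomic_store_explicit(&racy_counter, v + 1, memory_order_relaxed);
    }
    return NULL;
}

static void *atomic_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
    return NULL;
}

static void *per_thread_worker(void *arg) {
    struct padded_counter *mine = arg; /* only this thread touches *mine */
    for (long i = 0; i < ITERS; i++)
        mine->value++;
    return NULL;
}

/* Start NTHREADS copies of fn (thread i gets args[i], or NULL) and join them. */
static void run_threads(void *(*fn)(void *), void **args) {
    pthread_t tids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tids[i], NULL, fn, args ? args[i] : NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
}

long count_racy(void) {
    atomic_store(&racy_counter, 0);
    run_threads(racy_worker, NULL);
    return atomic_load(&racy_counter);
}

long count_atomic(void) {
    atomic_store(&atomic_counter, 0);
    run_threads(atomic_worker, NULL);
    return atomic_load(&atomic_counter);
}

long count_per_thread(void) {
    void *args[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        per_thread[i].value = 0;
        args[i] = &per_thread[i];
    }
    run_threads(per_thread_worker, args);
    long total = 0; /* merge step: pthread_join makes every slot visible here */
    for (int i = 0; i < NTHREADS; i++)
        total += per_thread[i].value;
    return total;
}
```

Under contention, count_racy() may return less than NTHREADS * ITERS. The other two functions always return exactly that value. Which of the last two is faster depends on the hardware (Andrew's Octeon2 point), so timing them on the target in question is the way to tell. Build with -pthread.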