On Fri, Dec 21, 2012 at 10:13 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> On Thu, Dec 20, 2012 at 8:20 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> >> On Wed, Dec 19, 2012 at 4:29 PM, Andrew Pinski <pins...@gmail.com> wrote:
>> >> >
>> >> > On Wed, Dec 19, 2012 at 12:08 PM, Rong Xu <x...@google.com> wrote:
>> >> > > Hi,
>> >> > >
>> >> > > This patch adds support for atomic updates of the profile counters.
>> >> > > Tested with google internal benchmarks and fdo kernel build.
>> >> >
>> >> > I think you should use the __atomic_ functions instead of __sync_
>> >> > functions as they allow better performance for simple counters as you
>> >> > can use __ATOMIC_RELAXED.
>> >>
>> >> You are right. I think __ATOMIC_RELAXED should be OK here.
>> >> Thanks for the suggestion.
>> >>
>> >> >
>> >> > And this would be useful for the trunk also. I was going to implement
>> >> > this exact thing this week but some other important stuff came up.
>> >>
>> >> I'll post trunk patch later.
>> >
>> > Yes, I like that patch, too. Even if the costs are quite high (and this is
>> > why atomic updates were sort of voted down in the past) the alternative of
>> > using TLS has problems with too much per-thread memory.
>>
>> Actually sometimes (on some processors) atomic increments are cheaper
>> than doing a regular increment. Mainly because there is an
>> instruction which can handle it in the L2 cache rather than populating
>> the L1. Octeon is one such processor where this is true.
>
> One reason for large divergence may be the fact that we optimize the counter
> update code. Perhaps declaring counters volatile will prevent load/store
> motion and reduce the racing, too.

Well, that will make it slower, too. The best benchmark to check is tramp3d
for all this stuff. I remember that ICC, when it had a function call for each
counter update, was about 100000x slower instrumented than w/o instrumentation
(that is, I never waited long enough to make it finish even one iteration ...)

Thus, it's very important that counter updates are subject to loop
invariant / store motion (and SCEV const-prop)! GCC does a wonderful job
here at the moment, please do not regress here.

Richard.

> Honza
>>
>> Thanks,
>> Andrew Pinski
>>
>> >
>> > While there are even more alternatives, like recording the changes and
>> > committing them in blocks (say at function return), I guess some solution
>> > is better than no solution.
>> >
>> > Thanks,
>> > Honza
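
For reference, the three update styles discussed in this thread can be
sketched roughly as below. This is only an illustration and not the code
from the patch: counter_t, count_plain, count_atomic, count_loop_batched,
read_counter and run_demo are hypothetical names, and the real
instrumentation is emitted by the compiler rather than written by hand.
The only real interfaces used are the GCC __atomic builtins. The batched
variant does by hand what store motion gives the plain variant in a loop,
and is in the spirit of "committing changes in blocks".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a profile counter type (an assumption,
   not the type used by the patch). */
typedef int64_t counter_t;

/* Plain update: cheap, and the compiler is free to keep the value in a
   register across a loop, but concurrent threads can lose increments. */
static void count_plain(counter_t *c)
{
  *c += 1;
}

/* Atomic update with relaxed ordering: no increment is lost, and no
   ordering is imposed on surrounding memory accesses. */
static void count_atomic(counter_t *c)
{
  __atomic_fetch_add(c, 1, __ATOMIC_RELAXED);
}

/* Batched update: accumulate in a local variable and commit once with a
   single atomic add, so the loop body carries no atomic operation. */
static void count_loop_batched(counter_t *c, int iterations)
{
  counter_t local = 0;
  for (int i = 0; i < iterations; i++)
    local += 1;                 /* stands in for one counted event */
  __atomic_fetch_add(c, local, __ATOMIC_RELAXED);
}

/* Relaxed atomic read of a counter. */
static counter_t read_counter(counter_t *c)
{
  return __atomic_load_n(c, __ATOMIC_RELAXED);
}

/* Single-threaded sanity check: all three styles agree on the count. */
static int run_demo(void)
{
  counter_t plain = 0, atomic = 0, batched = 0;

  for (int i = 0; i < 1000; i++)
    {
      count_plain(&plain);
      count_atomic(&atomic);
    }
  count_loop_batched(&batched, 1000);

  assert(plain == 1000);
  assert(read_counter(&atomic) == 1000);
  assert(read_counter(&batched) == 1000);
  return 0;
}
```

Single-threaded, all three give the same count; they differ only once
several threads update the same counter, and in how much freedom the
optimizer keeps inside hot loops.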