Thanks for the tip! I rewrote it to use an explicit unlock instead of defer, and it indeed results in much better performance, though still far from Java. The next response gives a good explanation of why that can happen.
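For anyone curious, the change is essentially this (a minimal sketch; it assumes the Store and Distribution types from the snippet quoted below, and the actual committed code may differ):

func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
    store.lock.Lock()
    // Unlock explicitly on the single return path instead of
    // registering a deferred call on every invocation.
    distribution, exists := destination[key]
    if !exists {
        distribution = NewDistribution()
        destination[key] = distribution
    }
    distribution.addSample(value)
    store.lock.Unlock()
}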
On Sunday, October 2, 2016 at 8:33:15 PM UTC+2, Justin Israel wrote:
>
> Do you get better performance when you remove the defer and do an
> explicit unlock at the end of the function? There are a number of
> references to the defer process happening in your profile.
> I'm guessing the try/finally in Java is cheaper.
>
> On Mon, 3 Oct 2016, 3:17 AM <toef...@gmail.com> wrote:
>
>> Hi,
>>
>> I've written a small library (https://github.com/toefel18/go-patan)
>> that stores counters and collects statistics (running
>> min/max/average/standard deviation) of a program during runtime. There
>> is a lock-based implementation and a channel-based implementation. I've
>> written this library before in Java as well: https://github.com/toefel18/patan.
>> The Java version with an equivalent implementation is much faster than
>> both the channel and locking implementations in Go, and I really don't
>> understand why.
>>
>> This program has a store that holds the data, and a sync.Mutex that
>> guards concurrent access on reads and writes. This is a snippet of the
>> lock-based implementation:
>>
>> type Store struct {
>>     durations map[string]*Distribution
>>     counters  map[string]int64
>>     samples   map[string]*Distribution
>>
>>     lock *sync.Mutex
>> }
>>
>> func (store *Store) addSample(key string, value int64) {
>>     store.addToStore(store.samples, key, value)
>> }
>>
>> func (store *Store) addDuration(key string, value int64) {
>>     store.addToStore(store.durations, key, value)
>> }
>>
>> func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
>>     store.lock.Lock()
>>     defer store.lock.Unlock()
>>     distribution, exists := destination[key]
>>     if !exists {
>>         distribution = NewDistribution()
>>         destination[key] = distribution
>>     }
>>     distribution.addSample(value)
>> }
>>
>> Now, when I benchmark this Go code, I get the following results (see
>> gist: benchmark code
>> <https://gist.github.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec>):
>>
>> 10 threads with 20000 items took 133 millis
>> 100 threads with 20000 items took 1809 millis
>> 1000 threads with 20000 items took 17576 millis
>> 10 threads with 200000 items took 1228 millis
>> 100 threads with 200000 items took 17900 millis
>>
>> When I benchmark the Java code, the results are much better (see gist:
>> java benchmark code
>> <https://gist.github.com/toefel18/9def55a8c3c53c79a4488c29c66f31e5>):
>>
>> 10 threads with 20000 items takes 89 millis
>> 100 threads with 20000 items takes 265 millis
>> 1000 threads with 20000 items takes 2888 millis
>> 10 threads with 200000 items takes 311 millis
>> 100 threads with 200000 items takes 3067 millis
>>
>> I have profiled the Go code and created a call graph
>> <https://gist.githubusercontent.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec/raw/c1ff6452abaefe49c3c688b3d07e4574cbe77264/call-graph.png>.
>> I interpret this as follows: Go spends 0.31 and 0.25 seconds in my
>> methods, and pretty much the rest in sync.(*Mutex).Lock() and
>> sync.(*Mutex).Unlock().
>>
>> The top20 output of the profiler:
>>
>> (pprof) top20
>> 59110ms of 73890ms total (80.00%)
>> Dropped 22 nodes (cum <= 369.45ms)
>> Showing top 20 nodes out of 65 (cum >= 50220ms)
>>       flat  flat%   sum%        cum   cum%
>>     8900ms 12.04% 12.04%     8900ms 12.04%  runtime.futex
>>     7270ms  9.84% 21.88%     7270ms  9.84%  runtime/internal/atomic.Xchg
>>     7020ms  9.50% 31.38%     7020ms  9.50%  runtime.procyield
>>     4560ms  6.17% 37.56%     4560ms  6.17%  sync/atomic.CompareAndSwapUint32
>>     4400ms  5.95% 43.51%     4400ms  5.95%  runtime/internal/atomic.Xadd
>>     4210ms  5.70% 49.21%    22040ms 29.83%  runtime.lock
>>     3650ms  4.94% 54.15%     3650ms  4.94%  runtime/internal/atomic.Cas
>>     3260ms  4.41% 58.56%     3260ms  4.41%  runtime/internal/atomic.Load
>>     2220ms  3.00% 61.56%    22810ms 30.87%  sync.(*Mutex).Lock
>>     1870ms  2.53% 64.10%     1870ms  2.53%  runtime.osyield
>>     1540ms  2.08% 66.18%    16740ms 22.66%  runtime.findrunnable
>>     1430ms  1.94% 68.11%     1430ms  1.94%  runtime.freedefer
>>     1400ms  1.89% 70.01%     1400ms  1.89%  sync/atomic.AddUint32
>>     1250ms  1.69% 71.70%     1250ms  1.69%  github.com/toefel18/go-patan/statistics/lockbased.(*Distribution).addSample
>>     1240ms  1.68% 73.38%     3140ms  4.25%  runtime.deferreturn
>>     1070ms  1.45% 74.83%     6520ms  8.82%  runtime.systemstack
>>     1010ms  1.37% 76.19%     1010ms  1.37%  runtime.newdefer
>>     1000ms  1.35% 77.55%     1000ms  1.35%  runtime.mapaccess1_faststr
>>      950ms  1.29% 78.83%    15660ms 21.19%  runtime.semacquire
>>      860ms  1.16% 80.00%    50220ms 67.97%  main.Benchmrk.func1
>>
>> I would really like to understand why locking in Go is so much slower
>> than in Java. I initially wrote this program using channels, but that
>> was much slower than locking. Can somebody please help me out?
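If anyone wants to measure the defer overhead in isolation, here is a minimal sketch using the standard testing package (the names here are illustrative, not taken from go-patan):

package lockbench

import (
    "sync"
    "testing"
)

var (
    mu      sync.Mutex
    counter int64
)

// Lock/unlock via defer, as in the original addToStore.
func incrWithDefer() {
    mu.Lock()
    defer mu.Unlock()
    counter++
}

// Lock/unlock explicitly, as in the rewritten version.
func incrExplicit() {
    mu.Lock()
    counter++
    mu.Unlock()
}

func BenchmarkDeferUnlock(b *testing.B) {
    for i := 0; i < b.N; i++ {
        incrWithDefer()
    }
}

func BenchmarkExplicitUnlock(b *testing.B) {
    for i := 0; i < b.N; i++ {
        incrExplicit()
    }
}

Running this with "go test -bench=." shows the per-call difference between the two variants on your own machine.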