Do you get better performance if you remove the defer and do an explicit unlock at the end of the function? There are a number of references to the defer machinery (runtime.deferreturn, runtime.newdefer, runtime.freedefer) in your profile. I'm guessing Java's try/finally is cheaper.
On Mon, 3 Oct 2016, 3:17 AM <toefe...@gmail.com> wrote:

> Hi,
>
> I've written a small library (https://github.com/toefel18/go-patan) that
> stores counters and collects statistics (running min/max/average/
> stddeviation) of a program during runtime. There is a lock-based
> implementation and a channel-based implementation. I've written this
> library before in Java as well: https://github.com/toefel18/patan. The
> Java version with an equivalent implementation is much faster than both
> the channel and locking implementations in Go, and I really don't
> understand why.
>
> This program has a store that holds the data, and a sync.Mutex that
> guards concurrent access on reads and writes. This is a snippet of the
> locking-based implementation:
>
>     type Store struct {
>         durations map[string]*Distribution
>         counters  map[string]int64
>         samples   map[string]*Distribution
>
>         lock *sync.Mutex
>     }
>
>     func (store *Store) addSample(key string, value int64) {
>         store.addToStore(store.samples, key, value)
>     }
>
>     func (store *Store) addDuration(key string, value int64) {
>         store.addToStore(store.durations, key, value)
>     }
>
>     func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
>         store.lock.Lock()
>         defer store.lock.Unlock()
>         distribution, exists := destination[key]
>         if !exists {
>             distribution = NewDistribution()
>             destination[key] = distribution
>         }
>         distribution.addSample(value)
>     }
>
> Now, when I benchmark this Go code, I get the following results (see
> gist: benchmark code
> <https://gist.github.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec>):
>
>     10 threads with 20000 items took 133 millis
>     100 threads with 20000 items took 1809 millis
>     1000 threads with 20000 items took 17576 millis
>     10 threads with 200000 items took 1228 millis
>     100 threads with 200000 items took 17900 millis
>
> When I benchmark the Java code, the results are much better (see gist:
> java benchmark code
> <https://gist.github.com/toefel18/9def55a8c3c53c79a4488c29c66f31e5>):
>
>     10 threads with 20000 items takes 89 millis
>     100 threads with 20000 items takes 265 millis
>     1000 threads with 20000 items takes 2888 millis
>     10 threads with 200000 items takes 311 millis
>     100 threads with 200000 items takes 3067 millis
>
> I have profiled the Go code and created a call graph
> <https://gist.githubusercontent.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec/raw/c1ff6452abaefe49c3c688b3d07e4574cbe77264/call-graph.png>.
> I interpret this as follows: Go spends 0.31 and 0.25 seconds in my
> methods, and pretty much the rest in sync.(*Mutex).Lock() and
> sync.(*Mutex).Unlock().
>
> The top20 output of the profiler:
>
>     (pprof) top20
>     59110ms of 73890ms total (80.00%)
>     Dropped 22 nodes (cum <= 369.45ms)
>     Showing top 20 nodes out of 65 (cum >= 50220ms)
>           flat  flat%   sum%        cum   cum%
>         8900ms 12.04% 12.04%     8900ms 12.04%  runtime.futex
>         7270ms  9.84% 21.88%     7270ms  9.84%  runtime/internal/atomic.Xchg
>         7020ms  9.50% 31.38%     7020ms  9.50%  runtime.procyield
>         4560ms  6.17% 37.56%     4560ms  6.17%  sync/atomic.CompareAndSwapUint32
>         4400ms  5.95% 43.51%     4400ms  5.95%  runtime/internal/atomic.Xadd
>         4210ms  5.70% 49.21%    22040ms 29.83%  runtime.lock
>         3650ms  4.94% 54.15%     3650ms  4.94%  runtime/internal/atomic.Cas
>         3260ms  4.41% 58.56%     3260ms  4.41%  runtime/internal/atomic.Load
>         2220ms  3.00% 61.56%    22810ms 30.87%  sync.(*Mutex).Lock
>         1870ms  2.53% 64.10%     1870ms  2.53%  runtime.osyield
>         1540ms  2.08% 66.18%    16740ms 22.66%  runtime.findrunnable
>         1430ms  1.94% 68.11%     1430ms  1.94%  runtime.freedefer
>         1400ms  1.89% 70.01%     1400ms  1.89%  sync/atomic.AddUint32
>         1250ms  1.69% 71.70%     1250ms  1.69%  github.com/toefel18/go-patan/statistics/lockbased.(*Distribution).addSample
>         1240ms  1.68% 73.38%     3140ms  4.25%  runtime.deferreturn
>         1070ms  1.45% 74.83%     6520ms  8.82%  runtime.systemstack
>         1010ms  1.37% 76.19%     1010ms  1.37%  runtime.newdefer
>         1000ms  1.35% 77.55%     1000ms  1.35%  runtime.mapaccess1_faststr
>          950ms  1.29% 78.83%    15660ms 21.19%  runtime.semacquire
>          860ms  1.16% 80.00%    50220ms 67.97%  main.Benchmrk.func1
>
> I would really like to understand why locking in Go is so much slower
> than in Java. I initially wrote this program using channels, but that
> was much slower than locking. Can somebody please help me out?
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.