Do you get better performance when you remove the defer and do an explicit
unlock at the end of the function? There are a few references to defer
processing (runtime.newdefer, runtime.deferreturn, runtime.freedefer) in
your profile. I'm guessing the equivalent try/finally in Java is cheaper.
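
Something like this, based on your snippet below (untested; note that
without the defer, a panic inside addSample would leave the mutex locked):

func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
   store.lock.Lock()
   distribution, exists := destination[key]
   if !exists {
      distribution = NewDistribution()
      destination[key] = distribution
   }
   distribution.addSample(value)
   store.lock.Unlock() // explicit unlock instead of defer
}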

On Mon, 3 Oct 2016, 3:17 AM <toefe...@gmail.com> wrote:

> Hi,
>
> I've written a small library (https://github.com/toefel18/go-patan) that
> stores counters and collects statistics (running min/max/average/standard
> deviation) of a program during runtime. There is a lock-based
> implementation and a channel-based implementation. I've written this
> library before in Java as well: https://github.com/toefel18/patan. The
> Java version, with an equivalent implementation, is much faster than both
> the channel and locking implementations in Go, and I really don't understand why.
>
> This program has a store that holds the data, and a sync.Mutex that guards
> it against concurrent reads and writes. This is a snippet of the
> lock-based implementation:
>
> type Store struct {
>    durations map[string]*Distribution
>    counters  map[string]int64
>    samples   map[string]*Distribution
>
>    lock *sync.Mutex
> }
>
> func (store *Store) addSample(key string, value int64) {
>    store.addToStore(store.samples, key, value)
> }
>
> func (store *Store) addDuration(key string, value int64) {
>    store.addToStore(store.durations, key, value)
> }
>
>
> func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
>    store.lock.Lock()
>    defer store.lock.Unlock()
>    distribution, exists := destination[key]
>    if !exists {
>       distribution = NewDistribution()
>       destination[key] = distribution
>    }
>    distribution.addSample(value)
> }
>
> Now, when I benchmark this Go code, I get the following results (see gist:
> benchmark code 
> <https://gist.github.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec>):
>
> 10 threads with 20000 items took 133 millis
> 100 threads with 20000 items took 1809 millis
> 1000 threads with 20000 items took 17576 millis
> 10 threads with 200000 items took 1228 millis
> 100 threads with 200000 items took 17900 millis
>
> When I benchmark the Java code, the results are much better (see gist:
> Java benchmark code
> <https://gist.github.com/toefel18/9def55a8c3c53c79a4488c29c66f31e5>)
>
> 10 threads with 20000 items takes 89 millis
> 100 threads with 20000 items takes 265 millis
> 1000 threads with 20000 items takes 2888 millis
> 10 threads with 200000 items takes 311 millis
> 100 threads with 200000 items takes 3067 millis
>
>
> I have profiled the Go code and created a call graph 
> <https://gist.githubusercontent.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec/raw/c1ff6452abaefe49c3c688b3d07e4574cbe77264/call-graph.png>.
> I interpret this as follows: Go spends 0.31 and 0.25 seconds in my own
> methods, and pretty much the rest in sync.(*Mutex).Lock() and
> sync.(*Mutex).Unlock().
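>
> A CPU profile like this can be collected with runtime/pprof and inspected
> with go tool pprof; the file name and benchmark call below are
> placeholders, not the actual benchmark code:
>
> // import "log", "os", "runtime/pprof"
> f, err := os.Create("cpu.prof")
> if err != nil {
>    log.Fatal(err)
> }
> pprof.StartCPUProfile(f)
> runBenchmark() // placeholder: the workload being measured
> pprof.StopCPUProfile()
> f.Close()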
>
> The top20 output of the profiler:
>
> (pprof) top20
> 59110ms of 73890ms total (80.00%)
> Dropped 22 nodes (cum <= 369.45ms)
> Showing top 20 nodes out of 65 (cum >= 50220ms)
>       flat  flat%   sum%        cum   cum%
>     8900ms 12.04% 12.04%     8900ms 12.04%  runtime.futex
>     7270ms  9.84% 21.88%     7270ms  9.84%  runtime/internal/atomic.Xchg
>     7020ms  9.50% 31.38%     7020ms  9.50%  runtime.procyield
>     4560ms  6.17% 37.56%     4560ms  6.17%  sync/atomic.CompareAndSwapUint32
>     4400ms  5.95% 43.51%     4400ms  5.95%  runtime/internal/atomic.Xadd
>     4210ms  5.70% 49.21%    22040ms 29.83%  runtime.lock
>     3650ms  4.94% 54.15%     3650ms  4.94%  runtime/internal/atomic.Cas
>     3260ms  4.41% 58.56%     3260ms  4.41%  runtime/internal/atomic.Load
>     2220ms  3.00% 61.56%    22810ms 30.87%  sync.(*Mutex).Lock
>     1870ms  2.53% 64.10%     1870ms  2.53%  runtime.osyield
>     1540ms  2.08% 66.18%    16740ms 22.66%  runtime.findrunnable
>     1430ms  1.94% 68.11%     1430ms  1.94%  runtime.freedefer
>     1400ms  1.89% 70.01%     1400ms  1.89%  sync/atomic.AddUint32
>     1250ms  1.69% 71.70%     1250ms  1.69%  github.com/toefel18/go-patan/statistics/lockbased.(*Distribution).addSample
>     1240ms  1.68% 73.38%     3140ms  4.25%  runtime.deferreturn
>     1070ms  1.45% 74.83%     6520ms  8.82%  runtime.systemstack
>     1010ms  1.37% 76.19%     1010ms  1.37%  runtime.newdefer
>     1000ms  1.35% 77.55%     1000ms  1.35%  runtime.mapaccess1_faststr
>      950ms  1.29% 78.83%    15660ms 21.19%  runtime.semacquire
>      860ms  1.16% 80.00%    50220ms 67.97%  main.Benchmrk.func1
>
>
> I would really like to understand why locking in Go is so much slower than
> in Java. I initially wrote this program using channels, but that was much
> slower than locking. Can somebody please help me out?
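>
> For context, the channel-based variant serializes all updates through a
> single goroutine that owns the maps, roughly like this (a simplified
> sketch, not the exact library code):
>
> type update struct {
>    key   string
>    value int64
> }
>
> // run is the only goroutine that touches the maps, so no lock is needed;
> // writers send their values over the channel instead.
> func (store *Store) run(updates <-chan update) {
>    for u := range updates {
>       distribution, exists := store.samples[u.key]
>       if !exists {
>          distribution = NewDistribution()
>          store.samples[u.key] = distribution
>       }
>       distribution.addSample(u.value)
>    }
> }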
