Thanks for the tip. I rewrote it using explicit locking and it does result in 
much better performance, though still far from Java. The next response gives a 
good explanation of why that can happen.
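
For reference, this is roughly what the rewritten hot path looks like: a minimal 
sketch based on the addToStore snippet quoted below, with the defer replaced by 
an explicit unlock (the rest of the Store type is unchanged):

func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
   store.lock.Lock()
   distribution, exists := destination[key]
   if !exists {
      distribution = NewDistribution()
      destination[key] = distribution
   }
   distribution.addSample(value)
   // explicit unlock instead of defer store.lock.Unlock(); note the trade-off:
   // the lock is not released if addSample panics.
   store.lock.Unlock()
}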


On Sunday, October 2, 2016 at 8:33:15 PM UTC+2, Justin Israel wrote:
>
> Do you get better performance when you remove the defer and do an explicit 
> unlock at the end of the function? There are a number of references to the 
> defer machinery in your profile. 
> I'm guessing the try/finally in Java is cheaper. 
>
> On Mon, 3 Oct 2016, 3:17 AM <toef...@gmail.com> wrote:
>
>> Hi,
>>
>> I've written a small library (https://github.com/toefel18/go-patan) that 
>> stores counters and collects statistics (running min/max/average/stddeviation) 
>> of a program during runtime. There is a lock-based implementation and a 
>> channel-based implementation. I've previously written this library in Java as 
>> well: https://github.com/toefel18/patan. The Java version, with an equivalent 
>> implementation, is much faster than both the channel and locking 
>> implementations in Go, and I really don't understand why. 
>>
>> This program has a store that holds the data, and a sync.Mutex that guards 
>> concurrent reads and writes. This is a snippet of the lock-based 
>> implementation:
>>
>> type Store struct {
>>    durations map[string]*Distribution
>>    counters  map[string]int64
>>    samples   map[string]*Distribution
>>
>>    lock *sync.Mutex
>> }
>>
>> func (store *Store) addSample(key string, value int64) {
>>    store.addToStore(store.samples, key, value)
>> }
>>
>> func (store *Store) addDuration(key string, value int64) {
>>    store.addToStore(store.durations, key, value)
>> }
>>
>>
>> func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
>>    store.lock.Lock()
>>    defer store.lock.Unlock()
>>    distribution, exists := destination[key]
>>    if !exists {
>>       distribution = NewDistribution()
>>       destination[key] = distribution
>>    }
>>    distribution.addSample(value)
>> }
>>
>> Now, when I benchmark this Go code, I get the following results (see gist: 
>> benchmark code 
>> <https://gist.github.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec>):
>>
>> 10 threads with 20000 items took 133 millis 
>> 100 threads with 20000 items took 1809 millis 
>> 1000 threads with 20000 items took 17576 millis 
>> 10 threads with 200000 items took 1228 millis 
>>
>> 100 threads with 200000 items took 17900 millis
>>
>> When I benchmark the Java code, the results are much better (see gist: 
>> Java benchmark code 
>> <https://gist.github.com/toefel18/9def55a8c3c53c79a4488c29c66f31e5>):
>>
>> 10 threads with 20000 items takes 89 millis 
>> 100 threads with 20000 items takes 265 millis 
>> 1000 threads with 20000 items takes 2888 millis 
>> 10 threads with 200000 items takes 311 millis 
>>
>> 100 threads with 200000 items takes 3067 millis
>>
>>
>> I have profiled the Go code and created a call graph 
>> <https://gist.githubusercontent.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec/raw/c1ff6452abaefe49c3c688b3d07e4574cbe77264/call-graph.png>.
>> I interpret it as follows: Go spends 0.31 and 0.25 seconds in my own methods, 
>> and pretty much the rest in sync.(*Mutex).Lock() and sync.(*Mutex).Unlock().
>>
>> The top20 output of the profiler:
>>
>> (pprof) top20
>> 59110ms of 73890ms total (80.00%)
>> Dropped 22 nodes (cum <= 369.45ms)
>> Showing top 20 nodes out of 65 (cum >= 50220ms)
>>       flat  flat%   sum%        cum   cum%
>>     8900ms 12.04% 12.04%     8900ms 12.04%  runtime.futex
>>     7270ms  9.84% 21.88%     7270ms  9.84%  runtime/internal/atomic.Xchg
>>     7020ms  9.50% 31.38%     7020ms  9.50%  runtime.procyield
>>     4560ms  6.17% 37.56%     4560ms  6.17%  sync/atomic.CompareAndSwapUint32
>>     4400ms  5.95% 43.51%     4400ms  5.95%  runtime/internal/atomic.Xadd
>>     4210ms  5.70% 49.21%    22040ms 29.83%  runtime.lock
>>     3650ms  4.94% 54.15%     3650ms  4.94%  runtime/internal/atomic.Cas
>>     3260ms  4.41% 58.56%     3260ms  4.41%  runtime/internal/atomic.Load
>>     2220ms  3.00% 61.56%    22810ms 30.87%  sync.(*Mutex).Lock
>>     1870ms  2.53% 64.10%     1870ms  2.53%  runtime.osyield
>>     1540ms  2.08% 66.18%    16740ms 22.66%  runtime.findrunnable
>>     1430ms  1.94% 68.11%     1430ms  1.94%  runtime.freedefer
>>     1400ms  1.89% 70.01%     1400ms  1.89%  sync/atomic.AddUint32
>>     1250ms  1.69% 71.70%     1250ms  1.69%  github.com/toefel18/go-patan/statistics/lockbased.(*Distribution).addSample
>>     1240ms  1.68% 73.38%     3140ms  4.25%  runtime.deferreturn
>>     1070ms  1.45% 74.83%     6520ms  8.82%  runtime.systemstack
>>     1010ms  1.37% 76.19%     1010ms  1.37%  runtime.newdefer
>>     1000ms  1.35% 77.55%     1000ms  1.35%  runtime.mapaccess1_faststr
>>      950ms  1.29% 78.83%    15660ms 21.19%  runtime.semacquire
>>      860ms  1.16% 80.00%    50220ms 67.97%  main.Benchmrk.func1
>>
>>
>> I would really like to understand why locking in Go is so much slower than 
>> in Java. I initially wrote this program using channels, but that was much 
>> slower than locking. Can somebody please help me out?
>>
>>
>>
>
