It's not only that one defer; the ones in the benchmark code itself also cost time.

See this:

diff --git a/gopatanbench/benchmark.go b/gopatanbench/benchmark.go
index 23503a9..e92ed88 100644
--- a/gopatanbench/benchmark.go
+++ b/gopatanbench/benchmark.go
@@ -37,13 +37,13 @@ func Benchmrk(threads int64, itemsPerThread int64) {
     for i := int64(0); i < threads; i++ {
         wg.Add(1)
         go func() {
-            defer wg.Done()
             sw := subject.StartStopwatch()
-            defer subject.RecordElapsedTime("goroutine.duration", sw)
             for i := int64(0); i < itemsPerThread; i++ {
                 subject.IncrementCounter("concurrency.counter")
                 subject.AddSample("concurrency.sample", i)
             }
+            subject.RecordElapsedTime("goroutine.duration", sw)
+            wg.Done()
         }()
     }
     wg.Wait()


On my machine, that patch makes the following difference:

- original:
2016/10/03 10:21:06 [STATISTICS] created new lockbased store
10 threads with 20000 items took 227
2016/10/03 10:21:06 [STATISTICS] created new lockbased store
100 threads with 20000 items took 2416
2016/10/03 10:21:09 [STATISTICS] created new lockbased store
1000 threads with 20000 items took 23095
2016/10/03 10:21:32 [STATISTICS] created new lockbased store
10 threads with 200000 items took 2088
2016/10/03 10:21:34 [STATISTICS] created new lockbased store
100 threads with 200000 items took 24436

- with patch:
2016/10/03 10:19:37 [STATISTICS] created new lockbased store
10 threads with 20000 items took 212
2016/10/03 10:19:37 [STATISTICS] created new lockbased store
100 threads with 20000 items took 2295
2016/10/03 10:19:39 [STATISTICS] created new lockbased store
1000 threads with 20000 items took 22677
2016/10/03 10:20:02 [STATISTICS] created new lockbased store
10 threads with 200000 items took 2011
2016/10/03 10:20:04 [STATISTICS] created new lockbased store
100 threads with 200000 items took 23322
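The defer in addToStore is also on the hot path; removing it there as well could
look like this (a sketch only, with a simplified Distribution type; the real one
in go-patan also tracks min/max/average/stddev):

```go
package main

import "sync"

// Distribution is a simplified stand-in for go-patan's type.
type Distribution struct {
	count int64
	sum   int64
}

func NewDistribution() *Distribution { return &Distribution{} }

func (d *Distribution) addSample(value int64) {
	d.count++
	d.sum += value
}

type Store struct {
	samples map[string]*Distribution
	lock    sync.Mutex // value mutex instead of *sync.Mutex: one less allocation
}

// addToStore unlocks explicitly instead of deferring; the trade-off is
// that a panic between Lock and Unlock would leave the mutex held.
func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
	store.lock.Lock()
	distribution, exists := destination[key]
	if !exists {
		distribution = NewDistribution()
		destination[key] = distribution
	}
	distribution.addSample(value)
	store.lock.Unlock()
}

func main() {}
```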

If the benchmark code itself is slow, it also slows down the application under
test. I'd also suggest running the benchmarks using Go's built-in benchmarking
support (go test -bench) instead of hand-rolled timing.
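As a sketch of what that could look like (using a minimal stand-in store type;
the real API lives in github.com/toefel18/go-patan):

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// store is a minimal stand-in for go-patan's lock-based store;
// only IncrementCounter is sketched here.
type store struct {
	lock     sync.Mutex
	counters map[string]int64
}

func (s *store) IncrementCounter(key string) {
	s.lock.Lock()
	s.counters[key]++
	s.lock.Unlock()
}

// BenchmarkIncrementCounter lets the testing package choose the iteration
// count, and RunParallel spreads the work over GOMAXPROCS goroutines.
func BenchmarkIncrementCounter(b *testing.B) {
	s := &store{counters: make(map[string]int64)}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			s.IncrementCounter("concurrency.counter")
		}
	})
}

func main() {
	// Normally the benchmark lives in a _test.go file and runs via
	// `go test -bench=.`; testing.Benchmark drives it from a plain
	// program here for illustration.
	fmt.Println(testing.Benchmark(BenchmarkIncrementCounter))
}
```

That gives ns/op numbers that are comparable across runs and machines, instead
of wall-clock totals.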



On Sunday, October 2, 2016 at 7:33:15 PM UTC+1, Justin Israel wrote:
>
> Do you get better performance when you remove the defer and do an explicit 
> unlock at the end of the function? There are a few references to the defer 
> machinery in your profile. 
> I'm guessing the try/finally in Java is cheaper. 
>
> On Mon, 3 Oct 2016, 3:17 AM <toef...@gmail.com> wrote:
>
>> Hi,
>>
>> I've written a small library (https://github.com/toefel18/go-patan)
>> that stores counters and collects statistics (running 
>> min/max/average/stddeviation) of a program during runtime. There is a lock 
>> based implementation and a channel based implementation. I've written this 
>> library before in Java as well: https://github.com/toefel18/patan. The 
>> Java version, with an equivalent implementation, is much faster than both 
>> the channel-based and lock-based implementations in Go, and I really don't 
>> understand why. 
>>
>> This program has a store that holds the data, and a sync.Mutex that 
>> guards concurrent access on reads and writes. This is a snippet of the 
>> locking based implementation:
>>
>> type Store struct {
>>    durations map[string]*Distribution
>>    counters  map[string]int64
>>    samples   map[string]*Distribution
>>
>>    lock *sync.Mutex
>> }
>>
>> func (store *Store) addSample(key string, value int64) {
>>    store.addToStore(store.samples, key, value)
>> }
>>
>> func (store *Store) addDuration(key string, value int64) {
>>    store.addToStore(store.durations, key, value)
>> }
>>
>>
>> func (store *Store) addToStore(destination map[string]*Distribution, key 
>> string, value int64) {
>>    store.lock.Lock()
>>    defer store.lock.Unlock()
>>    distribution, exists := destination[key]
>>    if !exists {
>>       distribution = NewDistribution()
>>       destination[key] = distribution
>>    }
>>    distribution.addSample(value)
>> }
>>
>> Now, when I benchmark this Go code, I get the following results (see gist: 
>> benchmark code 
>> <https://gist.github.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec>):
>>
>> 10 threads with 20000 items took 133 millis 
>> 100 threads with 20000 items took 1809 millis 
>> 1000 threads with 20000 items took 17576 millis 
>> 10 threads with 200000 items took 1228 millis 
>>
>> 100 threads with 200000 items took 17900 millis
>>
>> When I benchmark the Java code, there are much better results (see gist: 
>> java benchmark code 
>> <https://gist.github.com/toefel18/9def55a8c3c53c79a4488c29c66f31e5>)
>>
>> 10 threads with 20000 items takes 89 millis 
>> 100 threads with 20000 items takes 265 millis 
>> 1000 threads with 20000 items takes 2888 millis 
>> 10 threads with 200000 items takes 311 millis 
>>
>> 100 threads with 200000 items takes 3067 millis
>>
>>
>> I have profiled the Go code and created a call graph 
>> <https://gist.githubusercontent.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec/raw/c1ff6452abaefe49c3c688b3d07e4574cbe77264/call-graph.png>.
>>  I interpret this as follows:
>> Go spends 0.31 and 0.25 seconds in my methods, and pretty much the rest in
>>  sync.(*Mutex).Lock() and sync.(*Mutex).Unlock()  
>>
>> The top20 output of the profiler:
>>
>> (pprof) top20
>> 59110ms of 73890ms total (80.00%)
>> Dropped 22 nodes (cum <= 369.45ms)
>> Showing top 20 nodes out of 65 (cum >= 50220ms)
>>       flat  flat%   sum%        cum   cum%
>>     8900ms 12.04% 12.04%     8900ms 12.04%  runtime.futex
>>     7270ms  9.84% 21.88%     7270ms  9.84%  runtime/internal/atomic.Xchg
>>     7020ms  9.50% 31.38%     7020ms  9.50%  runtime.procyield
>>     4560ms  6.17% 37.56%     4560ms  6.17%  sync/atomic.CompareAndSwapUint32
>>     4400ms  5.95% 43.51%     4400ms  5.95%  runtime/internal/atomic.Xadd
>>     4210ms  5.70% 49.21%    22040ms 29.83%  runtime.lock
>>     3650ms  4.94% 54.15%     3650ms  4.94%  runtime/internal/atomic.Cas
>>     3260ms  4.41% 58.56%     3260ms  4.41%  runtime/internal/atomic.Load
>>     2220ms  3.00% 61.56%    22810ms 30.87%  sync.(*Mutex).Lock
>>     1870ms  2.53% 64.10%     1870ms  2.53%  runtime.osyield
>>     1540ms  2.08% 66.18%    16740ms 22.66%  runtime.findrunnable
>>     1430ms  1.94% 68.11%     1430ms  1.94%  runtime.freedefer
>>     1400ms  1.89% 70.01%     1400ms  1.89%  sync/atomic.AddUint32
>>     1250ms  1.69% 71.70%     1250ms  1.69%  
>> github.com/toefel18/go-patan/statistics/lockbased.(*Distribution).addSample
>>     1240ms  1.68% 73.38%     3140ms  4.25%  runtime.deferreturn
>>     1070ms  1.45% 74.83%     6520ms  8.82%  runtime.systemstack
>>     1010ms  1.37% 76.19%     1010ms  1.37%  runtime.newdefer
>>     1000ms  1.35% 77.55%     1000ms  1.35%  runtime.mapaccess1_faststr
>>      950ms  1.29% 78.83%    15660ms 21.19%  runtime.semacquire
>>      860ms  1.16% 80.00%    50220ms 67.97%  main.Benchmrk.func1
>>
>>
>> I would really like to understand why Locking in Go is so much slower than 
>> in Java. I've initially written this program using channels, but that was 
>> much slower than locking. Can somebody please help me out?
>>
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to golang-nuts...@googlegroups.com <javascript:>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
