One thing you can try is to increase the number of locks so that the 
goroutines aren't all blocked on a single lock. For example, you can 
partition your stores: https://play.golang.org/p/y16yr57KcQ

package main

import (
    "hash/fnv"
    "io"
)

// PartitionedStore spreads keys across several independent Stores (the Store
// type from your post, each with its own mutex), so goroutines contend on
// different locks instead of a single one.
type PartitionedStore struct {
    stores []Store
}

func NewPartitionedStore(sz int) *PartitionedStore {
    p := &PartitionedStore{
        stores: make([]Store, sz),
    }
    for i := 0; i < sz; i++ {
        p.stores[i] = *(NewStore())
    }
    return p
}

func (p *PartitionedStore) addSample(key string, value int64) {
    p.getStore(key).addSample(key, value)
}

// getStore hashes the key and picks one of the fixed set of partitions.
func (p *PartitionedStore) getStore(key string) *Store {
    h := fnv.New32a() // might want to use a faster algorithm here
    io.WriteString(h, key)
    idx := h.Sum32() % uint32(len(p.stores))
    return &p.stores[idx]
}

func main() {
    p := NewPartitionedStore(32)
    p.addSample("test", 15)
}
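
One possible refinement, not in the playground snippet, so treat it as an 
untested sketch: fnv.New32a() allocates a new hasher on every call, and for 
short keys you can inline the 32-bit FNV-1a math to avoid that allocation:

// getStoreFast is a hypothetical allocation-free variant of getStore that
// inlines the 32-bit FNV-1a hash instead of going through hash/fnv.
const (
    fnvOffset32 = 2166136261
    fnvPrime32  = 16777619
)

func (p *PartitionedStore) getStoreFast(key string) *Store {
    h := uint32(fnvOffset32)
    for i := 0; i < len(key); i++ {
        h ^= uint32(key[i])
        h *= fnvPrime32
    }
    return &p.stores[h%uint32(len(p.stores))]
}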


So in this case there are 32 locks created up front (no mutex is needed 
around the fixed-size array of stores, since it never changes after 
construction), and if the keys are uniformly distributed you reduce the 
chance that two goroutines hit the same mutex at the same time. (This 
approach is language agnostic, so it may help the Java program too...)
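
To see whether this actually helps your benchmark, a rough driver like the 
following could run the same workload against a single Store and against 32 
partitions. This is only a sketch of mine: it assumes the Store/NewStore 
from go-patan plus the PartitionedStore above, and it needs the "fmt", 
"strconv", "sync" and "time" imports.

// run is a hypothetical helper: it hammers the given addSample function from
// many goroutines and prints how long the whole workload took.
func run(name string, addSample func(key string, value int64)) {
    const goroutines, items = 100, 20000
    var wg sync.WaitGroup
    start := time.Now()
    for g := 0; g < goroutines; g++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := 0; i < items; i++ {
                // spread the load over many keys so the partitions can help
                addSample("key"+strconv.Itoa(i%64), int64(i))
            }
        }()
    }
    wg.Wait()
    fmt.Println(name, time.Since(start))
}

func compare() {
    run("single lock:   ", NewStore().addSample)
    run("32 partitions: ", NewPartitionedStore(32).addSample)
}

Note that the win depends on the key distribution: if most samples go to one 
hot key, every goroutine still hashes to the same partition and you are back 
to a single contended mutex.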

On Sunday, October 2, 2016 at 10:17:53 AM UTC-4, toefel18 wrote:
>
> Hi,
>
> I've written a small library in Go (https://github.com/toefel18/go-patan) 
> that stores counters and collects statistics (running 
> min/max/average/stddeviation) of a program during runtime. There is a 
> lock-based implementation and a channel-based implementation. I've also 
> written this library in Java: https://github.com/toefel18/patan. The 
> Java version, with an equivalent implementation, is much faster than both 
> the channel and locking implementations in Go, and I really don't 
> understand why. 
>
> This program has a store that holds the data, and a sync.Mutex that guards 
> concurrent reads and writes. This is a snippet of the lock-based 
> implementation:
>
> type Store struct {
>    durations map[string]*Distribution
>    counters  map[string]int64
>    samples   map[string]*Distribution
>
>    lock *sync.Mutex
> }
>
> func (store *Store) addSample(key string, value int64) {
>    store.addToStore(store.samples, key, value)
> }
>
> func (store *Store) addDuration(key string, value int64) {
>    store.addToStore(store.durations, key, value)
> }
>
>
> func (store *Store) addToStore(destination map[string]*Distribution, key 
> string, value int64) {
>    store.lock.Lock()
>    defer store.lock.Unlock()
>    distribution, exists := destination[key]
>    if !exists {
>       distribution = NewDistribution()
>       destination[key] = distribution
>    }
>    distribution.addSample(value)
> }
>
> Now, when I benchmark this Go code, I get the following results (see gist: 
> benchmark code 
> <https://gist.github.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec>):
>
> 10 threads with 20000 items took 133 millis 
> 100 threads with 20000 items took 1809 millis 
> 1000 threads with 20000 items took 17576 millis 
> 10 threads with 200000 items took 1228 millis 
> 100 threads with 200000 items took 17900 millis
>
> When I benchmark the Java code, the results are much better (see gist: java 
> benchmark code 
> <https://gist.github.com/toefel18/9def55a8c3c53c79a4488c29c66f31e5>):
>
> 10 threads with 20000 items takes 89 millis 
> 100 threads with 20000 items takes 265 millis 
> 1000 threads with 20000 items takes 2888 millis 
> 10 threads with 200000 items takes 311 millis 
> 100 threads with 200000 items takes 3067 millis
>
>
> I have profiled the Go code and created a call graph 
> <https://gist.githubusercontent.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec/raw/c1ff6452abaefe49c3c688b3d07e4574cbe77264/call-graph.png>. 
> I interpret it as follows: Go spends 0.31 and 0.25 seconds in my methods, 
> and pretty much the rest in sync.(*Mutex).Lock() and sync.(*Mutex).Unlock().
>
> The top20 output of the profiler:
>
> (pprof) top20
> 59110ms of 73890ms total (80.00%)
> Dropped 22 nodes (cum <= 369.45ms)
> Showing top 20 nodes out of 65 (cum >= 50220ms)
>       flat  flat%   sum%        cum   cum%
>     8900ms 12.04% 12.04%     8900ms 12.04%  runtime.futex
>     7270ms  9.84% 21.88%     7270ms  9.84%  runtime/internal/atomic.Xchg
>     7020ms  9.50% 31.38%     7020ms  9.50%  runtime.procyield
>     4560ms  6.17% 37.56%     4560ms  6.17%  sync/atomic.CompareAndSwapUint32
>     4400ms  5.95% 43.51%     4400ms  5.95%  runtime/internal/atomic.Xadd
>     4210ms  5.70% 49.21%    22040ms 29.83%  runtime.lock
>     3650ms  4.94% 54.15%     3650ms  4.94%  runtime/internal/atomic.Cas
>     3260ms  4.41% 58.56%     3260ms  4.41%  runtime/internal/atomic.Load
>     2220ms  3.00% 61.56%    22810ms 30.87%  sync.(*Mutex).Lock
>     1870ms  2.53% 64.10%     1870ms  2.53%  runtime.osyield
>     1540ms  2.08% 66.18%    16740ms 22.66%  runtime.findrunnable
>     1430ms  1.94% 68.11%     1430ms  1.94%  runtime.freedefer
>     1400ms  1.89% 70.01%     1400ms  1.89%  sync/atomic.AddUint32
>     1250ms  1.69% 71.70%     1250ms  1.69%  
> github.com/toefel18/go-patan/statistics/lockbased.(*Distribution).addSample
>     1240ms  1.68% 73.38%     3140ms  4.25%  runtime.deferreturn
>     1070ms  1.45% 74.83%     6520ms  8.82%  runtime.systemstack
>     1010ms  1.37% 76.19%     1010ms  1.37%  runtime.newdefer
>     1000ms  1.35% 77.55%     1000ms  1.35%  runtime.mapaccess1_faststr
>      950ms  1.29% 78.83%    15660ms 21.19%  runtime.semacquire
>      860ms  1.16% 80.00%    50220ms 67.97%  main.Benchmrk.func1
>
>
> I would really like to understand why locking in Go is so much slower than 
> in Java. I initially wrote this program using channels, but that was much 
> slower than locking. Can somebody please help me out?
