Re: [go-nuts] Re: Can't explain Golang benchmarks

Michael Jones Mon, 17 Jul 2017 13:15:12 -0700

I get about 3 million channel send/receive pairs per second. If you
actually do work in between (database access, computation, etc.) the limit
is difficult to reach.
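
A rough way to check a figure like this on your own machine is sketched below. It is only an illustration: the helper name countPairs and the message count are assumptions, not code from this thread, and the number you get will depend on hardware, Go version, and GOMAXPROCS.

```go
package main

import (
	"fmt"
	"time"
)

// countPairs (hypothetical helper) pushes n values through an unbuffered
// channel to a single receiver. It returns how many values the receiver
// saw and how long the whole exchange took.
func countPairs(n int) (received int, elapsed time.Duration) {
	ch := make(chan int)
	done := make(chan int)

	// Receiver: does no work at all besides counting.
	go func() {
		count := 0
		for range ch {
			count++
		}
		done <- count
	}()

	start := time.Now()
	for i := 0; i < n; i++ {
		ch <- i // each send completes only when the receiver takes the value
	}
	close(ch)
	received = <-done
	return received, time.Since(start)
}

func main() {
	const n = 1000000 // arbitrary message count for the experiment
	received, elapsed := countPairs(n)
	fmt.Printf("%d pairs in %v (%.0f pairs/s)\n",
		received, elapsed, float64(received)/elapsed.Seconds())
}
```

Adding even a small amount of real work inside the receiver loop should make the channel cost a much smaller share of the total, which is the point being made above.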


On Mon, Jul 17, 2017 at 10:22 AM Jesper Louis Andersen <
jesper.louis.ander...@gmail.com> wrote:

> Your benchmarks are not really *doing* something once they get data. This
> means that eventual conflicts on (internal or external) locks are going to
> be hit all the time. In turn, you are likely to measure the performance of
> your hardware in the event of excessive lock contention (this hypothesis
> can be verified: block profiling).
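
As a sketch of how the block-profiling check could look, assuming a toy workload rather than the benchmark from this thread: the function contend below is hypothetical and simply makes several goroutines fight over one mutex while doing no other work. runtime.SetBlockProfileRate and the "block" profile in runtime/pprof are standard library features.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

// contend (hypothetical workload) starts `workers` goroutines that each
// increment a shared counter `iters` times under a single mutex and do
// nothing else. It returns the final counter value.
func contend(workers, iters int) int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		total int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				mu.Lock()
				total++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	// Ask the runtime to sample blocking events as aggressively as it can.
	// This is meant for experiments, not for production use.
	runtime.SetBlockProfileRate(1)
	defer runtime.SetBlockProfileRate(0)

	fmt.Println("total:", contend(8, 100000))

	// Dump the block profile in text form (debug=1) to stderr.
	if err := pprof.Lookup("block").WriteTo(os.Stderr, 1); err != nil {
		fmt.Fprintln(os.Stderr, "writing block profile:", err)
	}
}
```

For a benchmark run through go test, the -blockprofile flag writes the same kind of profile to a file that go tool pprof can read.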
>
> Were you to process something before going back to the barrier or channel,
> then you are less likely to hit the problem. You should be aware, though,
> that your cores will sometimes coordinate anyway: the work they do tends to
> bring them back to the lock at the same point in time (a similar problem
> to TCP incast).
>
> As you go from a single core to multiple cores, an important event happens
> in the system. In a single-core system, you don't need the hardware to
> coordinate. This is much faster than the case where the hardware might need
> to tell other physical cores about what is going on. Especially under
> contention. In short, there is a gap between the single-core system and a
> multicore system due to communication overhead. In high-performance
> computing a common mistake is to measure the *same* implementation in the
> serial and parallel case, even though the serial case can be made to avoid
> taking locks and so on. A fair benchmark would compare the fastest
> single-core implementation with the fastest multi-core ditto.
>
> The upshot of having access to multiple cores is that you can get them all
> to do work and this can lead to a substantial speedup. But there has to be
> work to do. Your benchmark has almost no work to do, so most of the cores
> are just competing against each other.
>
> For a PubSub implementation you need to think about the effort involved in
> the implementation. Up to a point, I think a channel-based solution, or a
> barrier-based solution, is adequate. For systems where you need low latency,
> go for a pattern like the Disruptor pattern[0], which should be doable in
> Go. This pattern works with a ring buffer of events and two types of "clock
> hands". One hand is the point of insertion of a new event, the other
> hand(s) track where the readers currently are in their processing. By
> tracking the hands through atomics, you can get messaging overhead down by
> a lot. The trick is that each thread writes to its own hand and other
> threads can read an eventually consistent snapshot. As long as the buffer
> doesn't fill up completely, the eventually consistent view of an ever
> increasing counter is enough. Getting an older value is not a big deal, as
> long as the value isn't undefined.
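
The "clock hands" idea can be sketched in Go roughly as follows. This is a minimal illustration, not the LMAX implementation: the names ring, publish, consume, and broadcast are hypothetical, it assumes exactly one producer, it busy-waits with runtime.Gosched, it omits the cache-line padding a real Disruptor uses, and it relies on the atomic.Int64 type from Go 1.19 or later.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// ring is a single-producer, multi-reader broadcast buffer.
// written is the producer's hand: the number of events published so far.
// read[i] is reader i's hand: the number of events reader i has consumed.
type ring struct {
	slots   []int
	written atomic.Int64
	read    []atomic.Int64
}

func newRing(size, readers int) *ring {
	return &ring{slots: make([]int, size), read: make([]atomic.Int64, readers)}
}

// publish stores v in the next slot. Only one goroutine may call it.
func (r *ring) publish(v int) {
	next := r.written.Load()
	size := int64(len(r.slots))
	for {
		// Find the slowest reader. A stale (older) value is harmless:
		// it only makes the producer wait a little longer.
		slowest := next
		for i := range r.read {
			if c := r.read[i].Load(); c < slowest {
				slowest = c
			}
		}
		if next-slowest < size {
			break // the target slot has been consumed by every reader
		}
		runtime.Gosched()
	}
	r.slots[next%size] = v
	r.written.Store(next + 1) // makes the slot visible to readers
}

// consume returns the next event for reader id. Only reader id may call it.
func (r *ring) consume(id int) int {
	pos := r.read[id].Load()
	for r.written.Load() <= pos {
		runtime.Gosched() // nothing new published yet
	}
	v := r.slots[pos%int64(len(r.slots))]
	r.read[id].Store(pos + 1) // frees the slot as far as this reader is concerned
	return v
}

// broadcast publishes 0..n-1 to every reader and returns each reader's sum.
func broadcast(n, readers, size int) []int {
	r := newRing(size, readers)
	sums := make([]int, readers)
	var wg sync.WaitGroup
	for id := 0; id < readers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < n; i++ {
				sums[id] += r.consume(id)
			}
		}(id)
	}
	for i := 0; i < n; i++ {
		r.publish(i)
	}
	wg.Wait()
	return sums
}

func main() {
	fmt.Println(broadcast(1000, 3, 8)) // prints [499500 499500 499500]
}
```

Each hand is written by exactly one goroutine and only read by the others, so no locks are needed, and every reader sees every event without the producer sending it once per subscriber.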
>
> The problem with channels is that they are, inherently, one-shot. If a
> message is consumed, it is gone from the channel. In a PubSub solution, you
> really want multiple readers to process the same (immutable) data and flag it
> when it is all consumed. Message brokers and SMTP servers often optimize
> delivery of this kind by keeping the payload separate from the header
> (envelope): better not pay for the contents more than once. But you still
> have to send a message to each subscriber, and this gets rather expensive.
> The Disruptor pattern solves this.
>
> On the other hand, the Disruptor only works if you happen to have memory
> shared. In large distributed systems, the pattern tends to break down, and
> other methods must be used. For instance a cluster of disruptors. But then
> keeping consistency in check becomes a challenge of the fun kind :)
>
> [0] http://lmax-exchange.github.io/disruptor/files/Disruptor-1.0.pdf -
> Thompson, Farley, Barker, Gee, Steward, 2011
>
> On Mon, Jul 17, 2017 at 7:46 AM Zohaib Sibte Hassan <
> zohaib.has...@gmail.com> wrote:
>
>> Awesome :) got similar results. So channels are slower for pubsub. I am
>> surprised so far, with the efficiency going down as the cores go up (heck
>> reminds me of that Node.js single core meme). Please tell me I am wrong
>> here, is there any other more efficient approach?
>>
>> On Sunday, July 16, 2017 at 8:06:44 PM UTC-7, peterGo wrote:
>>>
>>> Your latest benchmarks are invalid.
>>>
>>> In your benchmarks, replace b.StartTimer() with b.ResetTimer().
>>>
>>> Simply run benchmarks. Don't run race detectors. Don't run profilers.
>>> For example,
>>>
>>> $ go version
>>> go version devel +504deee Sun Jul 16 03:57:11 2017 +0000 linux/amd64
>>> $ go test -run=! -bench=. -benchmem -cpu=1,2,4,8 pubsub_test.go
>>> goos: linux
>>> goarch: amd64
>>> BenchmarkPubSubPrimitiveChannelsMultiple            20000         86328 ns/op          61 B/op           2 allocs/op
>>> BenchmarkPubSubPrimitiveChannelsMultiple-2          30000         50844 ns/op          54 B/op           2 allocs/op
>>> BenchmarkPubSubPrimitiveChannelsMultiple-4          10000        112833 ns/op          83 B/op           2 allocs/op
>>> BenchmarkPubSubPrimitiveChannelsMultiple-8          10000        160011 ns/op          88 B/op           2 allocs/op
>>> BenchmarkPubSubWaitGroupMultiple                   100000         21231 ns/op          40 B/op           2 allocs/op
>>> BenchmarkPubSubWaitGroupMultiple-2                  10000        107165 ns/op          46 B/op           2 allocs/op
>>> BenchmarkPubSubWaitGroupMultiple-4                  20000         73235 ns/op          43 B/op           2 allocs/op
>>> BenchmarkPubSubWaitGroupMultiple-8                  20000         82917 ns/op          42 B/op           2 allocs/op
>>> PASS
>>> ok      command-line-arguments    15.481s
>>> $
>>>
>>> Peter
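
The reason for the ResetTimer advice above: the benchmark timer is already running when the benchmark function is entered, so calling b.StartTimer() after setup does nothing and the setup cost is counted, whereas b.ResetTimer() discards the time and allocation counts accumulated so far. A small sketch, assuming a made-up setup step (setup, sum, and benchSum are hypothetical names, not the code from the gist):

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// setup stands in for expensive one-time preparation (hypothetical).
func setup() []int {
	time.Sleep(10 * time.Millisecond)
	return make([]int, 1024)
}

// sum is the operation we actually want to measure.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sink keeps the compiler from discarding the result of sum.
var sink int

func benchSum(b *testing.B) {
	data := setup()
	b.ResetTimer() // drop the time and allocations spent in setup
	for i := 0; i < b.N; i++ {
		sink = sum(data)
	}
}

func main() {
	// testing.Benchmark lets a benchmark run outside of "go test".
	testing.Init()
	fmt.Println(testing.Benchmark(benchSum))
}
```

In a regular _test.go file the function would be named BenchmarkSum and run with go test -bench, as in the command shown above.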
>>>
>>> On Sunday, July 16, 2017 at 9:51:38 PM UTC-4, Zohaib Sibte Hassan wrote:
>>>>
>>>> Thanks for pointing out the issues. I updated my code to get rid of the
>>>> race conditions (nothing critical, it was always a reader-writer race).
>>>> The updated code is at
>>>> https://gist.github.com/maxpert/f3c405c516ba2d4c8aa8b0695e0e054e.
>>>> That still doesn't explain the new results:
>>>>
>>>> $> go test -race -run=! -bench=. -benchmem -cpu=1,2,4,8 -cpuprofile=cpu.out -memprofile=mem.out pubsub_test.go
>>>> BenchmarkPubSubPrimitiveChannelsMultiple          50  21121694 ns/op    8515 B/op      39 allocs/op
>>>> BenchmarkPubSubPrimitiveChannelsMultiple-2       100  19302372 ns/op    4277 B/op      20 allocs/op
>>>> BenchmarkPubSubPrimitiveChannelsMultiple-4        50  22674769 ns/op    8182 B/op      35 allocs/op
>>>> BenchmarkPubSubPrimitiveChannelsMultiple-8        50  21201533 ns/op    8469 B/op      38 allocs/op
>>>> BenchmarkPubSubWaitGroupMultiple                3000    501804 ns/op      63 B/op       2 allocs/op
>>>> BenchmarkPubSubWaitGroupMultiple-2               200  15417944 ns/op     407 B/op       6 allocs/op
>>>> BenchmarkPubSubWaitGroupMultiple-4               300   5010273 ns/op     231 B/op       4 allocs/op
>>>> BenchmarkPubSubWaitGroupMultiple-8               200   5444634 ns/op     334 B/op       5 allocs/op
>>>> PASS
>>>> ok   command-line-arguments 21.775s
>>>>
>>>> So far my testing shows channels are slower for pubsub scenario. I
>>>> tried looking into pprof dumps of memory and CPU and it's not making sense
>>>> to me. What am I missing here?
>>>>
>>>> On Sunday, July 16, 2017 at 10:27:04 AM UTC-7, peterGo wrote:
>>>>>
>>>>> When you have data races the results are undefined.
>>>>>
>>>>> $ go version
>>>>> go version devel +dd81c37 Sat Jul 15 05:43:45 2017 +0000 linux/amd64
>>>>> $ go test -race -run=! -bench=. -benchmem -cpu=1,2,4,8 pubsub_test.go
>>>>> ==================
>>>>> WARNING: DATA RACE
>>>>> Read at 0x00c4200140c0 by goroutine 18:
>>>>>   command-line-arguments.BenchmarkPubSubPrimitiveChannelsMultiple()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:59 +0x51d
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.(*B).run1.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>>
>>>>> Previous write at 0x00c4200140c0 by goroutine 57:
>>>>>   [failed to restore the stack]
>>>>>
>>>>> Goroutine 18 (running) created at:
>>>>>   testing.(*B).run1()
>>>>>       /home/peter/go/src/testing/benchmark.go:207 +0x8c
>>>>>   testing.(*B).Run()
>>>>>       /home/peter/go/src/testing/benchmark.go:513 +0x482
>>>>>   testing.runBenchmarks.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:417 +0xa7
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.runBenchmarks()
>>>>>       /home/peter/go/src/testing/benchmark.go:423 +0x86d
>>>>>   testing.(*M).Run()
>>>>>       /home/peter/go/src/testing/testing.go:928 +0x51e
>>>>>   main.main()
>>>>>       command-line-arguments/_test/_testmain.go:46 +0x1d3
>>>>>
>>>>> Goroutine 57 (finished) created at:
>>>>>   command-line-arguments.BenchmarkPubSubPrimitiveChannelsMultiple()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:40 +0x290
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.(*B).run1.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>> ==================
>>>>> --- FAIL: BenchmarkPubSubPrimitiveChannelsMultiple
>>>>>     benchmark.go:147: race detected during execution of benchmark
>>>>> ==================
>>>>> WARNING: DATA RACE
>>>>> Read at 0x00c42000c030 by goroutine 1079:
>>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple.func1()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:76 +0x9e
>>>>>
>>>>> Previous write at 0x00c42000c030 by goroutine 7:
>>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:101 +0x475
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.(*B).run1.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>>
>>>>> Goroutine 1079 (running) created at:
>>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:93 +0x2e6
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.(*B).run1.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>>
>>>>> Goroutine 7 (running) created at:
>>>>>   testing.(*B).run1()
>>>>>       /home/peter/go/src/testing/benchmark.go:207 +0x8c
>>>>>   testing.(*B).Run()
>>>>>       /home/peter/go/src/testing/benchmark.go:513 +0x482
>>>>>   testing.runBenchmarks.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:417 +0xa7
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.runBenchmarks()
>>>>>       /home/peter/go/src/testing/benchmark.go:423 +0x86d
>>>>>   testing.(*M).Run()
>>>>>       /home/peter/go/src/testing/testing.go:928 +0x51e
>>>>>   main.main()
>>>>>       command-line-arguments/_test/_testmain.go:46 +0x1d3
>>>>> ==================
>>>>> ==================
>>>>> WARNING: DATA RACE
>>>>> Write at 0x00c42000c030 by goroutine 7:
>>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:101 +0x475
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.(*B).run1.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>>
>>>>> Previous read at 0x00c42000c030 by goroutine 1078:
>>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple.func1()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:76 +0x9e
>>>>>
>>>>> Goroutine 7 (running) created at:
>>>>>   testing.(*B).run1()
>>>>>       /home/peter/go/src/testing/benchmark.go:207 +0x8c
>>>>>   testing.(*B).Run()
>>>>>       /home/peter/go/src/testing/benchmark.go:513 +0x482
>>>>>   testing.runBenchmarks.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:417 +0xa7
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.runBenchmarks()
>>>>>       /home/peter/go/src/testing/benchmark.go:423 +0x86d
>>>>>   testing.(*M).Run()
>>>>>       /home/peter/go/src/testing/testing.go:928 +0x51e
>>>>>   main.main()
>>>>>       command-line-arguments/_test/_testmain.go:46 +0x1d3
>>>>>
>>>>> Goroutine 1078 (running) created at:
>>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:93 +0x2e6
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.(*B).run1.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>> ==================
>>>>> ==================
>>>>> WARNING: DATA RACE
>>>>> Read at 0x00c4200140c8 by goroutine 7:
>>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:109 +0x51d
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.(*B).run1.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>>
>>>>> Previous write at 0x00c4200140c8 by goroutine 175:
>>>>>   sync/atomic.AddInt64()
>>>>>       /home/peter/go/src/runtime/race_amd64.s:276 +0xb
>>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple.func1()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:88 +0x19a
>>>>>
>>>>> Goroutine 7 (running) created at:
>>>>>   testing.(*B).run1()
>>>>>       /home/peter/go/src/testing/benchmark.go:207 +0x8c
>>>>>   testing.(*B).Run()
>>>>>       /home/peter/go/src/testing/benchmark.go:513 +0x482
>>>>>   testing.runBenchmarks.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:417 +0xa7
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.runBenchmarks()
>>>>>       /home/peter/go/src/testing/benchmark.go:423 +0x86d
>>>>>   testing.(*M).Run()
>>>>>       /home/peter/go/src/testing/testing.go:928 +0x51e
>>>>>   main.main()
>>>>>       command-line-arguments/_test/_testmain.go:46 +0x1d3
>>>>>
>>>>> Goroutine 175 (finished) created at:
>>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:93 +0x2e6
>>>>>   testing.(*B).runN()
>>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>>   testing.(*B).run1.func1()
>>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>> ==================
>>>>> --- FAIL: BenchmarkPubSubWaitGroupMultiple
>>>>>     benchmark.go:147: race detected during execution of benchmark
>>>>> FAIL
>>>>> exit status 1
>>>>> FAIL    command-line-arguments    0.726s
>>>>> $
>>>>>
>>>>> Peter
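
The last report above pairs a write through sync/atomic.AddInt64 with an ordinary read of the same address. The gist itself is not reproduced in this thread, so the following is only a guess at the shape of the problem and one way to avoid it; countDeliveries and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countDeliveries (hypothetical) has `workers` goroutines each record
// `perWorker` deliveries in a shared counter and returns the total.
func countDeliveries(workers, perWorker int) int64 {
	var delivered int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt64(&delivered, 1)
			}
		}()
	}

	// Wait for every writer to finish before looking at the result.
	wg.Wait()

	// The writers used atomic.AddInt64, so the read goes through
	// atomic.LoadInt64 as well instead of a plain `delivered`.
	return atomic.LoadInt64(&delivered)
}

func main() {
	fmt.Println(countDeliveries(8, 1000)) // prints 8000
}
```

Either the wg.Wait() or the atomic load would be enough to make this particular read race-free; mixing atomic writes with unsynchronized plain reads is what the detector complains about.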
>>>>>
>>>>> On Sunday, July 16, 2017 at 10:20:21 AM UTC-4, Zohaib Sibte Hassan
>>>>> wrote:
>>>>>>
>>>>>> I have spent my day implementing an efficient PubSub system. I had
>>>>>> implemented one before using channels, and I wanted to benchmark that
>>>>>> against sync.Cond. Here is the quick and dirty test that I put together:
>>>>>> https://gist.github.com/maxpert/f3c405c516ba2d4c8aa8b0695e0e054e.
>>>>>> My confusion starts when I change GOMAXPROCS to test how it would
>>>>>> perform on my age-old Raspberry Pi. Here are the results:
>>>>>>
>>>>>> mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=8 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
>>>>>> BenchmarkPubSubPrimitiveChannelsMultiple-8     10000    165419 ns/op      92 B/op       2 allocs/op
>>>>>> BenchmarkPubSubWaitGroupMultiple-8             10000    204685 ns/op      53 B/op       2 allocs/op
>>>>>> PASS
>>>>>> ok   sibte.so/rascore 3.749s
>>>>>> mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=4 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
>>>>>> BenchmarkPubSubPrimitiveChannelsMultiple-4     20000    101704 ns/op      60 B/op       2 allocs/op
>>>>>> BenchmarkPubSubWaitGroupMultiple-4             10000    204039 ns/op      52 B/op       2 allocs/op
>>>>>> PASS
>>>>>> ok   sibte.so/rascore 5.087s
>>>>>> mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=2 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
>>>>>> BenchmarkPubSubPrimitiveChannelsMultiple-2     30000     51255 ns/op      54 B/op       2 allocs/op
>>>>>> BenchmarkPubSubWaitGroupMultiple-2             20000     60871 ns/op      43 B/op       2 allocs/op
>>>>>> PASS
>>>>>> ok   sibte.so/rascore 4.022s
>>>>>> mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=1 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
>>>>>> BenchmarkPubSubPrimitiveChannelsMultiple       20000     79534 ns/op      61 B/op       2 allocs/op
>>>>>> BenchmarkPubSubWaitGroupMultiple              100000     19066 ns/op      40 B/op       2 allocs/op
>>>>>> PASS
>>>>>> ok   sibte.so/rascore 4.502s
>>>>>>
>>>>>>  I tried multiple times and results are consistent. I am using Go
>>>>>> 1.8, Linux x64, 8GB RAM. I have multiple questions:
>>>>>>
>>>>>>
>>>>>>    - Why do channels perform worse than sync.Cond in the single-core
>>>>>>    results? Context switching is the same; if anything it should perform
>>>>>>    worse.
>>>>>>    - As I increase the max procs the sync.Cond results go down, which
>>>>>>    might be explainable, but what is up with channels? 20k to 30k to 20k
>>>>>>    to 10k :( I have an i5 with 4 cores, so it should have peaked at 4
>>>>>>    procs (P.S. I tried 3 as well, and it's consistent).
>>>>>>
>>>>>>  I still suspect I am making some kind of mistake in the code.
>>>>>> Any ideas?
>>>>>>
>>>>>> - Thanks
>>>>>>
>>>>> --
>> You received this message because you are subscribed to the Google Groups
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to golang-nuts+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
-- 
Michael T. Jones
michael.jo...@gmail.com
