Your benchmarks aren't really *doing* anything once they receive data. That means conflicts on (internal or external) locks are going to be hit all the time, so you are likely measuring how your hardware behaves under excessive lock contention (a hypothesis you can verify with block profiling).
If you were to process something before going back to the barrier or channel, you would be less likely to hit the problem. Be aware, though, that your cores can still end up coordinating anyway, because the work they do tends to bring them back to the lock at the same point in time - a problem similar to TCP incast.

As you go from a single core to multiple cores, an important change happens in the system. On a single core, the hardware doesn't need to coordinate. That is much faster than the case where one physical core has to tell the others what is going on, especially under contention. In short, there is a gap between a single-core system and a multicore system due to communication overhead. A common mistake in high-performance computing is to measure the *same* implementation in the serial and the parallel case, even though the serial case can be written to avoid taking locks at all. A fair benchmark compares the fastest single-core implementation against the fastest multi-core one.

The upshot of having multiple cores is that you can put them all to work, which can yield a substantial speedup - but there has to be work to do. Your benchmark has almost no work to do, so most of the cores are just competing with each other.

For a PubSub implementation, you need to think about the effort involved. Up to a point, a channel-based or barrier-based solution is adequate. For systems where you need low latency, go for something like the Disruptor pattern [0], which should be doable in Go. The pattern works with a ring buffer of events and two kinds of "clock hands": one hand is the insertion point for new events; the other hand(s) track where the readers currently are in their processing. By tracking the hands with atomics, you can cut messaging overhead by a lot. The trick is that each thread writes only to its own hand, while other threads read an eventually consistent snapshot of it.
As long as the buffer doesn't fill up completely, an eventually consistent view of an ever-increasing counter is enough. Reading an older value is not a big deal, as long as the value is never undefined.

The problem with channels is that they are, inherently, one-shot: once a message is consumed, it is gone from the channel. In a PubSub solution, you really want multiple readers to process the same (immutable) data and flag it once everyone has consumed it. Message brokers and SMTP servers often optimize delivery of this kind by keeping the payload separate from the header (envelope): better not to pay for the contents more than once. But you still have to send a message to each subscriber, and that gets rather expensive. The Disruptor pattern solves this.

On the other hand, the Disruptor only works if you have shared memory. In large distributed systems the pattern tends to break down, and other methods must be used - for instance, a cluster of disruptors. But then keeping consistency in check becomes a challenge of the fun kind :)

[0] http://lmax-exchange.github.io/disruptor/files/Disruptor-1.0.pdf - Thompson, Farley, Barker, Gee, Stewart, 2011

On Mon, Jul 17, 2017 at 7:46 AM Zohaib Sibte Hassan <zohaib.has...@gmail.com> wrote:

> Awesome :) got similar results. So channels are slower for pubsub. I am
> surprised so far, with the efficiency going down as the cores go up (heck
> reminds me of that Node.js single core meme). Please tell me I am wrong
> here, is there any other more efficient approach?
>
> On Sunday, July 16, 2017 at 8:06:44 PM UTC-7, peterGo wrote:
>>
>> Your latest benchmarks are invalid.
>>
>> In your benchmarks, replace b.StartTimer() with b.ResetTimer().
>>
>> Simply run benchmarks. Don't run race detectors. Don't run profilers. For
>> example,
>>
>> $ go version
>> go version devel +504deee Sun Jul 16 03:57:11 2017 +0000 linux/amd64
>> $ go test -run=! -bench=.
>> -benchmem -cpu=1,2,4,8 pubsub_test.go
>> goos: linux
>> goarch: amd64
>> BenchmarkPubSubPrimitiveChannelsMultiple          20000     86328 ns/op    61 B/op    2 allocs/op
>> BenchmarkPubSubPrimitiveChannelsMultiple-2        30000     50844 ns/op    54 B/op    2 allocs/op
>> BenchmarkPubSubPrimitiveChannelsMultiple-4        10000    112833 ns/op    83 B/op    2 allocs/op
>> BenchmarkPubSubPrimitiveChannelsMultiple-8        10000    160011 ns/op    88 B/op    2 allocs/op
>> BenchmarkPubSubWaitGroupMultiple                 100000     21231 ns/op    40 B/op    2 allocs/op
>> BenchmarkPubSubWaitGroupMultiple-2                10000    107165 ns/op    46 B/op    2 allocs/op
>> BenchmarkPubSubWaitGroupMultiple-4                20000     73235 ns/op    43 B/op    2 allocs/op
>> BenchmarkPubSubWaitGroupMultiple-8                20000     82917 ns/op    42 B/op    2 allocs/op
>> PASS
>> ok      command-line-arguments  15.481s
>> $
>>
>> Peter
>>
>> On Sunday, July 16, 2017 at 9:51:38 PM UTC-4, Zohaib Sibte Hassan wrote:
>>>
>>> Thanks for pointing issues out I updated my code to get rid of race
>>> conditions (nothing critical I was always doing reader-writer race). Anyhow
>>> I updated my code on
>>> https://gist.github.com/maxpert/f3c405c516ba2d4c8aa8b0695e0e054e. Still
>>> doesn't explain the new results:
>>>
>>> $> go test -race -run=! -bench=.
>>> -benchmem -cpu=1,2,4,8
>>> -cpuprofile=cpu.out -memprofile=mem.out pubsub_test.go
>>> BenchmarkPubSubPrimitiveChannelsMultiple            50    21121694 ns/op    8515 B/op    39 allocs/op
>>> BenchmarkPubSubPrimitiveChannelsMultiple-2         100    19302372 ns/op    4277 B/op    20 allocs/op
>>> BenchmarkPubSubPrimitiveChannelsMultiple-4          50    22674769 ns/op    8182 B/op    35 allocs/op
>>> BenchmarkPubSubPrimitiveChannelsMultiple-8          50    21201533 ns/op    8469 B/op    38 allocs/op
>>> BenchmarkPubSubWaitGroupMultiple                  3000      501804 ns/op      63 B/op     2 allocs/op
>>> BenchmarkPubSubWaitGroupMultiple-2                 200    15417944 ns/op     407 B/op     6 allocs/op
>>> BenchmarkPubSubWaitGroupMultiple-4                 300     5010273 ns/op     231 B/op     4 allocs/op
>>> BenchmarkPubSubWaitGroupMultiple-8                 200     5444634 ns/op     334 B/op     5 allocs/op
>>> PASS
>>> ok      command-line-arguments  21.775s
>>>
>>> So far my testing shows channels are slower for pubsub scenario. I tried
>>> looking into pprof dumps of memory and CPU and it's not making sense to me.
>>> What am I missing here?
>>>
>>> On Sunday, July 16, 2017 at 10:27:04 AM UTC-7, peterGo wrote:
>>>>
>>>> When you have data races the results are undefined.
>>>>
>>>> $ go version
>>>> go version devel +dd81c37 Sat Jul 15 05:43:45 2017 +0000 linux/amd64
>>>> $ go test -race -run=! -bench=.
>>>> -benchmem -cpu=1,2,4,8 pubsub_test.go
>>>> ==================
>>>> WARNING: DATA RACE
>>>> Read at 0x00c4200140c0 by goroutine 18:
>>>>   command-line-arguments.BenchmarkPubSubPrimitiveChannelsMultiple()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:59 +0x51d
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.(*B).run1.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>
>>>> Previous write at 0x00c4200140c0 by goroutine 57:
>>>>   [failed to restore the stack]
>>>>
>>>> Goroutine 18 (running) created at:
>>>>   testing.(*B).run1()
>>>>       /home/peter/go/src/testing/benchmark.go:207 +0x8c
>>>>   testing.(*B).Run()
>>>>       /home/peter/go/src/testing/benchmark.go:513 +0x482
>>>>   testing.runBenchmarks.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:417 +0xa7
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.runBenchmarks()
>>>>       /home/peter/go/src/testing/benchmark.go:423 +0x86d
>>>>   testing.(*M).Run()
>>>>       /home/peter/go/src/testing/testing.go:928 +0x51e
>>>>   main.main()
>>>>       command-line-arguments/_test/_testmain.go:46 +0x1d3
>>>>
>>>> Goroutine 57 (finished) created at:
>>>>   command-line-arguments.BenchmarkPubSubPrimitiveChannelsMultiple()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:40 +0x290
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.(*B).run1.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>> ==================
>>>> --- FAIL: BenchmarkPubSubPrimitiveChannelsMultiple
>>>>     benchmark.go:147: race detected during execution of benchmark
>>>> ==================
>>>> WARNING: DATA RACE
>>>> Read at 0x00c42000c030 by goroutine 1079:
>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple.func1()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:76 +0x9e
>>>>
>>>> Previous write at 0x00c42000c030 by goroutine 7:
>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:101 +0x475
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.(*B).run1.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>
>>>> Goroutine 1079 (running) created at:
>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:93 +0x2e6
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.(*B).run1.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>
>>>> Goroutine 7 (running) created at:
>>>>   testing.(*B).run1()
>>>>       /home/peter/go/src/testing/benchmark.go:207 +0x8c
>>>>   testing.(*B).Run()
>>>>       /home/peter/go/src/testing/benchmark.go:513 +0x482
>>>>   testing.runBenchmarks.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:417 +0xa7
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.runBenchmarks()
>>>>       /home/peter/go/src/testing/benchmark.go:423 +0x86d
>>>>   testing.(*M).Run()
>>>>       /home/peter/go/src/testing/testing.go:928 +0x51e
>>>>   main.main()
>>>>       command-line-arguments/_test/_testmain.go:46 +0x1d3
>>>> ==================
>>>> ==================
>>>> WARNING: DATA RACE
>>>> Write at 0x00c42000c030 by goroutine 7:
>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:101 +0x475
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.(*B).run1.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>
>>>> Previous read at 0x00c42000c030 by goroutine 1078:
>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple.func1()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:76 +0x9e
>>>>
>>>> Goroutine 7 (running) created at:
>>>>   testing.(*B).run1()
>>>>       /home/peter/go/src/testing/benchmark.go:207 +0x8c
>>>>   testing.(*B).Run()
>>>>       /home/peter/go/src/testing/benchmark.go:513 +0x482
>>>>   testing.runBenchmarks.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:417 +0xa7
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.runBenchmarks()
>>>>       /home/peter/go/src/testing/benchmark.go:423 +0x86d
>>>>   testing.(*M).Run()
>>>>       /home/peter/go/src/testing/testing.go:928 +0x51e
>>>>   main.main()
>>>>       command-line-arguments/_test/_testmain.go:46 +0x1d3
>>>>
>>>> Goroutine 1078 (running) created at:
>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:93 +0x2e6
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.(*B).run1.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>> ==================
>>>> ==================
>>>> WARNING: DATA RACE
>>>> Read at 0x00c4200140c8 by goroutine 7:
>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:109 +0x51d
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.(*B).run1.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>>
>>>> Previous write at 0x00c4200140c8 by goroutine 175:
>>>>   sync/atomic.AddInt64()
>>>>       /home/peter/go/src/runtime/race_amd64.s:276 +0xb
>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple.func1()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:88 +0x19a
>>>>
>>>> Goroutine 7 (running) created at:
>>>>   testing.(*B).run1()
>>>>       /home/peter/go/src/testing/benchmark.go:207 +0x8c
>>>>   testing.(*B).Run()
>>>>       /home/peter/go/src/testing/benchmark.go:513 +0x482
>>>>   testing.runBenchmarks.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:417 +0xa7
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.runBenchmarks()
>>>>       /home/peter/go/src/testing/benchmark.go:423 +0x86d
>>>>   testing.(*M).Run()
>>>>       /home/peter/go/src/testing/testing.go:928 +0x51e
>>>>   main.main()
>>>>       command-line-arguments/_test/_testmain.go:46 +0x1d3
>>>> Goroutine 175 (finished) created at:
>>>>   command-line-arguments.BenchmarkPubSubWaitGroupMultiple()
>>>>       /home/peter/gopath/src/nuts/pubsub_test.go:93 +0x2e6
>>>>   testing.(*B).runN()
>>>>       /home/peter/go/src/testing/benchmark.go:141 +0x12a
>>>>   testing.(*B).run1.func1()
>>>>       /home/peter/go/src/testing/benchmark.go:214 +0x6b
>>>> ==================
>>>> --- FAIL: BenchmarkPubSubWaitGroupMultiple
>>>>     benchmark.go:147: race detected during execution of benchmark
>>>> FAIL
>>>> exit status 1
>>>> FAIL    command-line-arguments  0.726s
>>>> $
>>>>
>>>> Peter
>>>>
>>>> On Sunday, July 16, 2017 at 10:20:21 AM UTC-4, Zohaib Sibte Hassan
>>>> wrote:
>>>>>
>>>>> I have been spending my day over implementing an efficient PubSub
>>>>> system. I had implemented one before using channels, and I wanted to
>>>>> benchmark that against sync.Cond. Here is the quick and dirty test that I
>>>>> put together
>>>>> https://gist.github.com/maxpert/f3c405c516ba2d4c8aa8b0695e0e054e. Now
>>>>> my confusion starts when I change GOMAXPROCS to test how it would perform
>>>>> on my age old Raspberry Pi.
>>>>> Here are results:
>>>>>
>>>>> mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=8 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
>>>>> BenchmarkPubSubPrimitiveChannelsMultiple-8    10000    165419 ns/op    92 B/op    2 allocs/op
>>>>> BenchmarkPubSubWaitGroupMultiple-8            10000    204685 ns/op    53 B/op    2 allocs/op
>>>>> PASS
>>>>> ok      sibte.so/rascore        3.749s
>>>>> mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=4 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
>>>>> BenchmarkPubSubPrimitiveChannelsMultiple-4    20000    101704 ns/op    60 B/op    2 allocs/op
>>>>> BenchmarkPubSubWaitGroupMultiple-4            10000    204039 ns/op    52 B/op    2 allocs/op
>>>>> PASS
>>>>> ok      sibte.so/rascore        5.087s
>>>>> mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=2 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
>>>>> BenchmarkPubSubPrimitiveChannelsMultiple-2    30000     51255 ns/op    54 B/op    2 allocs/op
>>>>> BenchmarkPubSubWaitGroupMultiple-2            20000     60871 ns/op    43 B/op    2 allocs/op
>>>>> PASS
>>>>> ok      sibte.so/rascore        4.022s
>>>>> mxp@carbon:~/repos/raspchat/src/sibte.so/rascore$ GOMAXPROCS=1 go test -run none -bench Multiple -cpuprofile=cpu.out -memprofile=mem.out -benchmem
>>>>> BenchmarkPubSubPrimitiveChannelsMultiple      20000     79534 ns/op    61 B/op    2 allocs/op
>>>>> BenchmarkPubSubWaitGroupMultiple             100000     19066 ns/op    40 B/op    2 allocs/op
>>>>> PASS
>>>>> ok      sibte.so/rascore        4.502s
>>>>>
>>>>> I tried multiple times and results are consistent. I am using Go 1.8,
>>>>> Linux x64, 8GB RAM. I have multiple questions:
>>>>>
>>>>> - Why do channels perform worst than sync.Cond in single core
>>>>> results? Context switching is same if anything it should perform worst.
>>>>> - As I increase the max procs the sync.Cond results go down which
>>>>> might be explainable, but what is up with channels?
>>>>> 20k to 30k to 20k to
>>>>> 10k :( I have a i5 with 4 cores, so it should have peaked at 4 procs (pst.
>>>>> I tried 3 as well it's consistent).
>>>>>
>>>>> I am still suspicious I am not making some kind of mistake in code.
>>>>> Any ideas?
>>>>>
>>>>> - Thanks
>>>>>
>>>>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.