50ns is approximately the cost of a cross-CPU L2 cache miss. Any time you have tight cross-CPU communication, you're going to incur that cost no matter how the communication is performed — whether it's via a sync.Mutex, a channel, or an atomic write.
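As a rough illustration of that per-operation cost, here is a minimal ping-pong micro-benchmark over unbuffered channels (a sketch only; the function name and iteration count are made up, and absolute timings will vary by machine and scheduler):

```go
package main

import (
	"fmt"
	"time"
)

// pingPong performs iters round trips between two goroutines over
// unbuffered channels and returns how many round trips completed.
// Each round trip involves two cross-goroutine handoffs, so on
// different cores it pays the cross-core cache-miss cost twice.
func pingPong(iters int) int {
	ping := make(chan struct{})
	pong := make(chan struct{})
	go func() {
		// Echo every ping back as a pong until ping is closed.
		for range ping {
			pong <- struct{}{}
		}
	}()
	done := 0
	for i := 0; i < iters; i++ {
		ping <- struct{}{}
		<-pong
		done++
	}
	close(ping)
	return done
}

func main() {
	const iters = 100000
	start := time.Now()
	n := pingPong(iters)
	elapsed := time.Since(start)
	fmt.Printf("%d round trips, %v per round trip\n", n, elapsed/time.Duration(n))
}
```

Swapping the channel handoff for a `sync.Mutex` or an atomic spin exercises the same cross-core cache line movement, which is why the floor is similar across mechanisms.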
The way to eliminate that cost is to eliminate or batch up the cross-core communication, so that you incur less than one cross-core cache miss per operation (compare https://youtu.be/C1EtfDnsdDs).

On Monday, January 18, 2021 at 10:11:08 PM UTC-5 Peter Wilson wrote:

> Robert
>
> I was interested in channel performance only because it's the 'go idiom'.
> Channel communication is relatively complex, and so is a useful upper limit
> on the cost of inter-goroutine synchronisation.
> It's reassuring to see that 50 nsec per operation is what my machinery
> delivers. Simpler operations with atomics will therefore be no slower, and
> quite possibly quicker. This is good, and checkable by experiment.
>
> Once the sync costs are settled, I suspect it's the load balancing which
> will rate-limit. That's more of a concern. But first things first.
>
> On Monday, January 18, 2021 at 8:59:15 PM UTC-6 ren...@ix.netcom.com wrote:
>
>> Channels are built with locks, which without biased locking can lead to
>> delays as goroutines are scheduled or blocked, especially under
>> contention.
>>
>> github.com/robaho/go-concurrency-test might illuminate some other
>> options to try.
>>
>> On Jan 18, 2021, at 8:13 PM, Pete Wilson <peter....@bsc.es> wrote:
>>
>> No need to beg forgiveness.
>>
>> For me, the issue is the synchronisation, not how the collection is
>> specified.
>> You’re right; there’s no reason why a slice (or an array) couldn’t be
>> used to define the collection. In fact, that’s the plan.
>> But the synchronisation is still a pain.
>>
>> FWIW I built a channel-based experiment.
>>
>> I have N worker goroutines plus main.
>> I have an N-element channel array toworkers; each worker gets a pointer
>> to its own channel.
>> I have an N-buffered channel fromthreads.
>>
>> Workers wait for an input on their own channel;
>> when they get it, they send something on the buffered channel back to
>> main. Each loops L times.
>>
>> main sends to each worker on its own channel,
>> then waits for N responses on the communal buffered channel,
>> looping L times.
>>
>> The end result is that:
>> - at one goroutine per core, a message send or receive costs 100-200 nsec
>> - but the 6-core, 12-virtual-core processor only runs at about 2.5 cores'
>>   worth of load
>> - presumably, the Go runtime has too much overhead timeslicing its
>>   threads when there's only one goroutine per thread or core
>>
>> If we put 32 goroutines per core, so the Go runtime is busily engaged in
>> multiplexing goroutines onto a thread, the cost per communication drops
>> to around 50-60 ns.
>>
>> For my purposes, that's far too many goroutines, but it does suggest
>> that synchronisation has an upper-limit cost of 50 ns.
>> So tomorrow we try the custom sync approach. Results will be published.
>>
>> Meanwhile, thanks to all for thoughts and advice.
>>
>> — P

To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/780b3521-d0e0-4678-b9a3-703baaba6e2en%40googlegroups.com.
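The topology Pete describes above can be sketched roughly like this (a minimal illustration with made-up names, not his actual code; N and L are chosen arbitrarily):

```go
package main

import (
	"fmt"
	"time"
)

// fanOutFanIn runs loops rounds over n workers. Each worker reads from
// its own private channel and replies on one shared n-buffered channel.
// It returns the total number of replies main received.
func fanOutFanIn(n, loops int) int {
	toWorkers := make([]chan int, n)
	fromWorkers := make(chan int, n) // buffered so workers never block on reply
	for i := 0; i < n; i++ {
		toWorkers[i] = make(chan int)
		go func(in chan int) {
			for v := range in {
				fromWorkers <- v // echo the work item back to main
			}
		}(toWorkers[i])
	}
	total := 0
	for l := 0; l < loops; l++ {
		for i := 0; i < n; i++ {
			toWorkers[i] <- l // fan out one item to each worker
		}
		for i := 0; i < n; i++ {
			<-fromWorkers // fan in: wait for all n replies
			total++
		}
	}
	for i := 0; i < n; i++ {
		close(toWorkers[i]) // let the workers exit
	}
	return total
}

func main() {
	const n, loops = 4, 10000
	start := time.Now()
	total := fanOutFanIn(n, loops)
	elapsed := time.Since(start)
	fmt.Printf("%d messages, roughly %v per send/receive pair\n",
		total, elapsed/time.Duration(total))
}
```

Each round costs two handoffs per worker (main to worker, worker to main), so the reported per-pair time should land in the same 50-200 ns range discussed above, depending on how many goroutines the scheduler multiplexes per core.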