This might help: https://github.com/robaho/go-concurrency-test. It has relative performance measurements for several sync primitives. In general, though, Go uses "fibers" (goroutines multiplexed onto OS threads), which don't interact well with busy loops; besides, the overhead of parking and waking a blocked goroutine can be minimal.
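To make that concrete, here is a minimal sketch of the kind of busy-wait barrier being asked about, built only on sync/atomic. The spinBarrier type and all of its names are mine, invented for illustration; it is a standard sense-reversing barrier, not code from the repo above. Note the runtime.Gosched() in the spin loop: a goroutine that spins without yielding can monopolise an OS thread, which is exactly where goroutines and busy loops interact badly.

package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinBarrier is a hypothetical sense-reversing barrier for a fixed
// number of participants, built only on sync/atomic.
type spinBarrier struct {
	n     int32 // number of participants
	count int32 // arrivals in the current round
	sense int32 // flips each round; waiters spin on it
}

func newSpinBarrier(n int) *spinBarrier {
	return &spinBarrier{n: int32(n)}
}

// await spins until all n participants have arrived.
func (b *spinBarrier) await() {
	s := atomic.LoadInt32(&b.sense)
	if atomic.AddInt32(&b.count, 1) == b.n {
		// Last arrival: reset the count, then release the others
		// by flipping the sense.
		atomic.StoreInt32(&b.count, 0)
		atomic.StoreInt32(&b.sense, 1-s)
		return
	}
	// Busy-wait for the sense flip; Gosched keeps a spinning
	// goroutine from starving others on the same OS thread.
	for atomic.LoadInt32(&b.sense) == s {
		runtime.Gosched()
	}
}

func main() {
	const workers = 2
	const clocks = 5

	b := newSpinBarrier(workers)
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for c := 0; c < clocks; c++ {
				// phase 0: compute this worker's share of objects
				b.await()
				// phase 1: propagate the stored outputs
				b.await()
			}
		}()
	}
	wg.Wait()
	fmt.Println("ran", clocks, "simulated clocks with", workers, "workers")
}

One design choice worth noting: the workers synchronise directly with each other at the barrier instead of round-tripping through main every clock, which removes two hand-offs per cycle. A waiter stuck in one round cannot be stranded by the next round's sense flip, because the next round cannot complete until every participant, including the waiter, has arrived at it.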
> On Jan 13, 2021, at 6:57 PM, Peter Wilson <peter.wil...@bsc.es> wrote:
>
> Folks
>
> I have code in C which implements a form of discrete event simulator,
> optimised for a clocked system in which most objects accept input on the
> positive edge of a simulated clock, do their internal computations, and
> store the result internally. On the negative edge of the clock, each
> object takes its stored internal state and 'sends' it on to the
> appropriate destination object.
>
> This is modelled using a linked list of objects, each of which has a
> phase0 and a phase1 function; the list is traversed twice per clock,
> calling the appropriate function each time.
>
> It all works fine. On a uniprocessor. With one processor object and one
> memory object, the processor implementing a standard fetch/decode/execute
> interpreter loop and playing with simulated caches (reading or writing on
> cache misses), we can get 20-30 MIPS on a Mac Mini.
>
> Since (much!) more performance is wanted, implementing this for a
> multiprocessor seems a good idea, especially since every computer and its
> dog is multicore. Using Go rather than C also sounds like a good idea.
>
> The sketch of the Go implementation is that I would have three threads -
> main, t0, and t1 (more for a real system, but two suffice for explanation):
>
> - main sets stuff up, and t0 and t1 do the simulation work
> - main initialises, sets up any needed synchronisation mechanism, and
>   starts t0 and t1
> - t0 and t1 wait until main says it's OK, then both traverse all objects
>   in the list: t0 runs the function if it's an even-numbered object, t1
>   if it's odd-numbered. No state is mutated by concurrent threads.
> - main loops, as do t0 and t1; t0 and t1 signal when they've finished,
>   and main then tells them to start the next traversal
>
> So, after a long ramble: given that I am happy to waste CPU time in busy
> waits (rather than incur the overhead of scheduling blocked goroutines),
> what is the recommended signalling mechanism when everything is done in
> Go and everything's a goroutine, not a thread?
>
> My guess is that building specialised blocking 'barriers' from
> sync/atomic (an atomic operation seems to take around 4 ns on my Mac
> Mini) is the highest-performance mechanism. There's a dearth of
> performance information on channel communication, WaitGroup, mutex,
> etc., but the figures I have seen suggest that a send/receive on a
> channel may be on the order of 100 ns; since in C we iterate twice
> through the list in 30-40 ns, that is a tad high (yes, fixable by
> modelling a bigger system, but).
>
> I know that premature optimisation is a bad thing, but I'd prefer to ask
> for advice than to try everything.
>
> Many thanks for any help
>
> -- P
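For anyone wanting to check the ~100 ns channel figure locally rather than trusting third-party numbers, a benchmark sketch along these lines (file layout and all names are mine) can be dropped into a _test.go file and run with go test -bench=. :

package barrier_test

import (
	"sync/atomic"
	"testing"
)

// BenchmarkChannelRoundTrip measures one send plus one receive across
// two goroutines, i.e. two hand-offs per iteration.
func BenchmarkChannelRoundTrip(b *testing.B) {
	ping := make(chan struct{})
	pong := make(chan struct{})
	go func() {
		// Echo every ping until the channel is closed.
		for range ping {
			pong <- struct{}{}
		}
	}()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ping <- struct{}{}
		<-pong
	}
	close(ping)
}

// BenchmarkAtomicAdd measures a single uncontended atomic add, the
// building block of a spin barrier.
func BenchmarkAtomicAdd(b *testing.B) {
	var n int64
	for i := 0; i < b.N; i++ {
		atomic.AddInt64(&n, 1)
	}
}

The channel benchmark measures a full ping-pong per iteration, so halve the reported time for a single send/receive; the atomic benchmark measures only the uncontended case, and contention between workers will make it slower.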