This might help: https://github.com/robaho/go-concurrency-test. It has 
relative performance measurements for several sync primitives. In general, 
though, Go uses "fibers" (goroutines multiplexed onto OS threads), which don't 
match up well with busy loops - and the scheduling overhead of blocking and 
waking a goroutine can be minimal anyway. 
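
If you want numbers from your own machine rather than someone else's, a 
minimal benchmark sketch like this (standard testing package; run with 
'go test -bench=.') gives the relative cost of a channel round trip vs. an 
uncontended atomic op:

    package main

    import (
        "sync/atomic"
        "testing"
    )

    // BenchmarkChannelPingPong times one round trip between two goroutines
    // over unbuffered channels - a rough proxy for a start/done handshake.
    func BenchmarkChannelPingPong(b *testing.B) {
        ping := make(chan struct{})
        pong := make(chan struct{})
        go func() {
            for range ping {
                pong <- struct{}{}
            }
        }()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            ping <- struct{}{}
            <-pong
        }
        close(ping)
    }

    // BenchmarkAtomicAdd times a single uncontended atomic increment.
    func BenchmarkAtomicAdd(b *testing.B) {
        var n int64
        for i := 0; i < b.N; i++ {
            atomic.AddInt64(&n, 1)
        }
    }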

> On Jan 13, 2021, at 6:57 PM, Peter Wilson <peter.wil...@bsc.es> wrote:
> 
> Folks
> I have code in C which implements a form of discrete event simulator, but 
> optimised for a clocked system in which most objects accept input on the 
> positive edge of a simulated clock, do their appropriate internal 
> computations, and store the result internally. On the negative edge of the 
> clock, they each take their stored internal state and 'send' it on to the 
> appropriate destination object.
> 
> This is modelled using a linked list of objects, each of which has a phase0 
> and phase1 function, and traversing the list twice per clock, calling the 
> appropriate function.
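> 
> In Go, a minimal sketch of that shape (names are illustrative, and a slice 
> stands in for the linked list) might be:
> 
>     // Object is one simulated component with two clock phases.
>     type Object interface {
>         Phase0() // positive edge: read inputs, compute, store internally
>         Phase1() // negative edge: forward stored state to destinations
>     }
> 
>     // tick advances the whole system by one simulated clock cycle.
>     func tick(objs []Object) {
>         for _, o := range objs {
>             o.Phase0()
>         }
>         for _, o := range objs {
>             o.Phase1()
>         }
>     }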
> 
> It all works fine. On a uniprocessor. If we have one processor object and one 
> memory object, with the processor implementing a standard fetch-decode-execute 
> interpreter loop, and playing with simulated caches, reading or writing on 
> cache misses, we can get 20-30 MIPS on a Mac Mini. 
> 
> So since (much!) more performance is wanted, implementing this for a 
> multiprocessor seems a good idea. Especially since every computer and its dog 
> is multicore. Using Go rather than C also sounds like a good idea.
> 
> So the sketch of the Go implementation is that I would have three threads - 
> main, t0, and t1 (more for a real system, but two suffice for explanation):
> - main sets stuff up, and t0 and t1 do the simulation work
> - main has to initialise, set up any needed synchronisation mechanism, and 
> start t0 and t1
> - t0 and t1 wait until main says it's OK, then both traverse all objects in 
> the list; t0 runs the function if it's an even-numbered object, and t1 if 
> it's odd-numbered. No mutation of state by concurrent threads.
> - main loops, as do t0 and t1; t0 and t1 signal that they've finished, and 
> when they have, main tells them to start the next traversal (see the sketch 
> below)
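> 
> A channel-based version of that loop would be the obvious baseline 
> (illustrative sketch; traverse stands in for the real list walk):
> 
>     package main
> 
>     // traverse runs the phase functions for objects whose index modulo 2
>     // equals worker. Stubbed here.
>     func traverse(worker int) {}
> 
>     func main() {
>         const nCycles = 1000
>         start := [2]chan struct{}{make(chan struct{}), make(chan struct{})}
>         done := make(chan struct{})
>         for id := 0; id < 2; id++ {
>             go func(id int) {
>                 for range start[id] { // workers run until the program exits
>                     traverse(id)
>                     done <- struct{}{} // signal this traversal is finished
>                 }
>             }(id)
>         }
>         for cycle := 0; cycle < nCycles; cycle++ {
>             start[0] <- struct{}{} // release both workers
>             start[1] <- struct{}{}
>             <-done // wait for both before starting the next traversal
>             <-done
>         }
>     }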
> 
> So, after a long ramble: given that I am happy to waste CPU time in busy 
> waits (rather than incur the overhead of scheduling blocked goroutines), what 
> is the recommended signalling mechanism when everything is done in Go and 
> everything's a goroutine, not a thread?
> 
> My guess is that creating specialist blocking 'barriers' using sync/atomic 
> (an atomic operation seems to take around 4 ns on my Mac Mini) is the 
> highest-performance mechanism. There's a dearth of performance information on 
> channel communication, waitgroups, mutexes, etc., but what I have seen 
> suggests that sending/receiving on a channel is on the order of 100 ns; since 
> in C we iterate twice through the list in 30-40 ns, that's a tad high (yes, 
> fixable by modelling a bigger system, but still).
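> 
> For the busy-wait version, something like this sense-reversing spin barrier 
> on sync/atomic is what I have in mind (an untested sketch; each spinning 
> goroutine still occupies a core, so GOMAXPROCS and core count matter):
> 
>     package main
> 
>     import (
>         "runtime"
>         "sync/atomic"
>     )
> 
>     // SpinBarrier is a busy-wait barrier for a fixed number of
>     // participants, reusable across clock cycles.
>     type SpinBarrier struct {
>         n       int32
>         waiting int32
>         sense   int32
>     }
> 
>     // Wait spins until all n participants have arrived.
>     func (b *SpinBarrier) Wait() {
>         s := atomic.LoadInt32(&b.sense)
>         if atomic.AddInt32(&b.waiting, 1) == b.n {
>             atomic.StoreInt32(&b.waiting, 0)
>             atomic.StoreInt32(&b.sense, 1-s) // last arrival frees everyone
>             return
>         }
>         for atomic.LoadInt32(&b.sense) == s {
>             runtime.Gosched() // drop for a hotter (hungrier) spin
>         }
>     }
> 
>     func main() {
>         b := &SpinBarrier{n: 2}
>         done := make(chan struct{})
>         for id := 0; id < 2; id++ {
>             go func() {
>                 for cycle := 0; cycle < 3; cycle++ {
>                     // ... phase work for this worker ...
>                     b.Wait() // workers rendezvous here each cycle
>                 }
>                 done <- struct{}{}
>             }()
>         }
>         <-done
>         <-done
>     }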
> 
> I know that premature optimisation is a bad thing, but I'd prefer to ask for 
> advice than to try everything.
> 
> many thanks for any help
> 
> -- P
