I’d be tempted to just use C for this. That is, generate C code from a register-level description of your N simulated cores and run that. That is more or less what “cycle-based” Verilog simulators do (or used to). The code generator can also split the work M ways to run on M physical cores, and it can emit the synchronization code itself, tuned to the partitioning.
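To make that concrete, here is a rough sketch of the shape such generated code tends to take, written by hand in Go (to match the rest of the thread) rather than emitted as C. CoreState, evalCore, simulate and the static partitioning are all made-up names for illustration, not anything a real code generator produces: the register-level description is flattened into a per-core evaluation function, state is double-buffered, and the simulated cores are split across M workers with one barrier per simulated cycle.

// Sketch of a code-generated, cycle-based inner loop split across M workers.
// CoreState, evalCore and the partitioning are illustrative placeholders.
package main

import "sync"

type CoreState struct {
	regs [64]uint64 // register-level state of one simulated core
}

// evalCore stands in for the generated straight-line logic that computes
// the next state of core i from the current state of the whole system.
func evalCore(cur []CoreState, next *CoreState, i int) {
	next.regs = cur[i].regs // placeholder for the generated combinational logic
}

func simulate(nCores, mWorkers, cycles int) {
	cur := make([]CoreState, nCores) // state at the start of the cycle
	nxt := make([]CoreState, nCores) // state being computed this cycle
	chunk := (nCores + mWorkers - 1) / mWorkers

	var wg sync.WaitGroup
	for c := 0; c < cycles; c++ {
		for w := 0; w < mWorkers; w++ {
			lo := w * chunk
			hi := lo + chunk
			if hi > nCores {
				hi = nCores
			}
			wg.Add(1)
			go func(lo, hi int) {
				defer wg.Done()
				for i := lo; i < hi; i++ {
					evalCore(cur, &nxt[i], i)
				}
			}(lo, hi)
		}
		wg.Wait()           // one synchronization point per simulated cycle
		cur, nxt = nxt, cur // double-buffer swap
	}
}

func main() {
	simulate(2048, 10, 100) // e.g. 2K simulated cores, 10 workers, 100 cycles
}

In practice you would keep M long-lived workers rather than spawning goroutines every cycle, and replace the per-cycle WaitGroup with something cheaper per phase — which is exactly the synchronization cost being discussed below.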
With linked lists you’re wasting half of the memory bandwidth and potentially the cache. Your number of elements is not going to change, so a linked list doesn’t buy you anything. An array is ideal from a performance PoV.

> On Jan 17, 2021, at 7:49 AM, Pete Wilson <peter.wil...@bsc.es> wrote:
>
> That’s exactly the plan.
>
> The idea is to simulate perhaps the workload of a complete chiplet. That might be (assuming no SIMD in the processors, to keep the example light) 2K cores. Each worklet is perhaps 25-50 ns (worklet = work done for one simulated core).
>
> The simplest mechanism is probably that on finishing work, every worker sends a message to main; when main has got all the messages, it sends a message to each of the workers. Nice and simple. But it seems as though a channel communication is of the order of 100 ns, so I’m eating 200 ns per phase change in each worker.
>
> With 10 executing cores and 2K simulated cores, we get to do around 200 worklets per phase per executing core. At 25 ns per worklet that’s 5 microseconds of work per worker, and losing 200 ns out of that will still let the thing scale reasonably to some useful number of cores.
>
> But tools are more useful if they’re relatively broad-spectrum. If I want to have 20 worklets per core per phase (running a simulation on a subset of the system, to gain simulation speed), I am now using ~200 ns out of 500 ns of work, which is not a hugely scalable number at all. Probably it’d run slower than a standard single-core sequential implementation regardless of the number of cores, so not a Big Win.
>
> Were the runtime.Unique() function to exist (a hypothetical scheduler call that, for some number of goroutines and cores, allows a goroutine to declare that it should be the sole workload for a core; limited to a fairly large subset of available cores) I could spin-loop on an atomic load, emulating waitgroup/barrier behaviour without any scheduler involvement and with times closer to the 10 ns level (when worklet path lengths were well-balanced).
>
> I’d also welcome the news that under useful circumstances channel communication is only (say) 20 ns. That’d simplify things beauteously. (All ns measured on a ~3 GHz Core i7 of 2018.)
>
> [Historical Note: When I were a young lad, I wrote quite a bit of stuff in occam, so channel-y stuff is lapsed second nature - channels just work - and all this building barrier stuff is terribly unnatural. So my instincts are (were?) good, but platform performance doesn’t seem to want to play along.]
>
>> On Jan 17, 2021, at 9:21 AM, Robert Engels <reng...@ix.netcom.com> wrote:
>>
>> If there is very little work to be done - then you have N “threads” do M partitioned work. If M is 10x N you’ve decimated the synchronization cost.
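runtime.Unique() doesn’t exist, but the spin-on-atomic part of the idea can be sketched today: pin each worker goroutine to an OS thread and build a sense-reversing barrier on a single atomic counter. The sketch below is illustrative only — spinBarrier, worker and the rest are made-up names, and nothing here actually reserves a physical core the way the hypothetical runtime.Unique() would; whether each spinning goroutine keeps a core is still up to the Go scheduler and the OS.

// Rough sketch of "spin-loop on an atomic load" barrier behaviour:
// a sense-reversing barrier on one atomic counter, no channels and no
// scheduler involvement on the fast path. Illustrative names throughout.
package main

import (
	"runtime"
	"sync"
	"sync/atomic"
)

type spinBarrier struct {
	n       int32  // number of workers
	arrived int32  // workers that have reached the barrier this phase
	phase   uint32 // generation counter ("sense")
}

// wait spins until all n workers have called wait for the current phase.
func (b *spinBarrier) wait() {
	p := atomic.LoadUint32(&b.phase)
	if atomic.AddInt32(&b.arrived, 1) == b.n {
		atomic.StoreInt32(&b.arrived, 0) // reset before releasing anyone
		atomic.AddUint32(&b.phase, 1)    // release the other workers
		return
	}
	for atomic.LoadUint32(&b.phase) == p {
		// busy-wait: ~tens of ns when well-balanced, but it burns the core
	}
}

func worker(id, phases int, b *spinBarrier, wg *sync.WaitGroup) {
	defer wg.Done()
	runtime.LockOSThread() // keep this goroutine on one OS thread
	for p := 0; p < phases; p++ {
		// ... run this worker's worklets for phase p ...
		b.wait()
	}
}

func main() {
	const workers, phases = 10, 1000
	b := &spinBarrier{n: workers}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go worker(i, phases, b, &wg)
	}
	wg.Wait()
}

This only pays off if GOMAXPROCS and the machine really give each spinning worker its own core; if the workers get multiplexed, the spin loops fight the scheduler rather than bypassing it, which is the gap a runtime.Unique()-style call (or spinning a bounded number of iterations before falling back to a channel) would close.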