This particular problem is akin to the way GPUs work: in GPU computing, you generally marshal all your inputs into one buffer and have a single kernel write its outputs to an output buffer (there can, of course, be more buffers on either side). The execution overhead of a single invocation is large, so batching work up to run massively in parallel in one invocation is essential.
On a CPU, you'll also get benefits from this sort of arrangement, thanks to data locality (for caching purposes) and reduced IPC overhead.

Barriers can be reasonably simple to implement, depending on your use case (this may not describe yours). Consider an event-driven simulator such as a VHDL/Verilog simulator: push new events, keyed by their scheduled time, into a priority queue (generally a heap, so O(log n) per insertion), then pop them, pre-execution, into a batch until you've drained the current time slice (assuming events cannot schedule back into their own time slice, which is usually bad behavior anyway). That lets you simulate one "clock" at a time without having to worry about the actual clock, and it also lets you model things like propagation delay without weird special-casing.

In any case, this approach doesn't require barriers so much as managing work batches. And if you ever do decide to do the computation on something like OpenCL, you'll already be lined up to process it that way. (Sketches of both this event-queue batching and Bakul's two-array scheme follow below the quoted messages.)

- Dave

> On Jan 17, 2021, at 10:11 AM, Pete Wilson <peter.wil...@bsc.es> wrote:
>
> The problem is that N or so channel communications twice per simulated clock seems to take much longer than the time spent doing useful work.
>
> Go isn't designed for this sort of work, so it's not a complaint to note it's not as good as I'd like it to be. But the other problem is that processors are still architected as if most code is sequential and synchronisations (and communications) are extremely rare, and so it's no problem if they're as slow as molasses (which atomic operations are).
>
> Your description is basically what I'm trying to do, except that I'm using local storage rather than the array elements. It's not clear that exchanging array pointers is any quicker than having a 2-phase loop; the problem is still the barriers.
>
> P
>
>> On Jan 16, 2021, at 11:14 PM, Bakul Shah <ba...@iitbombay.org> wrote:
>>
>> You may be better off maintaining two state *arrays*: one that has the current values as input and one for writing the computed outputs. At the "negative" clock edge you swap the arrays. Since the input array is never modified during the half clock cycle when you compute, you can divide it into N concurrent computations, provided a given output cell is written by just one thread. So then the synchronization point is when all the threads are done with their computing. That is when you swap the I/O state arrays, advance the clock, and unblock all the threads to compute again. You can do this with N+1 channels. The tricky part may be in partitioning the computation into more or less equal N parts.
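To make the event-queue batching concrete, here is a minimal Go sketch using container/heap. Everything here is illustrative: the Event type, its Time and Run fields, and the toy gate events are names I've invented, not anything from Pete's simulator.

package main

import (
	"container/heap"
	"fmt"
)

// Event is a hypothetical simulation event, scheduled for a given time.
type Event struct {
	Time int64  // scheduled simulation time
	Run  func() // work to perform when the event fires
}

// eventQueue is a min-heap of events ordered by scheduled time.
type eventQueue []*Event

func (q eventQueue) Len() int           { return len(q) }
func (q eventQueue) Less(i, j int) bool { return q[i].Time < q[j].Time }
func (q eventQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *eventQueue) Push(x any)        { *q = append(*q, x.(*Event)) }
func (q *eventQueue) Pop() any {
	old := *q
	n := len(old)
	e := old[n-1]
	*q = old[:n-1]
	return e
}

func main() {
	q := &eventQueue{}
	heap.Init(q)

	// Schedule some events; the 1-unit offset models propagation delay.
	heap.Push(q, &Event{Time: 10, Run: func() { fmt.Println("gate A at t=10") }})
	heap.Push(q, &Event{Time: 10, Run: func() { fmt.Println("gate B at t=10") }})
	heap.Push(q, &Event{Time: 11, Run: func() { fmt.Println("gate C at t=11") }})

	// Drain one time slice at a time: pop every event that shares the
	// earliest scheduled time into a batch, then execute the batch.
	// (Running events may push new events, but only into later slices.)
	for q.Len() > 0 {
		now := (*q)[0].Time
		var batch []*Event
		for q.Len() > 0 && (*q)[0].Time == now {
			batch = append(batch, heap.Pop(q).(*Event))
		}
		for _, e := range batch {
			e.Run() // the batch could instead be fanned out to workers
		}
	}
}

Because each batch is fully assembled before any of it runs, it can be handed wholesale to worker goroutines, or eventually to an OpenCL kernel, which is exactly the shape of work described above.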
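And here is a hedged sketch of the two-array scheme from Bakul's message, using N per-worker start channels plus one shared done channel (one reading of the "N+1 channels"). The step function, the worker count, and the cell partitioning are all placeholders of my own; the real computation would come from the simulator itself.

package main

import "fmt"

const (
	numWorkers = 4    // N concurrent computations (placeholder)
	cells      = 1024 // size of the state arrays (placeholder)
	clocks     = 3    // how many simulated clocks to run (placeholder)
)

// step computes one output cell from the previous state. Toy logic only.
func step(in []float64, i int) float64 {
	left := in[(i+cells-1)%cells]
	right := in[(i+1)%cells]
	return (left + in[i] + right) / 3
}

func main() {
	cur := make([]float64, cells)  // read-only during a half clock
	next := make([]float64, cells) // written, each cell by one worker only
	cur[0] = 1

	start := make([]chan struct{}, numWorkers) // N "go" channels...
	done := make(chan struct{})                // ...plus 1 completion channel

	for w := 0; w < numWorkers; w++ {
		start[w] = make(chan struct{})
		lo, hi := w*cells/numWorkers, (w+1)*cells/numWorkers
		go func(start <-chan struct{}, lo, hi int) {
			for range start {
				// Each worker writes a disjoint slice of next,
				// reading only from cur, so no locking is needed.
				// cur and next are the outer variables; the channel
				// handshake makes the coordinator's swap visible here.
				for i := lo; i < hi; i++ {
					next[i] = step(cur, i)
				}
				done <- struct{}{}
			}
		}(start[w], lo, hi)
	}

	for clock := 0; clock < clocks; clock++ {
		for _, ch := range start {
			ch <- struct{}{} // unblock all workers
		}
		for range start {
			<-done // barrier: wait until every worker finishes
		}
		cur, next = next, cur // swap the I/O arrays at the "clock edge"
	}
	fmt.Println("cell 1 after", clocks, "clocks:", cur[1])
}

Note that the swap itself costs only two pointer assignments; the expensive part remains the 2N channel operations per simulated clock that Pete observes dominating the useful work.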