That’s exactly the plan. The idea is to simulate the workload of a complete chiplet, which might be (assuming no SIMD in the processors, to keep the example light) 2K cores. Each worklet (the work done for one simulated core per phase) is perhaps 25-50 ns.
The simplest mechanism is probably that, on finishing its work, every worker sends a message to main; when main has received all the messages, it sends a message back to each of the workers. Nice and simple. But it seems that a channel communication is of the order of 100 ns, so I’m eating 200 ns per phase change in each worker.

With 10 executing cores and 2K simulated cores, each executing core runs around 200 worklets per phase. At 25 ns per worklet that’s 5 microseconds of work per worker, and losing 200 ns out of that will still let the thing scale reasonably to some useful number of cores.

But tools are more useful if they’re relatively broad-spectrum. If I want only 20 worklets per executing core per phase (running a simulation on a subset of the system, to gain simulation speed), I’m now spending ~200 ns out of 500 ns of work, which is not a hugely scalable number at all. It would probably run slower than a standard single-core sequential implementation regardless of the number of cores, so not a Big Win.

Were a runtime.Unique() function to exist (a hypothetical scheduler call that, for some number of goroutines and cores, allows a goroutine to declare that it should be the sole workload for a core, limited to a fairly large subset of the available cores), I could spinloop on an atomic load, emulating waitgroup/barrier behaviour without any scheduler involvement and with times closer to the 10 ns level (when worklet path lengths were well balanced).

I’d also welcome the news that under useful circumstances channel communication is only (say) 20 ns. That’d simplify things beauteously. (All ns measured on a ~3 GHz Core i7 of 2018.)

[Historical Note: When I were a young lad, I wrote quite a bit of stuff in occam, so channel-y stuff is lapsed second nature - channels just work - and all this building barrier stuff is terribly unnatural. So my instincts are (were?)
good, but platform performance doesn’t seem to want to play along]

> On Jan 17, 2021, at 9:21 AM, Robert Engels <reng...@ix.netcom.com> wrote:
>
> If there is very little work to be done - then you have N “threads” do M
> partitioned work. If M is 10x N you’ve decimated the synchronization cost.