On Mon, Dec 30, 2024 at 10:15:13AM -0800, Paul Lalonde wrote:
> The hard part is memory consistency.
> 
> x86 has a strong memory-consistency model (TSO): once a write
> commits, it is visible to every other core that loads the same
> address, in a single global order.  This wreaks havoc on your cache
> architecture.  You either need a global synchronization point
> (effectively a global shared cache), with its associated performance
> bottleneck, or some way to track which cores hold cache lines that a
> write might invalidate.  Contended cache lines become very expensive
> in terms of communication.  Intel's chips handle this with "tag
> directories", which track globally which caches contain which cache
> lines.  AMD has a similar structure on the IO die of its
> chiplet-based parts.  In both cases you get much better performance
> if you avoid *ever* using the same cache line on separate processors.
> On AMD you can fairly easily partition your workload across physical
> memory channels to achieve this.  On Intel it's much more "magical",
> and it's not at all clear how well the magic scales to very large
> core counts.  You might note that Intel is not doing as well as AMD
> these days on the architecture side.
> 
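To make the false-sharing point concrete, here is a minimal C sketch
(the 64-byte line size and the per-thread-counter workload are my
assumptions, nothing from the message above); giving each counter its
own cache line keeps it out of the coherence traffic:

/* false.c: per-thread counters with and without cache-line padding.
   Assumes 64-byte cache lines, which holds on current x86 parts.
   Build: cc -O2 false.c -lpthread
   volatile so the compiler emits one store per iteration instead of
   hoisting the counter into a register. */
#include <stdalign.h>
#include <pthread.h>
#include <stdio.h>

enum { NTHREADS = 4 };
enum { NITER = 100000000 };

/* contended layout: all counters share one line, so every increment
   drags that line between cores */
volatile long shared[NTHREADS];

/* padded layout: one line per counter, no migration */
struct padded { alignas(64) volatile long c; };
static struct padded solo[NTHREADS];

static void *bump(void *arg)
{
	long i, n = (long)arg;

	for (i = 0; i < NITER; i++)
		solo[n].c++;	/* change to shared[n]++ to see the slowdown */
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];
	long n;

	for (n = 0; n < NTHREADS; n++)
		pthread_create(&t[n], NULL, bump, (void *)n);
	for (n = 0; n < NTHREADS; n++)
		pthread_join(t[n], NULL);
	printf("%ld\n", solo[0].c + shared[0]);
	return 0;
}

With the padded layout each core keeps its own line locally; switch
the loop to shared[n]++ and the one line ping-pongs between caches,
which is exactly the communication cost described above.
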
> Relaxing the memory model somewhat, as ARM and RISC-V do, can help
> there by reducing the number of (false) synchronization points
> between caches, but at the cost of more subtle bugs if you omit a
> required synchronization primitive.
> 
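To make "required synchronization primitive" concrete: the classic
message-passing idiom below is broken on ARM or RISC-V with plain
loads and stores, and only works because of the release/acquire pair.
A minimal C11 sketch (the payload/ready names are mine):

/* mp.c: message passing under a relaxed memory model (C11 atomics).
   Without the release/acquire pair, ARM and RISC-V may reorder the
   payload write past the flag write (or reorder the reads), so the
   consumer can see ready==1 but a stale payload.
   Build: cc mp.c -lpthread */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static int payload;		/* ordinary data */
static atomic_int ready;	/* synchronization flag */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;
	/* release: all writes above are visible before the flag flips */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return NULL;
}

static void *consumer(void *arg)
{
	(void)arg;
	/* acquire: once we observe the flag, we observe the payload too */
	while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		;
	printf("%d\n", payload);	/* guaranteed 42, not 0 */
	return NULL;
}

int main(void)
{
	pthread_t p, c;

	pthread_create(&c, NULL, consumer, NULL);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return 0;
}

On x86 (TSO) the same code happens to work even with plain stores,
which is how the subtler bugs sneak in: they only surface when the
code moves to the weaker machine.
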

Perhaps the target market is not really HPC in the sense of trying to
speed up _one_ task by running it in parallel, concurrently, on a lot
of cores, but more the "cloud": limited need for global
synchronization, since, all in all, what runs are separate VMs that
are almost orthogonal to one another (even the storage is in fact
segregated). So it is more a tightly coupled cluster of separate
systems, allowing speedy migration of tasks, than a single system for
a single task such as computing weather evolution or simulating
complex physical systems. Just a guess.

> 
> 
> > On Mon, Dec 30, 2024 at 10:00 AM Ron Minnich <rminn...@p9f.org> wrote:
> 
> > > On Mon, Dec 30, 2024 at 9:39 AM Bakul Shah via 9fans <9fans@9fans.net>
> > wrote:
> > >
> > > I wonder how these many-core systems share memory effectively.
> > 
> > Typically there is an on-chip network and, at least on some
> > systems, the memory blocks are scattered among the cores.
> > 
> > See the Esperanto ET-SoC-1 for one example.

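On a conventional NUMA machine the same "memory near the cores" idea
is already visible to software. A minimal sketch with Linux's libnuma
(the node number and buffer size are arbitrary choices of mine),
placing a buffer in the memory local to the cores that touch it:

/* near.c: pin a buffer to the memory nearest a chosen node, then run
   on that node's cores.  Build: cc near.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	char *buf;

	if (numa_available() < 0) {
		fprintf(stderr, "no NUMA support\n");
		return 1;
	}
	/* allocate 1 MB backed by node 0's local memory */
	buf = numa_alloc_onnode(1 << 20, 0);
	if (buf == NULL)
		return 1;
	/* keep the threads that touch buf on node 0's cores */
	numa_run_on_node(0);
	memset(buf, 0, 1 << 20);
	numa_free(buf, 1 << 20);
	return 0;
}
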
-- 
        Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
