On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz <m.kr...@gsi.de> wrote:
> > I don't understand how this feature would lead to false sharing. But
> > maybe I misunderstand the spatial prefetcher. The first access to one
> > of the two cache lines of a pair would bring both cache lines to LLC
> > (and possibly L2). If a core with a different L2 reads the other cache
> > line, the cache line would be duplicated; if it writes to it, it would
> > be exclusive to the other core's L2. The cache line pairs do not
> > affect each other anymore. Maybe there's a minor inefficiency on the
> > initial transfer from memory, but isn't that all?
> 
> If two cores that do not share an L2 cache each need exclusive access
> to a cache-line, the L2 spatial prefetcher could cause pingponging if
> those two cache-lines were adjacent and shared the same 128-byte
> alignment. Say core A requests line x1 in exclusive, it also gets line
> x2 (not sure if x2 would be in shared or exclusive), core B then
> requests x2 in exclusive, it also gets x1. Regardless of the state x1
> comes into core B's private L2 cache in, it invalidates the exclusive
> state on cache-line x1 in core A's private L2 cache. If this was done
> in a loop (say a simple `lock add` loop) it would cause pingponging on
> cache-lines x1/x2 between core A's and B's private L2 caches.
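
For concreteness, here is roughly how I picture the loop you describe (a
hypothetical sketch on my part, nothing I have measured; the 64/128 sizes
and fetch_add as a stand-in for `lock add` are my assumptions):

#include <atomic>
#include <thread>

// Two counters on different 64-byte cache lines, but inside the same
// 128-byte aligned chunk, i.e. the pair the spatial prefetcher would
// complete. (64/128 are assumed line/pair sizes.)
struct alignas(128) PairedLines {
  std::atomic<long> x1{0};                   // first line of the pair
  char pad[64 - sizeof(std::atomic<long>)];  // push x2 onto the adjacent line
  std::atomic<long> x2{0};                   // second line, same 128-byte chunk
};

PairedLines p;

int main() {
  // Each thread only ever touches its own line; the question is whether
  // the spatial prefetcher makes x1/x2 ping-pong between the private L2s.
  std::thread a([] { for (long i = 0; i < 100000000; ++i)
                       p.x1.fetch_add(1, std::memory_order_relaxed); });
  std::thread b([] { for (long i = 0; i < 100000000; ++i)
                       p.x2.fetch_add(1, std::memory_order_relaxed); });
  a.join();
  b.join();
}

If the prefetcher really behaved that way, I'd expect this to run much
slower than the same loops with x1 and x2 placed in separate 128-byte
chunks.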

Quoting the latest ORM (Intel's Optimization Reference Manual): "The
following two hardware prefetchers fetched data 
from memory to the L2 cache and last level cache:
Spatial Prefetcher: This prefetcher strives to complete every cache line 
fetched to the L2 cache with the pair line that completes it to a 128-byte 
aligned chunk."
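
If I read that correctly, the "pair line" is simply the other 64-byte line
within the same 128-byte aligned chunk, i.e. roughly (my own sketch of the
quoted sentence, assuming 64-byte cache lines; not a documented formula):

#include <cstdint>

// Address of the line that "completes" the 128-byte aligned chunk.
constexpr std::uintptr_t pair_line(std::uintptr_t addr) {
  return (addr & ~std::uintptr_t(63)) ^ 64;  // line base with bit 6 flipped
}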

1. If the requested cache line is already present on some other core, the 
spatial prefetcher should not get used ("fetched data from memory").

2. The section is about data prefetching. It is unclear whether the spatial 
prefetcher applies at all to normal (demand) cache line fetches.

3. The ORM uses past tense ("The following two hardware prefetchers fetched 
data"), which indicates to me that Intel isn't doing this for newer 
generations anymore.

4. If I'm wrong on points 1 & 2, consider this (the access pattern is 
sketched below): Core 1 requests a read of cache line A, so the adjacent 
cache line B is also loaded to LLC. Core 2 requests a read of line B and 
thus loads line A into LLC. Now both cores have both cache lines in LLC. 
Core 1 writes to line A, which invalidates line A for Core 2 but does not 
affect line B. Core 2 writes to line B, invalidating line B for Core 1 but 
not line A. => no false sharing. Where did my mental model of the cache 
protocol go wrong?
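
In code, the access pattern I have in mind for point 4 is roughly the
following (again only a sketch under my assumption of 64-byte lines A/B
sharing one 128-byte chunk; the names are made up):

#include <atomic>
#include <thread>

struct alignas(128) Chunk {
  std::atomic<int> a{0};                    // cache line A
  char pad[64 - sizeof(std::atomic<int>)];  // keep b on the adjacent line
  std::atomic<int> b{0};                    // cache line B, same 128-byte chunk
};

Chunk chunk;

void core1() {
  int r = chunk.a.load();  // read line A (B possibly prefetched alongside)
  chunk.a.store(r + 1);    // write only to line A
}

void core2() {
  int r = chunk.b.load();  // read line B (A possibly prefetched alongside)
  chunk.b.store(r + 1);    // write only to line B
}

int main() {
  std::thread t1(core1), t2(core2);
  t1.join();
  t2.join();
}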

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────


