On Fri, Jul 16, 2021 at 3:37 PM Matthias Kretz <m.kr...@gsi.de> wrote:
> On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> > On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz <m.kr...@gsi.de> wrote:
> > > I don't understand how this feature would lead to false sharing. But
> > > maybe I misunderstand the spatial prefetcher. The first access to one
> > > of the two cache lines pairs would bring both cache lines to LLC (and
> > > possibly L2). If a core with a different L2 reads the other cache line
> > > the cache line would be duplicated; if it writes to it, it would be
> > > exclusive to the other core's L2. The cache line pairs do not affect
> > > each other anymore. Maybe there's a minor inefficiency on initial
> > > transfer from memory, but isn't that all?
> >
> > If two cores that do not share an L2 cache need exclusive access to
> > a cache-line, the L2 spatial prefetcher could cause pingponging if those
> > two cache-lines were adjacent and shared the same 128 byte alignment.
> > Say core A requests line x1 in exclusive, it also get line x2 (not sure
> > if x2 would be in shared or exclusive), core B then requests x2 in
> > exclusive, it also gets x1. Irrelevant of the state x1 comes into core
> > B's private L2 cache it invalidates the exclusive state on cache-line x1
> > in core A's private L2 cache. If this was done in a loop (say a simple
> > `lock add` loop) it would cause pingponging on cache-lines x1/x2 between
> > core A and B's private L2 caches.
>
> Quoting the latest ORM: "The following two hardware prefetchers fetched
> data from memory to the L2 cache and last level cache:
> Spatial Prefetcher: This prefetcher strives to complete every cache line
> fetched to the L2 cache with the pair line that completes it to a 128-byte
> aligned chunk."
>
> 1. If the requested cache line is already present on some other core, the
> spatial prefetcher should not get used ("fetched data from memory").
>

I think this is correct, and I was wrong to say that a request from LLC to
L2 will invoke the spatial prefetcher. So no issues with 64 bytes. Sorry for
the added confusion!

> 2. The section is about data prefetching. It is unclear whether the
> spatial prefetcher applies at all for normal cache line fetches.
>
> 3. The ORM uses past tense ("The following two hardware prefetchers
> fetched data"), which indicates to me that Intel isn't doing this for
> newer generations anymore.
>
> 4. If I'm wrong on points 1 & 2 consider this: Core 1 requests a read of
> cache line A and the adjacent cache line B thus is also loaded to LLC.
> Core 2 request a read of line B and thus loads line A into LLC. Now both
> cores have both cache lines in LLC. Core 1 writes to line A, which
> invalidates line A in LLC of Core 2 but does not affect line B. Core 2
> writes to line B, invalidating line A for Core 1. => no false sharing.
> Where did I get my mental cache protocol wrong?
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────
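
For anyone who wants to check the 64- vs 128-byte question on their own
hardware, here is a minimal sketch of the `lock add` loop experiment
discussed above. The names `CounterPair` and `time_pair` are made up for
illustration and are not from any library. The sketch assumes a 64-byte
cache line and does not pin threads, so whether the two threads land on
cores with separate L2 caches is up to the scheduler; treat any timing
difference as a hint, not a proof.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

using Counter = std::atomic<std::uint64_t>;

// Two counters exactly `Stride` bytes apart, the first one on a 128-byte
// boundary. Stride = 64: adjacent lines inside the same 128-byte pair.
// Stride = 128: lines in different 128-byte pairs.
// (Assumes 64-byte cache lines; hypothetical helper type.)
template <std::size_t Stride>
struct alignas(128) CounterPair {
    static_assert(Stride > sizeof(Counter), "stride must leave room for padding");
    Counter a{0};
    char pad[Stride - sizeof(Counter)];
    Counter b{0};
};

// Runs two threads, each incrementing only its own counter, and returns
// the elapsed wall-clock time in seconds.
template <std::size_t Stride>
double time_pair(std::uint64_t iters) {
    static CounterPair<Stride> pair;  // static storage honours alignas(128)
    pair.a.store(0);
    pair.b.store(0);

    auto work = [iters](Counter& c) {
        for (std::uint64_t i = 0; i < iters; ++i)
            c.fetch_add(1, std::memory_order_relaxed);  // a locked RMW on x86
    };

    const auto start = std::chrono::steady_clock::now();
    std::thread t1(work, std::ref(pair.a));
    std::thread t2(work, std::ref(pair.b));
    t1.join();
    t2.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    // Sanity check: neither thread touched the other's counter.
    assert(pair.a.load() == iters && pair.b.load() == iters);
    return elapsed.count();
}
```

Comparing `time_pair<64>(n)` against `time_pair<128>(n)` for a large `n`
(and pinning the threads with your platform's affinity API) would show
whether sharing a 128-byte pair costs anything on a given machine.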