This sounded really familiar, so I searched the mailing list to see if we'd already fixed it. Unfortunately all I learned was that you're not the first to notice... the first warning I found came over the holidays (Dec 2010) and never got followed up on, and the second was a more recent thread (just over a month ago) that I had starred but never got around to reading carefully:
http://www.mail-archive.com/[email protected]/msg09268.html
http://www.mail-archive.com/[email protected]/msg00201.html

I think there are at least three possible solutions:

1. Throw the prefetch away when the response arrives and you see that there's already a dirty block (what you suggest, and also what I suggested to zhanglunkai in December).
2. Cancel the prefetch in the MSHR when you see the writeback (the code that Reena attempted).
3. Probe the upper caches before issuing a prefetch to make sure there aren't any dirty copies above.

I suspect that #2 might not work, since (if I read Reena's code correctly) it only cancels the prefetch in the narrow window between when the prefetch is first issued and when it gets on the bus; once the prefetch is out the door, you can't deallocate the MSHR, or else things will fall apart once the response comes back and its MSHR pointer (in senderState) is no longer valid. You could do a variation on #2 that keeps the MSHR alive but marks it as bogus, but I think that's really just a different way of implementing #1.

Anyway, my concern about both #1 and #2 is that if this dirty block is alive in the L1 and does *not* get written back before the prefetch returns, then the L2 is going to have a stale copy of the block that it thinks is valid, which seems very likely to lead to scenarios where it responds with that block to other caches, leading to general pandemonium. (There are probably mitigating circumstances in some of the more common configurations, but I'm pretty sure that if we have private L1s and L2s and a shared L3, and we prefetch into the L3 while there's a dirty copy in one of the L1s, then a miss from another L1 will get satisfied by the L3 without the dirty copy in the first L1 ever getting snooped.)
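To make solution #1 concrete, here's a minimal self-contained sketch of the invariant it enforces: when a prefetch response arrives and the block is already present and dirty (the writeback won the race), the prefetched data is discarded rather than clobbering the dirty copy. This is not gem5 code; `ToyBlock` and `fillFromPrefetch` are hypothetical names modeling the logic, not real cache classes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy model of solution #1: drop a prefetch fill if the target block
// is already valid and dirty (e.g., a writeback raced in ahead of it).
struct ToyBlock {
    bool valid = false;
    bool dirty = false;
    uint8_t data[64] = {};
};

// Returns true if the prefetched data was actually installed.
bool fillFromPrefetch(ToyBlock &blk, const uint8_t *pf_data,
                      std::size_t blk_size) {
    if (blk.valid && blk.dirty) {
        // Dirty copy won the race: the prefetched data is stale, discard it.
        return false;
    }
    std::memcpy(blk.data, pf_data, blk_size);
    blk.valid = true;
    blk.dirty = false;   // prefetched data is a clean copy from below
    return true;
}
```

Note this only protects the case where the writeback arrives before the prefetch response; it does nothing for the stale-L2-copy hazard described above, which is why the probing approaches below are still worth considering.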
One way to handle this more general problem is to probe the upper caches (using the "express snoop" magic) to avoid issuing prefetches if the block is already resident above the current cache (or optionally only if it's resident in a modified or exclusive state; not sure that matters). Basically we'd need to extend the check here that skips the prefetch if we already have the block:

http://repo.gem5.org/gem5/file/7907b19fbe80/src/mem/cache/cache_impl.hh#l1430

to include something vaguely like this code that does an upward express snoop:

http://repo.gem5.org/gem5/file/7907b19fbe80/src/mem/cache/cache_impl.hh#l1191

except we'd need to generate a new packet from scratch. The tricky part is how we really tell that there's a block up there, since there's no explicit command for that... the easiest thing would be to do a ReadReq; then if we see memInhibitAsserted() we'll know there's a modified copy above us. The bad news is that that would change the state in the upper-level cache, and (like it or not) would not suppress prefetches when the upper cache has a block in the shared state. It could also be a problem if the protocol ever decides to hand off its modified copy to our probe. The more general thing to do would be to add a new command in MemCmd that allows you to test the state of upper-level caches without changing them; that seems like a pain, though.

Now that I've gone through thinking about it, another approach occurs to me: maybe instead of trying to avoid issuing the prefetch, we do the prefetch as normal but force it to probe the upper caches as well. Thus the prefetch itself would get satisfied by the dirty copy in the upper-level cache. I'm not sure this matches what any real cache hierarchy would do, but it might be the easiest thing to implement. I think getTimingPacket() might be the place for that... that's where we seem to handle other wacky special cases.
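The skip-the-prefetch variant of this idea boils down to a simple predicate: before issuing, ask each upper-level cache whether it holds the block (optionally caring only about modified copies). Here's a toy model of that decision, with `UpperCache` and `shouldIssuePrefetch` as hypothetical stand-ins for the real express-snoop machinery; the actual gem5 version would have to do this by sending a snoop packet upward, not by inspecting tag arrays directly.

```cpp
#include <cstdint>
#include <vector>

// Toy model of an upper-cache occupancy check, standing in for an
// upward express snoop. Each UpperCache just records (addr, modified)
// pairs for the lines it holds.
struct UpperCache {
    struct Line { uint64_t addr; bool modified; };
    std::vector<Line> lines;

    bool holds(uint64_t addr, bool &modified) const {
        for (const auto &l : lines) {
            if (l.addr == addr) {
                modified = l.modified;
                return true;
            }
        }
        return false;
    }
};

// Skip the prefetch if any upper cache has the block; with
// only_skip_if_modified set, skip only when the copy is modified
// (mirroring the "not sure that matters" aside above).
bool shouldIssuePrefetch(const std::vector<UpperCache> &uppers,
                         uint64_t addr, bool only_skip_if_modified) {
    for (const auto &c : uppers) {
        bool modified = false;
        if (c.holds(addr, modified) &&
            (!only_skip_if_modified || modified)) {
            return false;
        }
    }
    return true;
}
```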
So the idea would be to add code to getTimingPacket() that checks whether we're about to issue a prefetch, then does something like the upward express snoop code I pointed to above; if we get a response from an upper-level cache, we directly call handleResponse() with that (like we do with the failed store conditionals above) and then don't return the packet, so it doesn't get sent out to the rest of the system (i.e., sendTiming() doesn't get called on it).

Sorry for the somewhat long and stream-of-consciousness email... sorry for not responding to these earlier mentions of the problem either. Does someone want to give this a shot? I can do it, but that might not be the fastest approach (given how long it took me just to respond to these emails).

Steve

On Fri, Jul 8, 2011 at 4:21 PM, Ali Saidi <[email protected]> wrote:

> No idea what happened there... my outbox still has the actual version. Let's try again:
>
> I'm sure this is Steve's favorite topic, but there is definitely an issue with the prefetcher replacing a dirty block in the cache with a clean copy recently retrieved from memory.
> Here is a trace of what is going on:
>
> 69707534000: system.cpu.dcache: set a3: selecting blk 2028c0 for replacement
>                  <----------- L1 begins trying to writeback block 2028c0
> 69707534000: system.cpu.dcache: replacement: replacing 2028c0 with f68c0: writeback
> 69707683000: system.l2-pf: queuing prefetch to 2028c0 @ 1000
> 69707683000: system.l2-pf: Found a pf candidate addr: 0x2028c0, inserting into prefetch queue with delay 1000 time 69707690000
> 69707691007: system.l2-pf: returning 0x2028c0
> 69707691007: system.membus: recvTiming: src 6 dst -1 ReadReq 0x2028c0
>                  <----------- Prefetch sent out for address 2028c0
> 69707709001: system.tol2bus: recvTiming: src 2 dst -1 Writeback 0x2028c0 BUSY
> 69707714001: system.tol2bus: recvTiming: src 2 dst -1 Writeback 0x2028c0 BUSY
> 69707715000: system.tol2bus: recvTiming: src 2 dst -1 Writeback 0x2028c0
> 69707715000: system.l2: Writeback 2028c0 miss
>                  <----------- Writeback occurs, prefetch still pending
> 69707721007: system.membus: recvTiming: src 5 dst 6 ReadResp 0x2028c0
>                  <----------- Block returns
> 69707721007: system.l2: Handling response to 2028c0
> 69707721007: system.l2: Block for addr 2028c0 being updated in Cache
>                  <----------- And it's updated?! why?!
> 69707721007: system.l2: Block addr 2028c0 moving from state 15 to 15
>                  <----------- Doesn't change state
> 69707787000: system.l2-pf: queuing prefetch to 2028c0 @ 1000
> 69707787000: system.l2-pf: Found a pf candidate addr: 0x2028c0, inserting into prefetch queue with delay 1000 time 69707794000
> 69707795006: system.l2-pf: returning 0x2028c0
> 69708323000: system.cpu.dcache: ReadReq 2028c0 miss
>                  <----------- L1 misses the block
> 69708328001: system.tol2bus: recvTiming: src 2 dst -1 ReadReq 0x2028c0 BUSY
> 69708329000: system.tol2bus: recvTiming: src 2 dst -1 ReadReq 0x2028c0
>                  <----------- Gets data from L2, but this is the wrong data (not the dirty copy)
> 69708329000: system.l2: set a3: moving blk 2028c0 to MRU
> 69708329000: system.l2: ReadReq 2028c0 hit
> 69708329000: system.l2-pf: hit: PC f048 blk_addr 2028c0 stride 64 (match), conf
> 69708340001: system.tol2bus: recvTiming: src 0 dst 2 ReadResp 0x2028c0 BUSY
> 69708346000: system.tol2bus: recvTiming: src 0 dst 2 ReadResp 0x2028c0
> 69708346000: system.cpu.dcache: Handling response to 2028c0
> 69708346000: system.cpu.dcache: Block for addr 2028c0 being updated in Cache
> 69708346000: system.cpu.dcache: replacement: replacing 1fe8c0 with 2028c0: writeback
> 69708346000: system.cpu.dcache: Block addr 2028c0 moving from state 0 to 15
> 69708315000: system.cpu T0 : @route_net+2352 : ldr r2, [r12, r0 LSL #2] : MemRead : D=0x0000000000000000 A=0x2028c0
>                  <-------------------------- Instruction gets the wrong data :(
>
> I don't know enough about the inner workings of this code to know how this case is supposed to be handled. Any suggestions?
> The only thing I can think of is putting some code in handleFill() that does something like:
>
>     bool was_dirty = blk->isDirty();
>
>     // if we got new data, copy it in
>     if (pkt->isRead() && !was_dirty) {
>         std::memcpy(blk->data, pkt->getPtr<uint8_t>(), blkSize);
>     } else if (was_dirty) {
>         DPRINTF(Cache, "Block addr %x not being replaced because "
>                 "previous data was dirty!\n", addr);
>     }
>
> Any idea if this is going to create some new ugliness? It seems to solve the problem immediately at hand.
>
> Thanks,
>
> Ali
>
> On Fri, 8 Jul 2011 16:16:25 -0700, Steve Reinhardt <[email protected]> wrote:
>
>> That's a pretty short trace, not much to go on ;-)
>>
>> On Fri, Jul 8, 2011 at 4:03 PM, Ali Saidi <[email protected]> wrote:
>>
>>> I'm sure this is Steve's favorite topic, but there is definitely an
>>> issue with the prefetcher replacing a dirty block in the cache with a
>>> clean copy recently retrieved from memory. Here is a trace of what is
>>> going on:
>>>
>>> 69707534000: system.cpu.dcache: set a3: selecting blk 2028c0 for replacement
>>>
>>> _______________________________________________
>>> gem5-dev mailing list
>>> [email protected]
>>> http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
