This sounded really familiar, so I searched the mailing list to see if we'd
already fixed it.  Unfortunately all I learned was that you're not the first
to notice... the first warning I found came over the holidays (Dec 2010) and
never got followed up on, and the second was a more recent thread (just over
a month ago) that I had starred but never got around to reading carefully:

http://www.mail-archive.com/[email protected]/msg09268.html
http://www.mail-archive.com/[email protected]/msg00201.html

I think there are at least three possible solutions:

1. Throw the prefetch away when the response arrives and you see that
there's already a dirty block (what you suggest, and also what I suggested
to zhanglunkai in December).
2. Cancel the prefetch in the MSHR when you see the writeback (the code that
Reena attempted).
3. Probe the upper caches before issuing a prefetch to make sure there
aren't any dirty copies above.

I suspect that #2 might not work, since (if I read Reena's code correctly)
it only cancels the prefetch in the narrow window between where the prefetch
is first issued and when it gets on the bus; once the prefetch is out the
door, you can't deallocate the MSHR or else things will fall apart once the
response comes back and its MSHR pointer (in senderState) is no longer
valid.

You could do a variation on #2 that keeps the MSHR alive but marks it as
bogus, but I think that's really just a different way of implementing #1.

Anyway, my concern about both #1 and #2 is that if this dirty block is alive
in the L1 and does *not* get written back before the prefetch returns, then
the L2 is going to have a stale copy of the block that it thinks is valid,
which seems very likely to lead to scenarios where it responds with that
block to other caches, leading to general pandemonium.  (There are probably
mitigating circumstances in some of the more common configurations, but I'm
pretty sure that if we have private L1s and L2s and a shared L3, and we
prefetch into the L3 while there's a dirty copy in one of the L1s, then a
miss from another L1 will get satisfied by the L3 without the dirty copy in
the first L1 ever getting snooped.)

One way to handle this more general problem is to probe the upper caches
(using the "express snoop" magic) to avoid issuing prefetches if the block
is already resident above the current cache (or optionally only if it's
resident in a modified or exclusive state; not sure that matters).

Basically we'd need to extend the check here that skips the prefetch if we
already have the block:
http://repo.gem5.org/gem5/file/7907b19fbe80/src/mem/cache/cache_impl.hh#l1430
to include something vaguely like this code that does an upward express
snoop:
http://repo.gem5.org/gem5/file/7907b19fbe80/src/mem/cache/cache_impl.hh#l1191
except we'd need to generate a new packet from scratch.  The tricky part is
how do we really tell that there's a block up there, since there's no
explicit command for that... the easiest thing would be to do a ReadReq,
then if we see memInhibitAsserted() we'll know there's a modified copy above
us; the bad news is that that would change the state in the upper-level
cache, and (like it or not) would not suppress prefetches when the upper
cache has a block in the shared state.  It could also be a problem if the
protocol ever decides to hand off its modified copy to our probe.  The more
general thing to do would be to add a new command in MemCmd that allows you
to test the state of upper-level caches without changing them; that seems
like a pain though.

Now that I've gone through thinking about it, another approach occurs to me:
maybe instead of trying to avoid issuing the prefetch, we do the prefetch as
normal but we force it to probe the upper caches as well.  Thus the prefetch
itself would get satisfied by the dirty copy in the upper-level cache.  I'm
not sure this matches what any real cache hierarchy would do, but it might
be the easiest thing to implement.  I think getTimingPacket() might be the
place for that... that's where we seem to handle other wacky special cases.
So the idea would be to add code to getTimingPacket() that checks to see if
we're about to issue a prefetch, then does something like the upward express
snoop code I pointed to above, and if we get a response from an upper-level
cache, then we directly call handleResponse() with that (like we do with the
failed store conditionals above) and then don't return the packet so it
doesn't get sent out to the rest of the system (i.e., sendTiming() doesn't
get called on it).

Sorry for the somewhat long, stream-of-consciousness email, and for not
responding to the earlier mentions of this problem.  Does
someone want to give this a shot?  I can do it, but that might not be the
fastest approach (given how long it took me just to respond to these
emails).

Steve

On Fri, Jul 8, 2011 at 4:21 PM, Ali Saidi <[email protected]> wrote:

> No idea what happened there... my outbox still has the actual version. Let's
> try again:
>
>
> I'm sure this is Steve's favorite topic, but there is definitely an issue
> with the prefetcher replacing a dirty block in the cache with a clean copy
> recently retrieved from memory. Here is a trace of what is going on:
>
> 69707534000: system.cpu.dcache: set a3: selecting blk 2028c0 for
> replacement            <----------- L1 begins trying to writeback a block
> 2028c0
> 69707534000: system.cpu.dcache: replacement: replacing 2028c0 with f68c0:
> writeback
> 69707683000: system.l2-pf:   queuing prefetch to 2028c0 @ 1000
> 69707683000: system.l2-pf: Found a pf candidate addr: 0x2028c0, inserting
> into prefetch queue with delay 1000 time 69707690000
> 69707691007: system.l2-pf: returning 0x2028c0
> 69707691007: system.membus: recvTiming: src 6 dst -1 ReadReq 0x2028c0
>           <----------- Prefetch sent out for address 2028c0
> 69707709001: system.tol2bus: recvTiming: src 2 dst -1 Writeback 0x2028c0
> BUSY
> 69707714001: system.tol2bus: recvTiming: src 2 dst -1 Writeback 0x2028c0
> BUSY
> 69707715000: system.tol2bus: recvTiming: src 2 dst -1 Writeback 0x2028c0
> 69707715000: system.l2: Writeback 2028c0 miss
>                             <----------- Writeback occurs, prefetch still
> pending
> 69707721007: system.membus: recvTiming: src 5 dst 6 ReadResp 0x2028c0
>           <----------- Block returns
> 69707721007: system.l2: Handling response to 2028c0
> 69707721007: system.l2: Block for addr 2028c0 being updated in Cache
>                  <----------- And it's updated?! why?!
> 69707721007: system.l2: Block addr 2028c0 moving from state 15 to 15
>                  <----------- Doesn't change state
> 69707787000: system.l2-pf:   queuing prefetch to 2028c0 @ 1000
> 69707787000: system.l2-pf: Found a pf candidate addr: 0x2028c0, inserting
> into prefetch queue with delay 1000 time 69707794000
> 69707795006: system.l2-pf: returning 0x2028c0
> 69708323000: system.cpu.dcache: ReadReq 2028c0 miss
>                       <----------- L1 misses the block
> 69708328001: system.tol2bus: recvTiming: src 2 dst -1 ReadReq 0x2028c0 BUSY
> 69708329000: system.tol2bus: recvTiming: src 2 dst -1 ReadReq 0x2028c0
>                <----------- Gets data from L2, but this is the wrong data
> (not the dirty copy)
> 69708329000: system.l2: set a3: moving blk 2028c0 to MRU
> 69708329000: system.l2: ReadReq 2028c0 hit
> 69708329000: system.l2-pf: hit: PC f048 blk_addr 2028c0 stride 64 (match),
> conf
> 69708340001: system.tol2bus: recvTiming: src 0 dst 2 ReadResp 0x2028c0 BUSY
> 69708346000: system.tol2bus: recvTiming: src 0 dst 2 ReadResp 0x2028c0
> 69708346000: system.cpu.dcache: Handling response to 2028c0
> 69708346000: system.cpu.dcache: Block for addr 2028c0 being updated in
> Cache
> 69708346000: system.cpu.dcache: replacement: replacing 1fe8c0 with 2028c0:
> writeback
> 69708346000: system.cpu.dcache: Block addr 2028c0 moving from state 0 to 15
> 69708315000: system.cpu T0 : @route_net+2352    :   ldr   r2, [r12, r0 LSL
> #2] : MemRead :  D=0x0000000000000000 A=0x2028c0
>  <-------------------------- Instruction gets the wrong data :(
>
> I don't know enough about the inner workings of this code to know how this
> case is supposed to be handled. Any suggestions? The only thing I can think
> of is putting some code in handleFill() that does something like:
>
>    bool was_dirty = blk->isDirty();
>
>    // if we got new data and the old copy isn't dirty, copy it in
>    if (pkt->isRead() && !was_dirty) {
>        std::memcpy(blk->data, pkt->getPtr<uint8_t>(), blkSize);
>    } else if (was_dirty) {
>        DPRINTF(Cache, "Block addr %x not being replaced because "
>                "previous data was dirty!\n", addr);
>    }
>
> Any idea if this is going to create some new ugliness? It seems to solve
> the problem immediately at hand.
>
> Thanks,
>
> Ali
>
>
>
>
> On Fri, 8 Jul 2011 16:16:25 -0700, Steve Reinhardt <[email protected]>
> wrote:
>
>> That's a pretty short trace, not much to go on ;-)
>>
>> On Fri, Jul 8, 2011 at 4:03 PM, Ali Saidi <[email protected]> wrote:
>>
>>
>>>
>>> I'm sure this is Steve's favorite topic, but there is definitely an
>>> issue with the prefetcher replacing a dirty block in the cache with a
>>> clean copy recently retrieved from memory. Here is a trace of what is
>>> going on:
>>>
>>> 69707534000: system.cpu.dcache: set a3: selecting blk 2028c0
>>> for replacement
>>> _______________________________________________
>>> gem5-dev mailing list
>>> [email protected]
>>> http://m5sim.org/mailman/listinfo/gem5-dev
>>>
>>
>
>
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
