This is a pretty interesting issue. I'm not sure how it would be handled in practice. Since the loads and stores in question are not to the same address, in theory at least the store-set predictor should not be involved. My guess is that the most straightforward fix would be to record the actual address range of the LL in the request structure and only clear the lock flag on a store if the store truly overlaps that range (not just if it's to the same block).
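A minimal sketch of that range-based check, to make the idea concrete. This is not gem5's actual code: `LockRecord`, `BlockLocks`, and every member name below are hypothetical, and the real lock tracking in the cache block class may be organized quite differently. The sketch only shows the comparison: keep the exact byte range of each load-linked, and have an ordinary store drop a lock only when the two byte ranges intersect.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

using Addr = std::uint64_t;

// Hypothetical record of one outstanding load-linked (not gem5's real type).
struct LockRecord {
    int contextId;  // hardware context that issued the LL
    Addr lowAddr;   // first byte read by the LL
    Addr highAddr;  // last byte read by the LL (inclusive)

    // True if the store's byte range shares at least one byte with the LL.
    bool overlaps(Addr storeLow, Addr storeHigh) const {
        return storeLow <= highAddr && storeHigh >= lowAddr;
    }
};

// Hypothetical stand-in for the per-block lock list.
struct BlockLocks {
    std::list<LockRecord> locks;

    // Remember the exact range of a load-linked instead of just the block.
    void trackLoadLinked(int ctx, Addr addr, unsigned size) {
        assert(size > 0);
        locks.push_back({ctx, addr, addr + size - 1});
    }

    // Ordinary store: clear only the locks whose range it really touches.
    void clearOverlapping(Addr addr, unsigned size) {
        assert(size > 0);
        const Addr lo = addr;
        const Addr hi = addr + size - 1;
        locks.remove_if(
            [lo, hi](const LockRecord &l) { return l.overlaps(lo, hi); });
    }

    // Would a store-conditional from this context still succeed?
    bool hasLock(int ctx) const {
        for (const LockRecord &l : locks) {
            if (l.contextId == ctx)
                return true;
        }
        return false;
    }
};

// Replays the address ranges from Mitch's trace: LL to [0xf9c28-0xf9c2b],
// pending store to [0xf9c2c-0xf9c33], same cache line, no shared bytes.
bool llSurvivesNeighbourStore() {
    BlockLocks blk;
    blk.trackLoadLinked(0, 0xf9c28, 4);
    blk.clearOverlapping(0xf9c2c, 8);
    return blk.hasLock(0);
}
```

With block-granularity clearing the store in `llSurvivesNeighbourStore()` would drop the lock; with the range check it does not, so the later store-conditional could succeed. Whether that matches the behaviour ARM allows for its exclusive monitor is a separate question that this sketch does not answer.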
Steve

On Wed, Sep 26, 2012 at 12:50 PM, Mitch Hayenga <mitch.hayenga+g...@gmail.com> wrote:

> Thanks for the reply.
>
> Thinking about this... I don't know too much about the O3 store-set
> predictor, but it would seem that load-linked instructions should care
> about the entire cache line, not just whether the store happens to overlap.
> It looks like the pending stores write to the address range
> [0xf9c2c-0xf9c33], but the load-linked is to [0xf9c28-0xf9c2b]
> (non-overlapping, same cache line). So the load issues early, but the
> stores come in and clear the lock from the cacheline. So either non-LLSC
> stores (from the same core) shouldn't clear the locks to a cacheline
> (src/cache/blk.hh:279), or the store-set predictor should hold the
> linked-load until the stores (to the same cacheline, but not overlapping)
> have written back. Dibakar, another grad student here, says this impacts
> Ruby as well.
>
> On Wed, Sep 26, 2012 at 1:27 PM, Ali Saidi <sa...@umich.edu> wrote:
>
>> Hi Mitch,
>>
>> I wonder if this happens in the steady state? With the implementation the
>> store-set predictor should predict that the store is going to conflict
>> with the load and order them. Perhaps that isn't getting trained correctly
>> with LLSC ops. You really don't want to mark the ops as serializing as
>> that slows down the cpu quite a bit.
>>
>> Thanks,
>>
>> Ali
>>
>> On 26.09.2012 13:14, Mitch Hayenga wrote:
>>
>> Background:
>> I have a non-o3, out of order CPU implemented on gem5. Since I don't
>> have a checker implemented yet, I tend to diff committed instructions vs
>> o3. Yesterday's patches caused a few of these diffs to change because of
>> load-linked/store-conditional behavior (better prediction on data ops that
>> write the PC leads to denser load/store scheduling).
>>
>> Issue:
>> It seems O3's own loads/stores can cause its
>> load-linked/store-conditional pair to fail. Previously, running a single
>> core under SE, every load-linked/store-conditional pair would succeed. Now
>> I'm observing them failing 21% of the time (on single-threaded programs).
>> Although the programs functionally work given how the LL/SC is coded
>> currently, I think this points to the fact that LL/SC should be handled
>> slightly differently.
>>
>> Example:
>> From "Hello World" on ARM+O3+Single Core+SE+Classic Memory that shows
>> this. This contains locks because I assume the C++ library is thread-safe.
>> http://pastebin.com/sNjTPBWY
>> The O3 CPU is effectively doing a "Test and TestAndSet". It looks like
>> the load for the Test and the load-linked for the TestAndSet race for
>> memory. Also, the CPU has a pending writeback to the same line. So
>> effectively, the TestAndSet fails (I haven't dug into it to determine
>> whether it was the racing load or the writeback that caused the failure).
>>
>> Given this, shouldn't load-linked (in this case ldrex) instructions be
>> marked as non-speculative (or one of the other flags) so that they don't
>> contend with earlier operations?
>>
>> Thanks.
>>
>> _______________________________________________
>> gem5-users mailing list
>> gem5-users@gem5.org
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users