Hi Nilay,

Sorry for the late response, I didn't check my email since last night :).
Anyway, the checkViolations() part that we were talking about makes sure there is no CMP coherence violation, but it does not re-execute a load that is not at the front of the commit queue (and the younger instructions following it) upon receiving a snoop invalidation request. So in my understanding it does not enforce the strict load-load ordering of a stronger model. I therefore added a couple of lines to checkSnoop(); see the changes below.

(1) I guess the comment on the first if clause in the existing code was wrong: the "if (load_idx == loadTail)" clause actually checks the "If there are no loads in the LSQ we don't care" condition. So with an additional if clause, I make sure that if the snoop hits the front of the load queue, nothing needs to be done.

(2) Further, I added a clause towards the end of checkSnoop() under the needsSC condition: if the snoop hits an executed load that is not at the front of the queue, the load is re-executed using ReExec (hopefully ReExec squashes all the younger instructions including that load and re-fetches them, as I understood from Ali's response).

The other change I made to maintain SC is to add a few more constraints on the load queue to ensure store-load ordering, i.e. a load in the load queue cannot retire from the ROB until the committed store instructions before it in program order are exposed to the memory system. As a result, a load can still receive snoop invalidates and be re-executed if needed. I can post my changes to enforce SC for review.
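To make the intent of (1) and (2) concrete, here is a tiny standalone sketch of the scan. It is not gem5 code: ToyLoad, toyCheckSnoop and the std::vector ring buffer are names made up for illustration, and the sketch assumes the block size is a power of two and that head/tail index a circular queue that is empty when head == tail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a load-queue entry (not a gem5 type).
struct ToyLoad {
    std::uint64_t physAddr;      // physical effective address
    bool addrValid;              // address has been computed
    bool uncacheable;            // uncacheable loads are ignored by the scan
    bool possibleLoadViolation;  // already flagged by an older load
    bool hitExternalSnoop;       // set when a snoop hits this load's block
    bool reExec;                 // load must be squashed and re-executed
};

// Scan every load younger than the head for a snoop hit on the same block.
// Returns how many loads were marked for re-execution.
std::size_t toyCheckSnoop(std::vector<ToyLoad>& lq, std::size_t head,
                          std::size_t tail, std::uint64_t snoopAddr,
                          std::uint64_t blockSize, bool needsSC)
{
    // Assumption: block size is a non-zero power of two.
    assert(blockSize != 0 && (blockSize & (blockSize - 1)) == 0);
    const std::uint64_t mask = ~(blockSize - 1);

    if (head == tail)
        return 0;  // no loads in the queue
    assert(head < lq.size() && tail < lq.size());

    const std::uint64_t invAddr = snoopAddr & mask;
    std::size_t marked = 0;

    // Start after the head: a load at the front of the queue is left alone.
    for (std::size_t idx = (head + 1) % lq.size(); idx != tail;
         idx = (idx + 1) % lq.size()) {
        ToyLoad& ld = lq[idx];
        if (!ld.addrValid || ld.uncacheable)
            continue;
        if ((ld.physAddr & mask) != invAddr)
            continue;

        if (ld.possibleLoadViolation) {
            ld.reExec = true;
            ++marked;
        } else {
            ld.hitExternalSnoop = true;
            if (needsSC) {  // stricter load-load ordering requested
                ld.reExec = true;
                ++marked;
            }
        }
    }
    return marked;
}
```

With needsSC set, any younger load with a valid address in the invalidated block is marked; without it, the load is only tagged with hitExternalSnoop. The actual checkSnoop() change follows.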
template <class Impl>
void
LSQUnit<Impl>::checkSnoop(PacketPtr pkt)
{
    int load_idx = loadHead;

    if (!cacheBlockMask) {
        assert(dcachePort);
        Addr bs = dcachePort->peerBlockSize();

        // Make sure we actually got a size
        assert(bs != 0);

        cacheBlockMask = ~(bs - 1);
    }

    // If there are no loads in the LSQ we don't care
    if (load_idx == loadTail) {
        DPRINTF(LSQUnit, "loadHead: %d, loadTail:%d\n", loadHead, loadTail);
        //assert(0);
        return;
    }

    // If this is the only load in the LSQ we don't care
    if (loadTail == (load_idx + 1)) {
        DPRINTF(LSQUnit, "loadHead: %d, loadTail:%d\n", loadHead, loadTail);
        //assert(0);
        return;
    }

    // Skip the load at the front of the queue
    incrLdIdx(load_idx);

    DPRINTF(LSQUnit, "Got snoop for address %#x\n", pkt->getAddr());
    Addr invalidate_addr = pkt->getAddr() & cacheBlockMask;

    while (load_idx != loadTail) {
        DynInstPtr ld_inst = loadQueue[load_idx];

        if (!ld_inst->effAddrValid || ld_inst->uncacheable()) {
            incrLdIdx(load_idx);
            continue;
        }

        Addr load_addr = ld_inst->physEffAddr & cacheBlockMask;
        DPRINTF(LSQUnit, "-- inst [sn:%lli] load_addr: %#x to pktAddr:%#x\n",
                ld_inst->seqNum, load_addr, invalidate_addr);

        if (load_addr == invalidate_addr) {
            if (ld_inst->possibleLoadViolation) {
                // The format string takes two arguments: address and seqNum
                DPRINTF(LSQUnit, "Conflicting load at addr %#x [sn:%lli]\n",
                        ld_inst->physEffAddr, ld_inst->seqNum);

                // Mark the load for re-execution
                ld_inst->fault = new ReExec;
            } else {
                // If an older load checks this and it's true
                // then we might have missed the snoop
                // in which case we need to invalidate to be sure
                ld_inst->hitExternalSnoop = true;

                // Under SC, re-execute any younger load hit by the snoop
                if (needsSC) {
                    ld_inst->fault = new ReExec;
                }
            }
        }
        incrLdIdx(load_idx);
    }
    return;
}

On 07/12/12, Nilay Vaish wrote:
> Dibakar, any progress on this front?
>
> On Wed, 27 Jun 2012, Ali Saidi wrote:
>
> > Hi Dibakar,
> >
> > I'm not saying that I believe this is correct for x86.
> > It seems like x86 does require more ordering than is currently provided
> > by the lsq. Hopefully someone with more x86 experience could chime in
> > and confirm that.
> > The faulting mechanism needs an overhaul in the o3
> > cpu. There shouldn't be any fundamental difference.
> >
> > Thanks,
> >
> > Ali
> >
> > On 27.06.2012 18:08, Dibakar Gope wrote:
> >
> >> Hi Ali,
> >>
> >> from this thread,
> >> http://www.mail-archive.com/gem5-dev@gem5.org/msg00782.html, I get an
> >> idea that a snoop invalidate will make a younger load and its following
> >> younger instructions to re-execute, if only an older load in the program
> >> order to the same cache block see an updated value. But I am not still
> >> sure, if it obeys the load-load ordering of a stronger consistency model
> >> other than ARM. Suppose for example,
> >>
> >> C0      C1
> >> St A    Ld C
> >> St B    Ld A
> >>
> >> In the above scenario, if the memory order becomes Ld A -> St A -> St
> >> B -> Ld C and if C1 receives an invalidation for cache block A, before
> >> Ld A make it to the front of the commit queue, still checkViolations()
> >> code won't squash the Ld A and any younger instructions to maintain
> >> strong consistency.
> >>
> >> My other doubt is that, can we make use of the
> >> squashDueToMemOrder() squash mechanism instead of using ReExec fault, if
> >> I want to squash the load A and younger instructions and re-fetch those
> >> again in the above scenario? ReExec waits for the faulted instruction to
> >> reach the front of the commit, is there any other fundamental difference
> >> of using ReExec in comparison to the squashDueToMemOrder() other than
> >> this?
> >>
> >> Thanks,
> >> --Dibakar
> >>
> >> On 06/25/12, Ali Saidi wrote:
> >>
> >>> ARM just requires load-load ordering (which is stronger than alpha). x86
> >>> to my knowledge requires all stores in the system to be visible in the
> >>> same order. Ali
> >>>
> >>> On Jun 22, 2012, at 11:50 PM, Nilay wrote:
> >>>
> >>>> What's the difference between ARM's load-load ordering and TSO? I am
> >>>> guessing in ARM not all instructions are flushed from pipe, but only
> >>>> those that are affected by the snoop.
> >>>> My understanding is that the O3
> >>>> CPU flushes the entire pipeline when it sees that an instruction needs
> >>>> to execute again. Since instructions commit inorder, any load that gets
> >>>> squashed would mean that all subsequent loads are squashed as well.
> >>>> -- Nilay
> >>>>
> >>>> On Fri, June 22, 2012 8:47 am, Ali Saidi wrote:
> >>>>
> >>>>> HI Dibakar, I'd have to think carefully about it, but you may be right
> >>>>> about TSO. I'd hope that someone who is more familiar with x86 could
> >>>>> respond. Thanks, Ali
> >>>>>
> >>>>> On 22.06.2012 07:46, Dibakar Gope wrote:
> >>>>>
> >>>>>> Hi Ali, Thanks for the response. Ok, I got the point. I
> >>>>>> thought that since the O3 attempts to support the TSO for X86, so
> >>>>>> inherently this enforces/covers the regular load-load ordering present
> >>>>>> in any stronger consistency model. But if it inline with ARM's
> >>>>>> requirements, then does it not violate x86 and TSO's conventional
> >>>>>> load-load ordering?
> >>>>>>
> >>>>>> thanks, Dibakar
> >>>>
> >>>> _______________________________________________
> >>>> gem5-users mailing list
> >>>> gem5-users@gem5.org
> >>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users