Hi Nilay,

Sorry for the late response; I hadn't checked my email since last night :).

Anyway, the checkViolations() part that we were talking about makes sure 
that CMP coherence is not violated, but on receiving a snoop invalidation 
request it does not re-execute a load that is not at the front of the commit 
queue, nor the younger insts that follow it. So, in my understanding, it does 
not enforce the strict load-load ordering of a stronger model. I therefore 
added a couple of lines to checkSnoop(); see the changes below:
(1) I think the first if clause in the existing code was wrong: the "if 
(load_idx == loadTail)" clause actually checks the "// If there are no loads 
in the LSQ we don't care" condition. So, with an additional if clause, I make 
sure that nothing needs to be done if the snoop hits the load at the front of 
the load queue.
(2) I also added a clause, guarded by the needsSC condition, towards the end 
of checkSnoop(): if the snoop hits an executed load that is not at the front 
of the queue, that load is re-executed using ReExec (hopefully ReExec squashes 
that load and all the younger insts and re-fetches them, as I understood from 
Ali's response).


The other change I made to maintain SC is to add a few more constraints on 
the load queue to ensure store-load ordering: a load in the load queue cannot 
retire from the ROB until the committed store instructions that precede it in 
program order have been exposed to the memory system. As a result, such a 
load can still receive snoop invalidates and be re-executed if needed. I can 
post my changes to enforce SC for review.
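
As a rough illustration of the store-load constraint described above, the retirement check could look something like the sketch below. This is a toy model and not the actual patch: PendingStore, ToyStoreBuffer and loadMayRetire are hypothetical names invented for this example and do not exist in gem5.

```cpp
#include <cstdint>
#include <deque>

// Toy model only: these types are hypothetical, not gem5 classes.
struct PendingStore {
    uint64_t seqNum;    // program-order sequence number
    bool writtenBack;   // true once the store is visible to the memory system
};

struct ToyStoreBuffer {
    std::deque<PendingStore> committed;  // committed stores, oldest first

    // True if every committed store older than load_seq has been written back.
    bool olderStoresDrained(uint64_t load_seq) const {
        for (const PendingStore &st : committed) {
            if (st.seqNum < load_seq && !st.writtenBack)
                return false;
        }
        return true;
    }
};

// A load may retire only once it has executed and all older committed
// stores are visible to the memory system. Until then it stays in the
// load queue, where a snoop invalidation can still mark it for re-execution.
inline bool loadMayRetire(const ToyStoreBuffer &sb, uint64_t load_seq,
                          bool load_executed)
{
    return load_executed && sb.olderStoresDrained(load_seq);
}
```

The point of holding the load back is only to keep it snoopable; younger stores do not block it. The modified checkSnoop() follows.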


template <class Impl>
void
LSQUnit<Impl>::checkSnoop(PacketPtr pkt)
{
    int load_idx = loadHead;

    if (!cacheBlockMask) {
        assert(dcachePort);
        Addr bs = dcachePort->peerBlockSize();

        // Make sure we actually got a size
        assert(bs != 0);

        cacheBlockMask = ~(bs - 1);
    }

    // If there are no loads in the LSQ we don't care
    if (load_idx == loadTail) {
        DPRINTF(LSQUnit, "loadHead: %d, loadTail:%d\n", loadHead, loadTail);
        //assert(0);
        return;
    }

    // Skip the load at the front of the queue; nothing needs to be done
    // for it. Using incrLdIdx() here (rather than load_idx + 1) keeps the
    // check below correct when the index wraps around.
    incrLdIdx(load_idx);

    // If that was the only load in the LSQ we don't care
    if (load_idx == loadTail) {
        DPRINTF(LSQUnit, "loadHead: %d, loadTail:%d\n", loadHead, loadTail);
        //assert(0);
        return;
    }

    DPRINTF(LSQUnit, "Got snoop for address %#x\n", pkt->getAddr());
    Addr invalidate_addr = pkt->getAddr() & cacheBlockMask;
    while (load_idx != loadTail) {
        DynInstPtr ld_inst = loadQueue[load_idx];

        // Skip loads without a valid address and uncacheable loads
        if (!ld_inst->effAddrValid || ld_inst->uncacheable()) {
            incrLdIdx(load_idx);
            continue;
        }

        Addr load_addr = ld_inst->physEffAddr & cacheBlockMask;
        DPRINTF(LSQUnit, "-- inst [sn:%lli] load_addr: %#x to pktAddr:%#x\n",
                ld_inst->seqNum, load_addr, invalidate_addr);

        if (load_addr == invalidate_addr) {
            if (ld_inst->possibleLoadViolation) {
                DPRINTF(LSQUnit, "Conflicting load at addr %#x [sn:%lli]\n",
                        ld_inst->physEffAddr, ld_inst->seqNum);

                // Mark the load for re-execution
                ld_inst->fault = new ReExec;
            } else {
                // If an older load checks this and it's true
                // then we might have missed the snoop
                // in which case we need to invalidate to be sure
                ld_inst->hitExternalSnoop = true;

                // Under SC, re-execute any load hit by the snoop
                if (needsSC) {
                    ld_inst->fault = new ReExec;
                }
            }
        }
        incrLdIdx(load_idx);
    }
    return;
}
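
For anyone who wants to experiment with the snoop logic outside the simulator, here is a minimal stand-alone sketch of the same walk over a circular load queue. It is only a toy model under stated assumptions: ToyLoad, ToyLoadQueue and the needsReExec flag (standing in for "fault = new ReExec") are hypothetical names, not gem5 classes, and the queue is assumed never to be completely full.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: hypothetical stand-ins for the gem5 DynInst and LSQUnit.
struct ToyLoad {
    uint64_t physAddr = 0;
    bool addrValid = false;          // effective address has been computed
    bool possibleViolation = false;
    bool hitExternalSnoop = false;
    bool needsReExec = false;        // stands in for "fault = new ReExec"
};

struct ToyLoadQueue {
    std::vector<ToyLoad> entries;    // circular buffer
    int head = 0;                    // oldest load
    int tail = 0;                    // one past the youngest load
    uint64_t blockMask;
    bool needsSC;

    ToyLoadQueue(int size, uint64_t blockSize, bool sc)
        : entries(size), blockMask(~(blockSize - 1)), needsSC(sc)
    {
        // The mask trick only works for power-of-two block sizes.
        assert(blockSize != 0 && (blockSize & (blockSize - 1)) == 0);
    }

    int next(int idx) const { return (idx + 1) % (int)entries.size(); }

    void push(uint64_t addr) {
        entries[tail] = ToyLoad{};
        entries[tail].physAddr = addr;
        entries[tail].addrValid = true;
        tail = next(tail);
    }

    // Returns the number of loads marked for re-execution.
    int checkSnoop(uint64_t snoopAddr) {
        if (head == tail)
            return 0;                // no loads at all
        int marked = 0;
        const uint64_t invAddr = snoopAddr & blockMask;
        // Start at the second-oldest load; the load at the head is skipped.
        for (int idx = next(head); idx != tail; idx = next(idx)) {
            ToyLoad &ld = entries[idx];
            if (!ld.addrValid || (ld.physAddr & blockMask) != invAddr)
                continue;
            if (ld.possibleViolation) {
                ld.needsReExec = true;
                ++marked;
            } else {
                ld.hitExternalSnoop = true;
                if (needsSC) {       // the extra clause discussed above
                    ld.needsReExec = true;
                    ++marked;
                }
            }
        }
        return marked;
    }
};
```

With needsSC set, a snoop that hits the block of any non-head load marks that load for re-execution; with it cleared, the load is only flagged via hitExternalSnoop unless it was already a possible violation.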

On 07/12/12, Nilay Vaish  wrote:
> Dibakar, any progress on this front?
> 
> On Wed, 27 Jun 2012, Ali Saidi wrote:
> 
> >
> >
> >Hi Dibakar,
> >
> >I'm not saying that I believe this is correct for x86.
> >It seems like x86 does require more ordering than is currently provided
> >by the lsq. Hopefully someone with more x86 experience could chime in
> >and confirm that. The faulting mechanism needs an overhaul in the o3
> >cpu. There shouldn't be any fundamental difference.
> >
> >Thanks,
> >
> >Ali
> >
> >On 27.06.2012 18:08, Dibakar Gope wrote:
> >
> >>Hi Ali,
> >>
> >>from this thread,
> >>http://www.mail-archive.com/gem5-dev@gem5.org/msg00782.html, I get an
> >>idea that a snoop invalidate will make a younger load and its following
> >>younger instructions to re-execute, if only an older load in the program
> >>order to the same cache block see an updated value. But I am not still
> >>sure, if it obeys the load-load ordering of a stronger consistency model
> >>other than ARM. Suppose for example,
> >>
> >>C0      C1
> >>St A    Ld C
> >>St B    Ld A
> >>
> >>In the above scenario, if the memory order becomes Ld A -> St A -> St
> >>B -> Ld C and if C1 receives an invalidation for cache block A, before
> >>Ld A make it to the front of the commit queue, still checkViolations()
> >>code won't squash the Ld A and any younger instructions to maintain
> >>strong consistency.
> >>
> >>My other doubt is that, can we make use of the
> >>squashDueToMemOrder() squash mechanism instead of using ReExec fault, if
> >>I want to squash the load A and younger instructions and re-fetch those
> >>again in the above scenario? ReExec waits for the faulted instruction to
> >>reach the front of the commit, is there any other fundamental difference
> >>of using ReExec in comparison to the squashDueToMemOrder() other than
> >>this?
> >>
> >>Thanks,
> >>--Dibakar
> >>
> >>On 06/25/12, Ali Saidi wrote:
> >>
> >>>ARM just requires load-load ordering (which is stronger than alpha). x86
> >>>to my knowledge requires all stores in the system to be visible in the
> >>>same order.
> >>>
> >>>Ali
> >>>
> >>>On Jun 22, 2012, at 11:50 PM, Nilay wrote:
> >>>
> >>>>What's the difference between ARM's load-load ordering and TSO? I am
> >>>>guessing in ARM not all instructions are flushed from pipe, but only
> >>>>those that are affected by the snoop. My understanding is that the O3
> >>>>CPU flushes the entire pipeline when it sees that an instruction needs
> >>>>to execute again. Since instructions commit inorder, any load that gets
> >>>>squashed would mean that all subsequent loads are squashed as well.
> >>>>
> >>>>--
> >>>>Nilay
> >>>>
> >>>>On Fri, June 22, 2012 8:47 am, Ali Saidi wrote:
> >>>>
> >>>>>HI Dibakar,
> >>>>>
> >>>>>I'd have to think carefully about it, but you may be right
> >>>>>about TSO. I'd hope that someone who is more familiar with x86 could
> >>>>>respond.
> >>>>>
> >>>>>Thanks,
> >>>>>Ali
> >>>>>
> >>>>>On 22.06.2012 07:46, Dibakar Gope wrote:
> >>>>>
> >>>>>>Hi Ali, Thanks for the response. Ok, I got the point. I
> >>>>>>thought that since the O3 attempts to support the TSO for X86 , so
> >>>>>>inherently this enforces/covers the regular load-load ordering present
> >>>>>>in any stronger consistency model. But if it inline with ARM's
> >>>>>>requirements,then does it not violate x86 and TSO's conventional
> >>>>>>load-load ordering?
> >>>>>>
> >>>>>>thanks, Dibakar
> >>>>
> >_______________________________________________
> >gem5-users mailing list
> >gem5-users@gem5.org
> >http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users