We have an internal patch that generates an exclusive prefetch when a store is issued, which greatly relieves the store bottleneck. We were in the process of getting it cleaned up to post but things got bogged down somewhere. I'm going to go see what happened to it and if we can revive it.
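As a rough illustration of why an exclusive prefetch at issue time helps, here is a toy Python model. It is not the patch mentioned above and not gem5 code. The function name `drain_time` and all of its parameters are hypothetical. It assumes every store targets a distinct cache line that is not yet owned, that store i is issued at cycle i, that any number of ownership requests can be outstanding at once, and that ownership is never lost to a snoop or eviction before the write.

```python
def drain_time(num_stores, miss_latency, hit_latency=1, prefetch_exclusive=False):
    """Cycles needed to drain num_stores stores through a TSO store queue
    that writes one store at a time, in program order.

    Toy model only: each store goes to a distinct, not-yet-owned cache line,
    and store i is assumed to issue at cycle i.
    """
    now = 0  # cycle at which the previous store finished writing
    for i in range(num_stores):
        issue_cycle = i
        if prefetch_exclusive:
            # Ownership is requested at issue, so the request overlaps
            # with the writes of older stores still in the queue.
            owned_at = issue_cycle + miss_latency
            start = max(now, owned_at)
            now = start + hit_latency
        else:
            # Ownership is requested only when the store reaches the head
            # of the queue, so every store pays the full miss latency.
            start = max(now, issue_cycle)
            now = start + miss_latency
    return now


if __name__ == "__main__":
    print(drain_time(8, 100))                           # serialized misses
    print(drain_time(8, 100, prefetch_exclusive=True))  # overlapped misses
```

With 8 stores and a 100-cycle miss latency, the model gives 800 cycles without the prefetch and 108 cycles with it. In the second case the ownership misses overlap and only the in-order writes remain serialized.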
Steve

On Mon, May 5, 2014 at 9:05 AM, Mitch Hayenga via gem5-users <gem5-users@gem5.org> wrote:

> Yep, the single-store in flight is a significant limitation of TSO. There
> are things you can do to alleviate it (which gem5 doesn't do).
>
> A CPU could speculatively try to obtain ownership of a cache line before a
> store is fully committed. Thus the store could be retired much more
> quickly to the L1 cache once it was committed. This is because the L1
> would likely retain ownership.
>
> Additionally, real CPUs split stores into "store address" and "store
> data" micro-ops. A real CPU could speculatively try to obtain cache line
> ownership as soon as the store address is known (before the data). I
> don't know about the x86 implementation in gem5, but I'd guess it doesn't
> do this decomposition of stores.
>
> I'm sure if anyone tried to hack this "speculative ownership" into gem5,
> many would be appreciative.
>
> On Mon, May 5, 2014 at 10:47 AM, Srinivasan Narayanamoorthy via gem5-users
> <gem5-users@gem5.org> wrote:
>
>> Hi,
>>
>> That can happen. But why is the behavior you describe not acceptable? If
>> another structure is added, then incoming snoops have to CAM into that
>> structure too, which may not be efficient to implement in hardware.
>>
>> Thanks
>> Srini
>>
>> On 05/05/14, Adrián Colaso Diego via gem5-users wrote:
>> > Hi,
>> >
>> > I've noticed that when you run gem5 using the x86 ISA there is a huge
>> > bottleneck in the SQ due to the TSO implementation, as only one store is
>> > allowed to be in flight. As a consequence, old stores that are waiting to
>> > access memory and are no longer present in the ROB saturate the SQ.
>> >
>> > I think these old stores should be moved to another structure so they
>> > don't saturate the SQ; what I see in the results files is that the CPU is
>> > stalled for half of the simulation because the SQ is full.
>> >
>> > Has anybody experienced what I describe?
>> >
>> > Thanks,
>> > Adrian.
_______________________________________________ gem5-users mailing list gem5-users@gem5.org http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users