Re: [gem5-users] LSQ bottleneck when using X86 TSO

2015-11-03 Thread Steve Reinhardt
Hi Virendra, The big problem with that patch is that, as Andreas properly noted in his review, the prefetch packet does not get deallocated properly, creating a big memory leak. We are planning to fix this so that the patch can be committed. I'm not sure how closely that relates to your proposal

Re: [gem5-users] LSQ bottleneck when using X86 TSO

2015-11-02 Thread Virendra Kumar Pathak
Hi Steve Reinhardt, I am working on extending the store functional unit into separate "store-address" + "store-data" units in gem5 for an AArch64 ARM processor. It looks like you had proposed a patch with a similar aim (issuing an exclusive prefetch as soon as the store address is available). Mailing archive link - https://www.mail-archi
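A minimal sketch of why the split helps, assuming an exclusive prefetch needs only the line address and that ownership arrives one fixed miss latency after the request. `StoreOp`, `prefetch_send_cycle`, and `ownership_ready_cycle` are hypothetical names for illustration, not gem5 classes or functions.

```python
from dataclasses import dataclass


@dataclass
class StoreOp:
    addr_ready: int  # cycle when the store address can be computed (store-address part)
    data_ready: int  # cycle when the value to store becomes available (store-data part)


def prefetch_send_cycle(st: StoreOp, split: bool) -> int:
    # Fused store: the store issues, and can trigger a prefetch, only once
    # both its address and its data are ready.
    # Split store: the store-address part alone carries the line address,
    # which is all an exclusive prefetch needs (assumption of this sketch).
    return st.addr_ready if split else max(st.addr_ready, st.data_ready)


def ownership_ready_cycle(st: StoreOp, miss_latency: int, split: bool) -> int:
    # Exclusive ownership is assumed to arrive one miss latency after the request.
    return prefetch_send_cycle(st, split) + miss_latency


# Example: the address is known early, but the data depends on a slow load.
st = StoreOp(addr_ready=5, data_ready=300)
fused_ready = ownership_ready_cycle(st, miss_latency=100, split=False)  # 400
split_ready = ownership_ready_cycle(st, miss_latency=100, split=True)   # 105
```

The gain is largest exactly when the store's data comes from a long-latency operation while its address is computed early.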

Re: [gem5-users] LSQ bottleneck when using X86 TSO

2014-05-13 Thread Steve Reinhardt via gem5-users
I just posted the patch for review; see http://reviews.gem5.org/r/2277. It may depend on some of the other patches I posted immediately prior to it, particularly 2276. Steve On Mon, May 5, 2014 at 9:32 AM, Adrián Colaso Diego via gem5-users <gem5-users@gem5.org> wrote: > It sounds good, i wil

Re: [gem5-users] LSQ bottleneck when using X86 TSO

2014-05-05 Thread Adrián Colaso Diego via gem5-users
It sounds good, I will wait for that patch. Thank you. Adrian On Mon, 05-05-2014 at 09:22 -0700, Steve Reinhardt via gem5-users wrote: > We have an internal patch that generates an exclusive prefetch when a > store is issued, which greatly relieves the store bottleneck. We were > in the pr

Re: [gem5-users] LSQ bottleneck when using X86 TSO

2014-05-05 Thread Steve Reinhardt via gem5-users
We have an internal patch that generates an exclusive prefetch when a store is issued, which greatly relieves the store bottleneck. We were in the process of getting it cleaned up to post, but things got bogged down somewhere. I'm going to go see what happened to it and whether we can revive it. Steve
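A rough timing sketch of why an exclusive prefetch at store issue relieves the bottleneck. This is a simplified model, not the patch discussed in this thread (its code is not reproduced here); `drain_finish_times` and its parameters are hypothetical. Writes still perform in program order, one at a time, but when ownership was requested at issue, the miss latency overlaps with earlier stores and each in-order write only pays a hit latency.

```python
def drain_finish_times(issue, commit, miss_latency, hit_latency, prefetch):
    """Return the cycle at which each store's write completes, in program order.

    issue[i]  -- cycle store i issued (address known)
    commit[i] -- cycle store i committed
    """
    done = []
    prev_done = 0
    for issue_t, commit_t in zip(issue, commit):
        # TSO in this model: a store may start writing only after it commits
        # and after the previous store's write has completed.
        start = max(commit_t, prev_done)
        if prefetch:
            # Exclusive prefetch sent at issue: the line is owned by
            # issue_t + miss_latency, so the write waits for ownership
            # (if it has not arrived yet) and then takes a cache hit.
            owned_at = issue_t + miss_latency
            finish = max(start, owned_at) + hit_latency
        else:
            # No prefetch: each write pays the full miss latency serially.
            finish = start + miss_latency
        done.append(finish)
        prev_done = finish
    return done


issue = list(range(8))           # stores issue on cycles 0..7
commit = list(range(10, 18))     # and commit on cycles 10..17
serial = drain_finish_times(issue, commit, 100, 2, prefetch=False)
overlap = drain_finish_times(issue, commit, 100, 2, prefetch=True)
```

With these parameters the serialized drain finishes at cycle 810 and the prefetching version at cycle 116; in both cases the writes complete in program order, which is what TSO requires.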

Re: [gem5-users] LSQ bottleneck when using X86 TSO

2014-05-05 Thread Mitch Hayenga via gem5-users
Yep, the single store in flight is a significant limitation of TSO. There are things you can do to alleviate it (which gem5 doesn't do). A CPU could speculatively try to obtain ownership of a cache line before a store is fully committed. Thus the store could be retired much more quickly to the

Re: [gem5-users] LSQ bottleneck when using X86 TSO

2014-05-05 Thread Srinivasan Narayanamoorthy via gem5-users
Hi, that can happen. But why is the behavior you describe not acceptable? If another structure is added, then incoming snoops have to CAM into that structure too, which may not be efficient to implement in hardware. Thanks Srini On 05/05/14, Adrián Colaso Diego via gem5-users wrote: > Hi
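To illustrate the cost Srini describes, here is a hypothetical side table (the class name, its fields, and the 64-byte line size are assumptions made for this sketch) that records lines for which an exclusive prefetch is outstanding or granted. Every incoming invalidation snoop must be compared against every entry; in hardware that is an extra CAM search, and the Python loop below stands in for that parallel comparison.

```python
LINE_BYTES = 64  # assumed cache-line size


def line_of(addr: int) -> int:
    # Clear the offset bits to get the line-aligned address.
    return addr & ~(LINE_BYTES - 1)


class PendingOwnershipTable:
    """Hypothetical table of lines prefetched exclusively for not-yet-written stores."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines: list[int] = []   # line addresses with a prefetch outstanding or granted
        self.owned: list[bool] = []  # parallel list: True once exclusive ownership arrived

    def allocate(self, addr: int) -> bool:
        line = line_of(addr)
        if line in self.lines:
            return True              # already tracked; share the entry
        if len(self.lines) >= self.capacity:
            return False             # table full: no early prefetch for this store
        self.lines.append(line)
        self.owned.append(False)
        return True

    def grant(self, addr: int) -> None:
        # Called when the exclusive response for this line returns.
        line = line_of(addr)
        for i, entry in enumerate(self.lines):
            if entry == line:
                self.owned[i] = True

    def snoop_invalidate(self, addr: int) -> bool:
        # Associative search: compare the snooped line against every entry.
        # On a hit, ownership is lost and (in this sketch) would have to be
        # re-requested before the store can write.
        line = line_of(addr)
        hit = False
        for i, entry in enumerate(self.lines):
            if entry == line:
                self.owned[i] = False
                hit = True
        return hit
```

The table's capacity bounds how many early ownership requests can be tracked, and every entry adds to the comparison work on each snoop, which is the trade-off raised above.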

[gem5-users] LSQ bottleneck when using X86 TSO

2014-05-05 Thread Adrián Colaso Diego via gem5-users
Hi, I've noticed that when you run gem5 using the X86 ISA there is a huge bottleneck in the SQ due to the TSO implementation, as only one store is allowed to be in flight. As a consequence, old stores that are waiting to access memory and are no longer present in the ROB saturate the SQ. I think that these old
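A toy cycle-level model of the saturation effect described above. It is not gem5 code, and `simulate_sq` and its parameters are hypothetical. It assumes stores retire from the ROB into an SQ of fixed capacity at most one per cycle, while the SQ drains in program order with at most one store in flight and each write takes a fixed miss latency. Once the drain rate (one store per miss latency) falls below the retire rate, the SQ fills and retirement stalls.

```python
from collections import deque


def simulate_sq(num_stores: int, sq_capacity: int, miss_latency: int):
    """Return (total_cycles, max_sq_occupancy, retire_stall_cycles)."""
    assert miss_latency >= 1 and sq_capacity >= 1
    sq = deque()    # retired stores not yet written to the cache, oldest first
    retired = 0     # stores that have left the ROB
    written = 0     # stores whose write has completed
    remaining = 0   # cycles left for the single in-flight store (0 = idle)
    max_occ = stalls = cycle = 0
    while written < num_stores:
        # Drain side: start the oldest store only if nothing else is in flight.
        if remaining == 0 and sq:
            remaining = miss_latency
        if remaining > 0:
            remaining -= 1
            if remaining == 0:
                sq.popleft()
                written += 1
        # Retire side: at most one store per cycle moves from the ROB into the SQ.
        if retired < num_stores:
            if len(sq) < sq_capacity:
                sq.append(retired)
                retired += 1
            else:
                stalls += 1  # SQ full: retirement is blocked this cycle
        max_occ = max(max_occ, len(sq))
        cycle += 1
    return cycle, max_occ, stalls


cycles, occupancy, stalls = simulate_sq(num_stores=10, sq_capacity=4, miss_latency=10)
```

With a 10-cycle miss latency the 4-entry SQ fills within a few cycles and retirement stalls repeatedly; with a 1-cycle latency the same store stream never stalls.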