Hi All,
Agreed, thanks for confirming we were not missing something.  As a quick
follow-up, my student will shortly post data here on the performance impact
he sees from this issue; it is quite large for a 2-wide OOO core.  I was
thinking it might be something along those lines (or something about the
bypass network width), but grabbing the buffers at issue time seems too
conservative, as opposed to grabbing them at completion and stalling the
functional unit if you can't get one.
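To make the effect concrete, here is a toy cycle-level sketch (not gem5 code; the latencies, widths, and instruction mix are made-up illustrative numbers) comparing the two policies, holding a buffer from issue to completion versus only claiming one at writeback:

```python
# Toy model of writeback-buffer pressure on a 2-wide machine.
# Policy A (reserve_at_issue=True): a buffer is grabbed at issue and held
# until the instruction completes -- roughly the behavior discussed here.
# Policy B (reserve_at_issue=False): no issue-time reservation; buffers
# are only needed in the writeback cycle itself.

def simulate(latencies, width=2, num_bufs=2, reserve_at_issue=True):
    """Return total cycles to issue and drain all instructions."""
    pending = []      # completion cycles of in-flight instructions
    bufs_held = 0     # buffers currently reserved (policy A only)
    i = 0             # index of the next instruction to issue
    cycle = 0
    while i < len(latencies) or pending:
        # retire everything completing this cycle, freeing its buffer
        done = [c for c in pending if c <= cycle]
        pending = [c for c in pending if c > cycle]
        if reserve_at_issue:
            bufs_held -= len(done)
        # issue up to `width` instructions this cycle
        issued = 0
        while i < len(latencies) and issued < width:
            if reserve_at_issue and bufs_held >= num_bufs:
                break  # no free writeback buffer -> issue stalls
            if reserve_at_issue:
                bufs_held += 1
            pending.append(cycle + latencies[i])
            i += 1
            issued += 1
        cycle += 1
    return cycle

# one long-latency (miss-like) load followed by 40 single-cycle ALU ops
stream = [20] + [1] * 40
print("hold from issue:   ", simulate(stream, reserve_at_issue=True))
print("claim at writeback:", simulate(stream, reserve_at_issue=False))
```

With these numbers the issue-time policy serializes issue to one instruction per cycle while the load is outstanding, whereas the completion-time policy keeps issuing at full width.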

I believe Karu Sankaralingam at Wisconsin also found this and a few other
issues; they have a related paper at WDDD this year.

We also found a problem where multiple outstanding loads to the same
address cause heavy flushing in O3 with Ruby, with a similarly large
performance impact; we'll start another thread on that shortly.
Thanks!
Paul



On Mon, May 12, 2014 at 3:51 PM, Mitch Hayenga via gem5-users <
gem5-users@gem5.org> wrote:

> *"Realistically, to me, it seems like those buffers would be distributed
> among the function units anyway, not a global resource, so having a global
> limit doesn't make a lot of sense.  Does anyone else out there agree or
> disagree?"*
>
> I believe that's more or less correct, with wbWidth probably meant to be
> the number of write ports on the register file and wbDepth the number of
> pipeline stages for a multi-cycle write back.
>
> I don't fully agree that it should be distributed at the function unit
> level, as you could imagine designs with a higher issue width and more
> functional units than register file write ports, essentially allowing
> more instructions to be issued in a given cycle as long as they did not
> all complete in the same cycle.
>
> Going back to Paul's issue (loads holding write back slots on misses):
> the "proper" way to do it would probably be to reserve a slot assuming an
> L1 cache hit latency, give up the slot on a miss, and have an early signal
> that a load miss is coming back from the cache so that you could reserve a
> write back slot in parallel with all the other necessary work for a load
> (CAMing against the store queue, etc.).  But this would likely be annoying
> to implement.
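> As a rough pseudocode sketch of that policy (hypothetical hook names, not
> actual gem5 interfaces):
>
> ```
> on_issue(load):
>     reserve_wb_slot(now + L1_HIT_LATENCY)   # optimistically assume a hit
>
> on_l1_response(load):
>     if load.miss:
>         release_wb_slot(load)               # don't hold it across the miss
>
> on_early_fill_signal(load):                 # "data returning soon"
>     reserve_wb_slot(fill_arrival_cycle)     # in parallel with the STQ CAM, etc.
> ```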
>
>
> *In general, though, yes, this seems like something not worth modeling in
> gem5: the potential negative impacts of its current implementation
> outweigh the benefits, and the benefit of fully modeling it is likely
> small.*
>
>
>
> On Mon, May 12, 2014 at 2:08 PM, Arthur Perais via gem5-users <
> gem5-users@gem5.org> wrote:
>
>>  Hi all,
>>
>> I have no specific knowledge of what the buffers are modeling or what
>> they should be modeling, but I too encountered this issue some time ago.
>> Setting a high wbDepth is how I work around it (actually, 3 is sufficient
>> for me), because performance does suffer quite a lot in some cases (and
>> even more so for narrow-issue cores if wbWidth == issueWidth, I would
>> expect).
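>> For reference, the workaround is just a config-script tweak (parameter
>> names as in the current O3 model; values illustrative):
>>
>> ```
>> cpu = DerivO3CPU()
>> cpu.wbWidth = 8   # results written back per cycle (the default)
>> cpu.wbDepth = 3   # cycles of buffering; wbMax = wbWidth * wbDepth
>> ```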
>>
>> On 12/05/2014 19:39, Steve Reinhardt via gem5-users wrote:
>>
>> Hi Paul,
>>
>>  I assume you're talking about the 'wbMax' variable?  I don't recall it
>> specifically myself, but after looking at the code a bit, the best I can
>> come up with is that there's assumed to be a finite number of buffers
>> somewhere that hold results from the function units before they write back
>> to the reg file.  Realistically, to me, it seems like those buffers would
>> be distributed among the function units anyway, not a global resource, so
>> having a global limit doesn't make a lot of sense.  Does anyone else out
>> there agree or disagree?
>>
>>  It doesn't seem to relate to any structure that's directly modeled in
>> the code, i.e., I think you could rip the whole thing out (incrWb(),
>> decrWb(), wbOutstanding, wbMax) without breaking anything in the model...
>> which would be a good thing if in fact everyone else is either suffering
>> unaware or just working around it by setting a large value for wbDepth.
>>
>>  That said, we've done some internal performance correlation work, and I
>> don't recall this being an issue, for whatever that's worth.  I know ARM
>> has done some correlation work too; have you run into this?
>>
>>  Steve
>>
>>
>>
>> On Fri, May 9, 2014 at 7:45 AM, Paul V. Gratz via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Hi All,
>>> Doing some digging on performance issues in the O3 model, we and others
>>> have found that allocation of the writeback buffers has a big performance
>>> impact.  Basically, a writeback buffer is grabbed at issue time and held
>>> through till completion.  With the default number of available writeback
>>> buffers (x * issue width, where x is 1 by default), the buffers often end
>>> up bottlenecking the effective issue width (particularly in the face of
>>> long-latency loads grabbing all the WB buffers).  What are these
>>> structures trying to model?  I can see limiting the number of
>>> instructions allowed to complete and writeback/bypass in a cycle, but if
>>> that is the intent, this is much more conservative than that.  If not,
>>> why does it do this?  We can easily make the number of WB buffers high,
>>> but we want to understand what is going on here first...
>>> Thanks!
>>>  Paul
>>>
>>>  --
>>> -----------------------------------------
>>> Paul V. Gratz
>>> Assistant Professor
>>> ECE Dept, Texas A&M University
>>> Office: 333M WERC
>>> Phone: 979-488-4551
>>> http://cesg.tamu.edu/faculty/paul-gratz/
>>>
>>> _______________________________________________
>>> gem5-users mailing list
>>> gem5-users@gem5.org
>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>
>>
>>
>>
>>
>>
>>
>> --
>> Arthur Perais
>> INRIA Bretagne Atlantique
>> Bâtiment 12E, Bureau E303, Campus de Beaulieu
>> 35042 Rennes, France
>>
>>
>>
>
>
>



