Amin/Tony, there is a good reason why gem5 does this.  It's about
modeling what real processors do.

Modern out-of-order processors are very deeply pipelined, and instructions
take multiple cycles to execute from the time they are scheduled.  To
enable back-to-back execution of dependent instructions, the scheduler
speculatively schedules instructions multiple cycles ahead of time based on
the assumed execution latencies of the producer instructions.  If all
instructions take their expected latencies, dependent instructions catch
their operands via the bypass paths.  Loads throw a wrench into this
because they can have variable latencies (miss in the L1 cache, be blocked,
etc.).  In a real pipeline, if a load misses or is blocked, the speculative
schedule of instructions gets messed up.  The hard part is that some
portion of the scheduled instructions may have been independent of the
load, whereas other instructions fall in the load-dependent program slice.
The amount of control logic needed to precisely determine mis-scheduled
dependent instructions is prohibitive, so squashing/replaying instructions
is a common technique.  There are various levels of precision possible when
replaying instructions.  If I remember right, the o3 model's scheme was
very conservative/pessimistic.
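To make the idea concrete, here is a toy Python sketch of speculative
wakeup with squash/replay.  This is my own illustration, not gem5 code: all
names, latencies, and the scheduling model are made up, and real schedulers
work cycle-by-cycle in hardware rather than over a list.

```python
# Toy model: dependents issue when producers are *assumed* ready (the
# scheduler bets every load hits in the L1); if a producing load actually
# missed, the dependent is squashed and replayed.  Illustrative only.
ASSUMED_LOAD_LAT = 2   # cycles the scheduler assumes a load takes (L1 hit)
MISS_EXTRA = 8         # extra cycles a load really takes on an L1 miss

def schedule(program):
    """program: list of (name, deps, is_load, misses) in program order.
    Returns (actual ready cycles, names of replayed instructions)."""
    assumed = {}   # name -> cycle the result is assumed ready
    actual = {}    # name -> cycle the result is really ready
    replays = []
    for name, deps, is_load, misses in program:
        issue = max((assumed[d] for d in deps), default=0)
        real = max((actual[d] for d in deps), default=0)
        if real > issue:
            # Mis-scheduled: a producer (a missing load) wasn't actually
            # ready.  Pessimistically squash and reissue at the true
            # ready time instead of catching the operand off the bypass.
            replays.append(name)
            issue = real
        lat = ASSUMED_LOAD_LAT if is_load else 1
        assumed[name] = issue + lat
        actual[name] = issue + lat + (MISS_EXTRA if misses else 0)
    return actual, replays
```

For example, with a missing load "ld", a dependent "add", and an
independent "indep", only "add" gets replayed: the independent instruction
was scheduled in the load's shadow but never consumed a stale operand,
which is exactly the distinction that is expensive to compute precisely in
hardware.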

Here is a good paper that discusses replay schemes and their impact.

Understanding Scheduling Replay Schemes
Ilhyun Kim and Mikko H. Lipasti
http://pharm.ece.wisc.edu/papers/hpca2004ikim.pdf

Whoever coded the o3 model decided that since stores do not produce
register operands, replay is unnecessary for them.



On Thu, Sep 26, 2013 at 6:35 PM, Amin Farmahini <amin...@gmail.com> wrote:

> Tony,
>
> I noticed the same thing as well and as you mentioned the perf penalty
> could be really high.
> http://www.mail-archive.com/gem5-users@gem5.org/msg05894.html
> I don't know what the reason could be, but I was able to fix this. If I
> remember right, to prevent squashing, you need to mark those loads with a
> flag or something and try to add them to the instruction queue again.
>
> Thanks,
> Amin
>
>
> On Thu, Sep 26, 2013 at 6:21 PM, Tony Nowatzki <t...@cs.wisc.edu> wrote:
>
>> Hi All,
>>
>> Apologies in advance if this is a silly question, or a repeat.
>>
>> I recently noticed that the OoO core squashes itself and all younger
>> instructions when a load is issued to the memory system, but the cache is
>> blocked (say the MSHRs are full, or there are no targets left). In contrast,
>> when a write is issued to the memory system, the store will simply retry
>> until the cache can handle the request.
>>
>> There is potentially some performance penalty in squashing these loads,
>> and a large energy penalty as well (can be up to 2x for the core in some
>> contrived cases, according to McPAT).  Given that these squashes can occur
>> frequently in memory-bound programs, is there a reason this was chosen as
>> the implementation?  Is there a reason why loads can't be stalled and
>> retried on a cache block?
>>
>> Thanks!
>> Tony
>> _______________________________________________
>> gem5-users mailing list
>> gem5-users@gem5.org
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>
>
>
