Greetings,

  I have been trying to write stressmarks in X86 assembly so that the
simulated IPC of the O3CPU reaches N for an N-way out-of-order core. However,
no matter how I modify the assembly, the IPC never reaches 4 for a 4-way OoO
core.

  According to the execution trace, icache stalls are the culprit. In my
case, even though the whole program fits in the icache, the fetch unit still
stalls for a few cycles between bursts of 32 instructions, each burst being
fetched over 8 cycles (I assume 32 X86 ADD instructions fill one cache
line?). With the Gem5 memory system (no Ruby), this stall lasts 2 cycles.
With the Ruby memory system, it lasts 3 cycles.
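
  To put rough numbers on this, here is a small back-of-the-envelope sketch
of the IPC ceiling these stalls would imply. It assumes the only lost cycles
are the per-burst fetch stalls described above (one stall per 32
instructions, nothing else limiting the pipeline); the function name and
the model itself are illustrative, not taken from Gem5.

```python
def effective_ipc(insts_per_burst, fetch_cycles, stall_cycles):
    """Upper bound on IPC when each fetch burst is followed by a fixed stall.

    Assumption: fetch is the only bottleneck, so every burst costs
    fetch_cycles + stall_cycles and retires insts_per_burst instructions.
    """
    return insts_per_burst / (fetch_cycles + stall_cycles)


# 32 instructions fetched over 8 cycles, then the observed stall.
classic = effective_ipc(32, 8, 2)  # Gem5 memory system (no Ruby)
ruby = effective_ipc(32, 8, 3)     # Ruby memory system

print(f"classic memory: IPC <= {classic:.2f}")
print(f"Ruby memory:    IPC <= {ruby:.2f}")
```

  Under these assumptions the ceiling is about 3.2 without Ruby and about
2.9 with Ruby, which would explain why the IPC stays below 4.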

  So my questions are:

  1. Since Gem5 does not accept a zero hit latency, is there a way to
access the icache every cycle without any stall? Let's assume there are no
icache misses.

  2. The icache hit latency was 2 cycles in both the Ruby and the Gem5
memory configurations, so why did the Ruby case experience an extra cycle
of stall?

  I was running full-system Gem5 (changeset: 9305:ac608464be80) with the X86
ISA and a single detailed CPU. For Ruby, I used the MOESI_hammer protocol.


Thanks!

Runjie Zhang
University of Virginia
