Hi, Nilay

  I agree with you that the hit latency does not have to be zero for the
fetch unit to read from the icache every cycle.

  Here is a snapshot from the exec trace (I deleted some detail to make
it clearer). The icache hit latency is 1 cycle and the fetch width is 4.

(Ticks)
...60..: ....fetch: Running stage.
...60..: ....fetch: Attempting to fetch from [tid:0]
...60..: ....fetch: [tid:0]: Adding instructions to queue to decode.
...60..: ....fetch: [tid:0]: Instruction PC 0x400ab7 (0) created [sn:5050].
...60..: ....fetch: [tid:0]: Instruction PC 0x400ab9 (0) created [sn:5051].
...60..: ....fetch: [tid:0]: Instruction PC 0x400abb (0) created [sn:5052].
...60..: ....fetch: [tid:0]: Instruction PC 0x400abd (0) created [sn:5053].
...60..: ....fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.

...65..: ....fetch: Running stage.
...65..: ....fetch: Attempting to fetch from [tid:0]
...65..: ....fetch: [tid:0]: Adding instructions to queue to decode.
...65..: ....fetch: [tid:0]: Issuing a pipelined I-cache access, starting at PC (0x400abf=>0x400ac7).(0=>1).
...65..: ....fetch: [tid:0] Fetching cache line 0x400ac0 for addr 0x400ac0
...65..: ....fetch: Fetch: Doing instruction read.
...65..: ....fetch: [tid:0]: Doing Icache access.

...70..: ....fetch: [tid:0] Waking up from cache miss.
...70..: ....fetch: Running stage.
...70..: ....fetch: Attempting to fetch from [tid:0]
...70..: ....fetch: [tid:0]: Icache miss is complete.
...70..: ....fetch: [tid:0]: Adding instructions to queue to decode.
...70..: ....fetch: [tid:0]: Instruction PC 0x400abf (0) created [sn:5054].
...70..: ....fetch: [tid:0]: Instruction PC 0x400ac1 (0) created [sn:5055].
...70..: ....fetch: [tid:0]: Instruction PC 0x400ac3 (0) created [sn:5056].
...70..: ....fetch: [tid:0]: Instruction PC 0x400ac5 (0) created [sn:5057].
...70..: ....fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.

   When fetch runs at tick 65, the previous cache line has already been
consumed, so the fetch unit launches a pipelined icache access. However,
this access has a latency of 1 cycle, so the fetch unit has to wait
until tick 70 to fetch again. This creates a one-cycle stall (no
instructions are created at tick 65). If I understand correctly, this
latency could be hidden if the pipelined icache access were launched one
cycle earlier (at tick 60). Can I configure that in Gem5?
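To make the proposal concrete, here is a toy cycle-level model. It is a hypothetical sketch, not gem5's fetch stage, and the names (simulate_fetch, issue_threshold, steady_ipc) are made up. It assumes at most one outstanding icache access, that each access returns one whole line of 32 instructions, and that an access issued in cycle t can be fetched from in cycle t + hit_latency. With issue_threshold=0 (issue only once the line is consumed) and a 1-cycle latency it reproduces the 8-cycles-then-1-bubble pattern seen at ticks 60/65/70; issuing when one fetch group remains (the "tick 60" idea) removes the bubble in this model.

```python
"""Toy cycle-level model of a fetch unit reading from a pipelined icache.

Hypothetical sketch, not gem5's fetch stage: at most one icache access is
outstanding, each access returns one whole cache line, and an access issued
in cycle t can be fetched from starting in cycle t + hit_latency.
"""


def simulate_fetch(cycles, hit_latency, fetch_width=4, insts_per_line=32,
                   issue_threshold=0):
    """Return a list with the number of instructions fetched in each cycle.

    issue_threshold: issue the next-line access once the fetch buffer holds
    this many instructions or fewer. 0 means "only after the current line is
    consumed"; hit_latency * fetch_width issues just early enough to hide
    the latency (as long as that fits within one line).
    """
    assert hit_latency >= 1
    buffered = 0        # instructions waiting in the fetch buffer
    ready_at = None     # cycle at which the outstanding access completes
    per_cycle = []
    for t in range(cycles):
        # A completed access refills the buffer with one cache line.
        if ready_at is not None and t >= ready_at:
            buffered += insts_per_line
            ready_at = None
        # Issue the next sequential line once the buffer runs low.
        if ready_at is None and buffered <= issue_threshold:
            ready_at = t + hit_latency
        fetched = min(fetch_width, buffered)
        buffered -= fetched
        per_cycle.append(fetched)
    return per_cycle


def steady_ipc(per_cycle, warmup):
    """Average fetched instructions per cycle after a warm-up period."""
    tail = per_cycle[warmup:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    width = 4
    for lat in (1, 2, 3):
        on_demand = simulate_fetch(1000, lat, fetch_width=width)
        early = simulate_fetch(1000, lat, fetch_width=width,
                               issue_threshold=lat * width)
        print(f"hit latency {lat}: issue-on-consume IPC = "
              f"{steady_ipc(on_demand, 100):.2f}, issue-early IPC = "
              f"{steady_ipc(early, 100):.2f}")
```

Because only one access may be in flight, this sketch can hide at most insts_per_line / fetch_width = 8 cycles of latency; hiding more would require several accesses in flight at once.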

  I am not sure whether the Fetch flag is enough to study this
phenomenon. If not, please tell me which other flags I should use.

 BTW, the O3CPUALL debug flag does not seem to work. I get the error
"invalid debug flag 'O3CPUALL'".

Thanks!
Runjie


On Mon, 22 Oct 2012, Runjie Zhang wrote:


Greetings,

 I tried to write stressmarks in x86 assembly so that the simulated IPC of
the O3CPU can reach N for an N-way out-of-order core. However, no matter how
I modify the assembly, the IPC never reaches 4 for a 4-way OoO core.

 According to the execution trace, icache stalls are the troublemaker. In
my case, even though the whole program fits in the icache, the fetch unit
still stalls for a few cycles after each group of 32 instructions fetched
over 8 cycles (I assume 32 x86 ADD instructions fill one cache line?). With
the classic Gem5 memory system (no Ruby), this stall is 2 cycles. With Ruby,
it is 3 cycles.
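The effect of these stalls on the achievable IPC can be worked out directly. The sketch below is a back-of-the-envelope bound, assuming straight-line code with no misses and a fixed stall after every cache line; fetch_bound_ipc is a hypothetical helper. The 32-instructions-per-line figure is the guess from the message; the later trace shows PCs advancing by 2 bytes, which would give 32 instructions only if the line is 64 bytes (the line size is not stated here).

```python
import math


def fetch_bound_ipc(insts_per_line, fetch_width, stall_cycles):
    """Upper bound on IPC when fetch stalls stall_cycles after every line.

    Each line takes ceil(insts_per_line / fetch_width) fetch cycles,
    followed by the stall, so the fetch rate caps the IPC.
    """
    fetch_cycles = math.ceil(insts_per_line / fetch_width)
    return insts_per_line / (fetch_cycles + stall_cycles)


if __name__ == "__main__":
    # 32 instructions per line, 4-wide fetch, as in the message.
    print(fetch_bound_ipc(32, 4, 2))  # classic memory, 2-cycle stall: 3.2
    print(fetch_bound_ipc(32, 4, 3))  # Ruby, 3-cycle stall: ~2.91
```

So even with a perfect back end, a 2-cycle stall per line limits this loop to 32 / (8 + 2) = 3.2 IPC, and only a zero-cycle stall allows 4.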

 So my questions are:

 1. Since Gem5 does not accept a zero hit latency, is there a way to
access the icache every cycle without stalling? Let's assume there are no
icache misses.

Why do you think only a cache with zero hit latency can be accessed without
stalls? I would expect a sufficiently pipelined design to reach its peak
throughput once the pipeline is full. Here "pipeline" means not only the
processor pipeline but also the path that connects the processor to the
caches. So if the cache delivers four instructions every cycle, its latency
(whether it is 1 cycle or 100 cycles) does not matter at all once the
pipeline is full.

I think you should check why the fetch unit is stalling; you have not stated
that.

 2. The icache hit latency was 2 cycles with both the Ruby and the classic
Gem5 memory systems. Why does the Ruby case experience an extra cycle of stall?

You should be able to figure out from the trace where those cycles were
spent.
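Finding those cycles in a long Fetch trace can be automated. The script below is a hypothetical helper (not part of gem5): it counts the "created [sn:" messages at each tick where the fetch stage reports "Running stage." and reports ticks with none as fetch bubbles. It assumes the message strings shown in the trace excerpt earlier in this thread, which may differ between gem5 versions; the tick regex is also an assumption meant to accept both real tick values and the elided "...60..:" form.

```python
import re

# Hypothetical pattern for the leading tick field: matches both a real tick
# such as "   60000:" and the elided "...60..:" form used in the excerpt.
TICK_RE = re.compile(r"^[.\s]*(\d+)[.\s]*:")


def fetch_activity(lines):
    """Map each tick at which fetch ran to the number of instructions it created.

    Relies on the "Running stage." and "created [sn:" messages seen in the
    Fetch trace excerpt; lines without a leading tick (wrapped continuation
    lines) are skipped.
    """
    created = {}  # dicts keep insertion order, so ticks stay in trace order
    for line in lines:
        m = TICK_RE.match(line)
        if m is None:
            continue
        tick = int(m.group(1))
        if "Running stage." in line:
            created.setdefault(tick, 0)
        elif " created [sn:" in line:
            created[tick] = created.get(tick, 0) + 1
    return created


def fetch_bubbles(activity):
    """Ticks at which the fetch stage ran but created no instructions."""
    return [tick for tick, n in activity.items() if n == 0]


if __name__ == "__main__":
    excerpt = """\
...60..: ....fetch: Running stage.
...60..: ....fetch: [tid:0]: Instruction PC 0x400ab7 (0) created [sn:5050].
...60..: ....fetch: [tid:0]: Instruction PC 0x400ab9 (0) created [sn:5051].
...65..: ....fetch: Running stage.
...65..: ....fetch: [tid:0]: Doing Icache access.
...70..: ....fetch: Running stage.
...70..: ....fetch: [tid:0]: Instruction PC 0x400abf (0) created [sn:5054].
"""
    activity = fetch_activity(excerpt.splitlines())
    print(activity)                 # {60: 2, 65: 0, 70: 1}
    print(fetch_bubbles(activity))  # [65]
```

Comparing the bubble ticks of a Ruby run and a classic run side by side would then show where the extra Ruby cycle appears.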

--
Nilay
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
