Hi Nilay,

I agree with you that the hit latency does not have to be zero for the fetch unit to fetch from the icache every cycle.
Here is a snapshot from the exec trace (I deleted some details to make it clearer). The icache hit latency is 1 cycle and the fetch width is 4.

(Ticks)
...60..: ....fetch: Running stage.
...60..: ....fetch: Attempting to fetch from [tid:0]
...60..: ....fetch: [tid:0]: Adding instructions to queue to decode.
...60..: ....fetch: [tid:0]: Instruction PC 0x400ab7 (0) created [sn:5050].
...60..: ....fetch: [tid:0]: Instruction PC 0x400ab9 (0) created [sn:5051].
...60..: ....fetch: [tid:0]: Instruction PC 0x400abb (0) created [sn:5052].
...60..: ....fetch: [tid:0]: Instruction PC 0x400abd (0) created [sn:5053].
...60..: ....fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.
...65..: ....fetch: Running stage.
...65..: ....fetch: Attempting to fetch from [tid:0]
...65..: ....fetch: [tid:0]: Adding instructions to queue to decode.
...65..: ....fetch: [tid:0]: Issuing a pipelined I-cache access, starting at PC (0x400abf=>0x400ac7).(0=>1).
...65..: ....fetch: [tid:0] Fetching cache line 0x400ac0 for addr 0x400ac0
...65..: ....fetch: Fetch: Doing instruction read.
...65..: ....fetch: [tid:0]: Doing Icache access.
...70..: ....fetch: [tid:0] Waking up from cache miss.
...70..: ....fetch: Running stage.
...70..: ....fetch: Attempting to fetch from [tid:0]
...70..: ....fetch: [tid:0]: Icache miss is complete.
...70..: ....fetch: [tid:0]: Adding instructions to queue to decode.
...70..: ....fetch: [tid:0]: Instruction PC 0x400abf (0) created [sn:5054].
...70..: ....fetch: [tid:0]: Instruction PC 0x400ac1 (0) created [sn:5055].
...70..: ....fetch: [tid:0]: Instruction PC 0x400ac3 (0) created [sn:5056].
...70..: ....fetch: [tid:0]: Instruction PC 0x400ac5 (0) created [sn:5057].
...70..: ....fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.

When entering cycle 65, the previous cache line has been consumed, so the fetch unit launches a pipelined icache access.
However, this access has a latency of 1 cycle, so the fetch unit has to wait until cycle 70 to start fetching again. This creates a one-cycle stall. If I understand correctly, this latency could be hidden if the pipelined icache access were launched one cycle earlier (in cycle 60). Can I configure that in Gem5?

I am not sure whether the Fetch flag is enough to study this phenomenon. If not, please tell me which other flags I should use!

BTW, the O3CPUALL debug flag does not seem to work. I got the error "invalid debug flag 'O3CPUALL' ".

Thanks!
Runjie

On Mon, 22 Oct 2012, Runjie Zhang wrote:

Greetings,

I tried to write stressmarks in X86 assembly so that the simulated IPC of O3CPU can hit N for an N-way out-of-order core. However, no matter how I modify the assembly, the IPC never reaches 4 for a 4-way OoO core. According to the execution trace, icache stalls are the trouble maker. In my case, even if the whole program fits in the icache, the fetch unit still stalls for a few cycles between fetching 32 instructions over 8 cycles (I assume 32 X86 ADD instructions fill one cache line?). With the Gem5 memory system (no Ruby), this stall lasts 2 cycles. With Ruby memory, it lasts 3 cycles. So my questions are:

1. Since Gem5 does not accept a zero hit latency, is there a way to access the icache every cycle without any stall? Let's assume there are no icache misses.

Why do you think only a cache that is accessed with zero hit latency can be accessed without stalls? I would expect a design that is pipelined enough to hit its peak throughput once the pipeline is full. Here, pipeline means not only the processor pipeline but also the path that connects the processor to the caches. So, if the cache provides a throughput of four instructions every cycle, its latency (whether it is one cycle or 100 cycles) would not matter at all once the pipeline is full.

I think you should check why the fetch unit is stalling. You have not stated that.

2. The icache hit latencies for both the Ruby memory and Gem5 memory cores were 2 cycles. Why did the Ruby case experience an extra cycle of stall?

You should be able to figure out from the trace where those cycles were spent.

-- Nilay
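The stall arithmetic discussed in this thread can be checked with a toy cycle model. This is a hypothetical sketch, not gem5 code: `simulate_fetch`, `fetch_ipc` and `issue_early` are illustrative names, and the model assumes no misses, a single outstanding icache request, and at most `fetch_width` instructions fetched per cycle from one line. The observed stall (2 or 3 cycles) is plugged in as the effective request latency. With `issue_early=False` the next line is requested only after the current one is drained, as at tick 65 in the trace; with `issue_early=True` it is requested `hit_latency` cycles before it is needed, which is the "launch it in cycle 60" idea.

```python
import math


def simulate_fetch(num_lines, insts_per_line, fetch_width, hit_latency, issue_early):
    """Toy fetch model (hypothetical, not gem5 code).

    Assumes one outstanding icache request at a time, no misses, and that
    each cycle fetches at most fetch_width instructions from a single line.
    Returns the cycle just after the last instruction is fetched; line 0 is
    requested at cycle 0.
    """
    k = math.ceil(insts_per_line / fetch_width)  # fetch cycles per line
    ready = hit_latency  # cycle at which the next line's data arrives
    end = 0  # cycle after the previous line was drained
    for _ in range(num_lines):
        start = max(ready, end)  # stall here if the data is not back yet
        end = start + k  # line drained, fetch buffer empty
        if issue_early:
            # Request the next line hit_latency cycles before it is needed,
            # but not before this line's own data has arrived.
            issue = max(start, end - hit_latency)
        else:
            # Request the next line only once this one is drained,
            # like the access issued at tick 65 in the trace.
            issue = end
        ready = issue + hit_latency
    return end


def fetch_ipc(num_lines, insts_per_line, fetch_width, hit_latency, issue_early):
    """Average instructions fetched per cycle under the toy model."""
    cycles = simulate_fetch(num_lines, insts_per_line, fetch_width,
                            hit_latency, issue_early)
    return num_lines * insts_per_line / cycles


if __name__ == "__main__":
    # Trace case (assumed): 4 instructions per access, width 4, 1-cycle latency.
    print(simulate_fetch(10, 4, 4, 1, issue_early=False))  # 20: one stall per access
    print(simulate_fetch(10, 4, 4, 1, issue_early=True))   # 11: only the initial latency
    # Original message: 32-instruction line, width 4, 2- vs 3-cycle stall.
    print(fetch_ipc(100, 32, 4, 2, issue_early=False))            # 3.2
    print(round(fetch_ipc(100, 32, 4, 3, issue_early=False), 2))  # 2.91
    print(round(fetch_ipc(100, 32, 4, 2, issue_early=True), 2))   # 3.99
```

Under these assumptions, a stall of s cycles after every 8 fetch cycles caps fetch throughput at 32 / (8 + s), i.e. 3.2 instructions per cycle for s = 2 and about 2.91 for s = 3, which would keep the IPC below 4 regardless of the rest of the core. Issuing the request early hides the latency only while `hit_latency` is no larger than the fetch cycles per line; beyond that, a single outstanding request is not enough and more requests must be in flight, which is the deeper pipelining Nilay describes.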
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users