Greetings, I have been writing stressmarks in x86 assembly so that the simulated IPC of the O3CPU reaches N on an N-way out-of-order core. However, no matter how I modify the assembly, the IPC never reaches 4 on a 4-way OoO core.
According to the execution trace, icache stalls are the culprit. Even though the whole program fits in the icache, the fetch unit fetches 32 instructions over 8 cycles and then stalls for a few cycles before it fetches the next 32 (I assume 32 x86 ADD instructions fill one cache line?). With the gem5 memory system (no Ruby), this stall is 2 cycles. With Ruby memory, it is 3 cycles.

So my questions are:

1. Since gem5 does not accept a zero hit latency, is there a way to access the icache every cycle without any stall? Let's assume there are no icache misses.

2. The icache hit latency is 2 cycles for both the Ruby and the gem5 memory configurations, so why does the Ruby case see an extra stall cycle?

I am running full-system gem5 (changeset: 9305:ac608464be80) with the X86 ISA and a single detailed CPU. For Ruby, I use the MOESI_hammer protocol.

Thanks!

Runjie Zhang
University of Virginia
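
P.S. As a sanity check on the numbers above, here is a minimal back-of-the-envelope sketch in Python of the IPC ceiling these stalls would imply. It is not gem5 output. It assumes that the fetch unit stalls exactly once per 32-instruction cache line, that the stall does not overlap with fetch, and that nothing else limits throughput. The function name ipc_ceiling is made up for illustration.

```python
# Rough model only: one non-overlapped fetch stall per cache line (assumption).
def ipc_ceiling(insts_per_line, width, stall_cycles):
    """Upper bound on IPC when fetch stalls once after each cache line."""
    # Cycles needed to fetch one full line at the machine width.
    fetch_cycles = insts_per_line / width
    return insts_per_line / (fetch_cycles + stall_cycles)


# 32 instructions per line, 4-wide core, stall lengths seen in the trace.
for name, stall in [("classic (no Ruby)", 2), ("Ruby", 3)]:
    print(f"{name}: IPC <= {ipc_ceiling(32, 4, stall):.2f}")
```

Under those assumptions the ceiling is about 3.20 for the classic memory system and about 2.91 for Ruby, so reaching an IPC of 4 would require the stall to go away entirely.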