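Sorry for the confusion. The numbers 60, 65 and 70 were digits from the tick count at which each cycle started, not cycle numbers. I removed some digits from the tick count to make each line shorter. The ticks are 500 apart, so 60, 65 and 70 are three consecutive cycles.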
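To make the tick-to-cycle mapping concrete, here is a rough Python sketch (not part of gem5) that groups trace lines by cycle and counts the instructions fetch created in each one. The 500 ticks per cycle value is only read off the spacing in this trace, and the function names are made up for illustration.

```python
import re

# A few lines copied from the fetch trace in this thread.
SAMPLE = """\
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ab7 (0) created [sn:5050].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ab9 (0) created [sn:5051].
33922322296500: system.switch_cpus.fetch: [tid:0]: Doing Icache access.
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abf (0) created [sn:5054].
"""

# Assumption: one CPU cycle is 500 ticks, inferred from the tick spacing above.
TICKS_PER_CYCLE = 500

# Trace lines have the shape "<tick>: <object name>: <message>".
LINE_RE = re.compile(r"^(\d+): (\S+): (.*)$")


def instructions_per_cycle(trace_text, ticks_per_cycle=TICKS_PER_CYCLE):
    """Map each cycle seen in the trace to the number of instructions created."""
    counts = {}
    for line in trace_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a trace line
        tick, _obj, message = match.groups()
        cycle = int(tick) // ticks_per_cycle
        counts.setdefault(cycle, 0)
        if "created [sn:" in message:
            counts[cycle] += 1
    return counts


def stalled_cycles(counts):
    """Cycles that appear in the trace but created no instructions."""
    return [cycle for cycle, created in counts.items() if created == 0]


if __name__ == "__main__":
    per_cycle = instructions_per_cycle(SAMPLE)
    for cycle, created in per_cycle.items():
        print(cycle * TICKS_PER_CYCLE, created)
    print("stalled at ticks:", [c * TICKS_PER_CYCLE for c in stalled_cycles(per_cycle)])
```

Run over the full trace instead of the short sample, this should show which cycles fetch produced no instructions in.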
The complete trace looks like this:

33922322296000: system.switch_cpus.fetch: Running stage.
33922322296000: system.switch_cpus.fetch: Attempting to fetch from [tid:0]
33922322296000: system.switch_cpus.fetch: [tid:0]: Adding instructions to queue to decode.
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ab7 (0) created [sn:5050].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add ecx, ecx, esi
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ab9 (0) created [sn:5051].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add edx, edx, esi
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abb (0) created [sn:5052].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub eax, eax, esi
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abd (0) created [sn:5053].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub ebx, ebx, esi
33922322296000: system.switch_cpus.fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.
33922322296500: system.switch_cpus.BPredUnit: BranchPred: [tid:0]: Committing branches until [sn:5025].
33922322296500: system.switch_cpus.fetch: Running stage.
33922322296500: system.switch_cpus.fetch: Attempting to fetch from [tid:0]
33922322296500: system.switch_cpus.fetch: [tid:0]: Adding instructions to queue to decode.
33922322296500: system.switch_cpus.fetch: [tid:0]: Issuing a pipelined I-cache access, starting at PC (0x400abf=>0x400ac7).(0=>1).
33922322296500: system.switch_cpus.fetch: [tid:0] Fetching cache line 0x400ac0 for addr 0x400ac0
33922322296500: system.switch_cpus.fetch: Fetch: Doing instruction read.
33922322296500: system.switch_cpus.fetch: [tid:0]: Doing Icache access.
33922322297000: system.switch_cpus.fetch: [tid:0] Waking up from cache miss.
33922322297000: system.switch_cpus.BPredUnit: BranchPred: [tid:0]: Committing branches until [sn:5029].
33922322297000: system.switch_cpus.fetch: Running stage.
33922322297000: system.switch_cpus.fetch: Attempting to fetch from [tid:0]
33922322297000: system.switch_cpus.fetch: [tid:0]: Icache miss is complete.
33922322297000: system.switch_cpus.fetch: [tid:0]: Adding instructions to queue to decode.
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abf (0) created [sn:5054].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub ecx, ecx, esi
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac1 (0) created [sn:5055].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub edx, edx, esi
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac3 (0) created [sn:5056].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add eax, eax, esi
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac5 (0) created [sn:5057].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add ebx, ebx, esi
33922322297000: system.switch_cpus.fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.
33922322297500: system.switch_cpus.BPredUnit: BranchPred: [tid:0]: Committing branches until [sn:5033].
33922322297500: system.switch_cpus.fetch: Running stage.
33922322297500: system.switch_cpus.fetch: Attempting to fetch from [tid:0]
33922322297500: system.switch_cpus.fetch: [tid:0]: Adding instructions to queue to decode.
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac7 (0) created [sn:5058].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add ecx, ecx, esi
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac9 (0) created [sn:5059].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add edx, edx, esi
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400acb (0) created [sn:5060].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub eax, eax, esi
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400acd (0) created [sn:5061].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub ebx, ebx, esi
33922322297500: system.switch_cpus.fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.

Sorry for the confusion.

Runjie

On Thu, Oct 25, 2012 at 10:42 AM, Nilay Vaish <ni...@cs.wisc.edu> wrote:

> On Wed, 24 Oct 2012, Runjie Zhang wrote:
>
> Hi, Nilay
>>
>> I agree with you that to fetch from icache every cycle, hit latency
>> don't have to be zero.
>>
>> Here is a snap shot from the exec trace: (deleted some detail to
>> make it more clear) Icache hit latency is 1 cycle and fetch width is 4
>>
>> (Ticks)
>> ...60..: ....fetch: Running stage.
>> ...60..: ....fetch: Attempting to fetch from [tid:0]
>> ...60..: ....fetch: [tid:0]: Adding instructions to queue to decode.
>> ...60..: ....fetch: [tid:0]: Instruction PC 0x400ab7 (0) created [sn:5050].
>> ...60..: ....fetch: [tid:0]: Instruction PC 0x400ab9 (0) created [sn:5051].
>> ...60..: ....fetch: [tid:0]: Instruction PC 0x400abb (0) created [sn:5052].
>> ...60..: ....fetch: [tid:0]: Instruction PC 0x400abd (0) created [sn:5053].
>> ...60..: ....fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.
>>
>
> What happened on cycles 61-64? Should not the fetch unit try to create
> four instructions every cycles?
>
>> ...65..: ....fetch: Running stage.
>> ...65..: ....fetch: Attempting to fetch from [tid:0]
>> ...65..: ....fetch: [tid:0]: Adding instructions to queue to decode.
>> ...65..: ....fetch: [tid:0]: Issuing a pipelined I-cache access, starting at PC (0x400abf=>0x400ac7).(0=>1).
>> ...65..: ....fetch: [tid:0] Fetching cache line 0x400ac0 for addr 0x400ac0
>> ...65..: ....fetch: Fetch: Doing instruction read.
>> ...65..: ....fetch: [tid:0]: Doing Icache access.
>>
>
> What happened in the in-between cycles?
>
>> ...70..: ....fetch: [tid:0] Waking up from cache miss.
>> ...70..: ....fetch: Running stage.
>> ...70..: ....fetch: Attempting to fetch from [tid:0]
>> ...70..: ....fetch: [tid:0]: Icache miss is complete.
>> ...70..: ....fetch: [tid:0]: Adding instructions to queue to decode.
>> ...70..: ....fetch: [tid:0]: Instruction PC 0x400abf (0) created [sn:5054].
>> ...70..: ....fetch: [tid:0]: Instruction PC 0x400ac1 (0) created [sn:5055].
>> ...70..: ....fetch: [tid:0]: Instruction PC 0x400ac3 (0) created [sn:5056].
>> ...70..: ....fetch: [tid:0]: Instruction PC 0x400ac5 (0) created [sn:5057].
>> ...70..: ....fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.
>>
>> When entering cycle 65, the previous cache line has been consumed
>> so the fetch unit launched a pipelined icache access. However, this
>> access has latency of 1 so the fetch unit need to wait till cycle 70
>> to start to fetch again. This created a one cycle stall. If I
>> understand correctly, this latency could be covered it the pipelined
>> icache access were launched one cycle earlier (in cycle 60). Can I
>> configure that in Gem5?
>>
>
> One cycle earlier would mean cycle 64 and not cycle 60. You have
> completely removed the trace for the in between cycles which is required
> for understanding what was going in the fetch unit during those cycles.
>
>> I am not sure whether the Fetch flag is enough to study this
>> phenomenon. If not, please tell me what other flags should I use!
>>
>> BTW, the O3CPUALL debug flag seems not working. I got error "invalid
>> debug flag 'O3CPUALL' ".
>>
>
> It is not working because you are using the wrong flag. The correct flag
> is O3CPUAll.
>
> --
> Nilay
>
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users