Sorry for the confusion.

The numbers 60, 65 and 70 were digits taken from the tick at which each cycle
started, not cycle numbers. I removed the other digits of the tick count to make
each line shorter, so they are really three consecutive cycles, 500 ticks apart.

The complete trace looks like this:

33922322296000: system.switch_cpus.fetch: Running stage.
33922322296000: system.switch_cpus.fetch: Attempting to fetch from [tid:0]
33922322296000: system.switch_cpus.fetch: [tid:0]: Adding instructions to queue to decode.
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ab7 (0) created [sn:5050].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add   ecx, ecx, esi
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ab9 (0) created [sn:5051].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add   edx, edx, esi
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abb (0) created [sn:5052].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub   eax, eax, esi
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abd (0) created [sn:5053].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub   ebx, ebx, esi
33922322296000: system.switch_cpus.fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.

33922322296500: system.switch_cpus.BPredUnit: BranchPred: [tid:0]: Committing branches until [sn:5025].
33922322296500: system.switch_cpus.fetch: Running stage.
33922322296500: system.switch_cpus.fetch: Attempting to fetch from [tid:0]
33922322296500: system.switch_cpus.fetch: [tid:0]: Adding instructions to queue to decode.
33922322296500: system.switch_cpus.fetch: [tid:0]: Issuing a pipelined I-cache access, starting at PC (0x400abf=>0x400ac7).(0=>1).
33922322296500: system.switch_cpus.fetch: [tid:0] Fetching cache line 0x400ac0 for addr 0x400ac0
33922322296500: system.switch_cpus.fetch: Fetch: Doing instruction read.
33922322296500: system.switch_cpus.fetch: [tid:0]: Doing Icache access.

33922322297000: system.switch_cpus.fetch: [tid:0] Waking up from cache miss.
33922322297000: system.switch_cpus.BPredUnit: BranchPred: [tid:0]: Committing branches until [sn:5029].
33922322297000: system.switch_cpus.fetch: Running stage.
33922322297000: system.switch_cpus.fetch: Attempting to fetch from [tid:0]
33922322297000: system.switch_cpus.fetch: [tid:0]: Icache miss is complete.
33922322297000: system.switch_cpus.fetch: [tid:0]: Adding instructions to queue to decode.
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abf (0) created [sn:5054].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub   ecx, ecx, esi
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac1 (0) created [sn:5055].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub   edx, edx, esi
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac3 (0) created [sn:5056].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add   eax, eax, esi
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac5 (0) created [sn:5057].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add   ebx, ebx, esi
33922322297000: system.switch_cpus.fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.

33922322297500: system.switch_cpus.BPredUnit: BranchPred: [tid:0]: Committing branches until [sn:5033].
33922322297500: system.switch_cpus.fetch: Running stage.
33922322297500: system.switch_cpus.fetch: Attempting to fetch from [tid:0]
33922322297500: system.switch_cpus.fetch: [tid:0]: Adding instructions to queue to decode.
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac7 (0) created [sn:5058].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add   ecx, ecx, esi
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac9 (0) created [sn:5059].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction is: ADD_R_R : add   edx, edx, esi
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400acb (0) created [sn:5060].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub   eax, eax, esi
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400acd (0) created [sn:5061].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction is: SUB_R_R : sub   ebx, ebx, esi
33922322297500: system.switch_cpus.fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.
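To make the per-cycle picture explicit, the trace can be bucketed by tick and the "Instruction PC ... created" lines counted per cycle. Below is a small Python sketch of that. It assumes one CPU cycle is 500 ticks (inferred from the spacing of the "Running stage." lines, not read from the gem5 configuration), and `insts_per_cycle` / `TICKS_PER_CYCLE` are hypothetical names, not part of gem5.

```python
import re

# Assumption: one CPU cycle = 500 ticks, inferred from the spacing of the
# "Running stage." lines in the trace; not read from the gem5 config.
TICKS_PER_CYCLE = 500

# Trimmed copy of the Fetch trace above; only tick-prefixed lines are used.
TRACE = """\
33922322296000: system.switch_cpus.fetch: Running stage.
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ab7 (0) created [sn:5050].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ab9 (0) created [sn:5051].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abb (0) created [sn:5052].
33922322296000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abd (0) created [sn:5053].
33922322296500: system.switch_cpus.fetch: Running stage.
33922322296500: system.switch_cpus.fetch: [tid:0]: Issuing a pipelined I-cache access, starting at PC (0x400abf=>0x400ac7).(0=>1).
33922322296500: system.switch_cpus.fetch: [tid:0]: Doing Icache access.
33922322297000: system.switch_cpus.fetch: [tid:0] Waking up from cache miss.
33922322297000: system.switch_cpus.BPredUnit: BranchPred: [tid:0]: Committing branches until [sn:5029].
33922322297000: system.switch_cpus.fetch: Running stage.
33922322297000: system.switch_cpus.fetch: [tid:0]: Icache miss is complete.
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400abf (0) created [sn:5054].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac1 (0) created [sn:5055].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac3 (0) created [sn:5056].
33922322297000: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac5 (0) created [sn:5057].
33922322297500: system.switch_cpus.fetch: Running stage.
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac7 (0) created [sn:5058].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400ac9 (0) created [sn:5059].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400acb (0) created [sn:5060].
33922322297500: system.switch_cpus.fetch: [tid:0]: Instruction PC 0x400acd (0) created [sn:5061].
"""

# Each trace line looks like "<tick>: <SimObject name>: <message>".
TICK_LINE = re.compile(r"^(\d+): (\S+): (.*)$")


def insts_per_cycle(trace, ticks_per_cycle=TICKS_PER_CYCLE):
    """Map cycle index (relative to the first traced tick) to the number of
    'Instruction PC ... created' lines the fetch stage printed in it."""
    counts = {}
    first_tick = None
    for line in trace.splitlines():
        m = TICK_LINE.match(line)
        if m is None:
            continue  # continuation of an email-wrapped line: no tick prefix
        tick, obj, msg = int(m.group(1)), m.group(2), m.group(3)
        if first_tick is None:
            first_tick = tick
        if not obj.endswith(".fetch"):
            continue  # skip other SimObjects, e.g. BPredUnit
        cycle = (tick - first_tick) // ticks_per_cycle
        counts.setdefault(cycle, 0)  # record cycles where fetch ran but created nothing
        if "Instruction PC" in msg:
            counts[cycle] += 1
    return counts


if __name__ == "__main__":
    for cycle, n in insts_per_cycle(TRACE).items():
        print(f"cycle {cycle}: {n} instruction(s) created")
    # cycle 0: 4 instruction(s) created
    # cycle 1: 0 instruction(s) created
    # cycle 2: 4 instruction(s) created
    # cycle 3: 4 instruction(s) created
```

The count for cycle 1 (tick ...296500) is zero: that is the cycle spent issuing the I-cache access, i.e. the one-cycle fetch bubble discussed below. Matching only on tick-prefixed lines means the sketch also works on a trace whose lines were wrapped by a mail client.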

Sorry for the confusion.

Runjie


On Thu, Oct 25, 2012 at 10:42 AM, Nilay Vaish <ni...@cs.wisc.edu> wrote:

> On Wed, 24 Oct 2012, Runjie Zhang wrote:
>
>  Hi, Nilay
>>
>>  I agree with you that to fetch from the icache every cycle, the hit latency
>> doesn't have to be zero.
>>
>>  Here is a snapshot from the exec trace (I deleted some detail to
>> make it clearer). Icache hit latency is 1 cycle and fetch width is 4.
>>
>> (Ticks)
>> ...60..: ....fetch: Running stage.
>> ...60..: ....fetch: Attempting to fetch from [tid:0]
>> ...60..: ....fetch: [tid:0]: Adding instructions to queue to decode.
>> ...60..: ....fetch: [tid:0]: Instruction PC 0x400ab7 (0) created [sn:5050].
>> ...60..: ....fetch: [tid:0]: Instruction PC 0x400ab9 (0) created [sn:5051].
>> ...60..: ....fetch: [tid:0]: Instruction PC 0x400abb (0) created [sn:5052].
>> ...60..: ....fetch: [tid:0]: Instruction PC 0x400abd (0) created [sn:5053].
>> ...60..: ....fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.
>>
>
> What happened on cycles 61-64? Shouldn't the fetch unit try to create
> four instructions every cycle?
>
>
>
>> ...65..: ....fetch: Running stage.
>> ...65..: ....fetch: Attempting to fetch from [tid:0]
>> ...65..: ....fetch: [tid:0]: Adding instructions to queue to decode.
>> ...65..: ....fetch: [tid:0]: Issuing a pipelined I-cache access, starting at PC (0x400abf=>0x400ac7).(0=>1).
>> ...65..: ....fetch: [tid:0] Fetching cache line 0x400ac0 for addr 0x400ac0
>> ...65..: ....fetch: Fetch: Doing instruction read.
>> ...65..: ....fetch: [tid:0]: Doing Icache access.
>>
>
> What happened in the in-between cycles?
>
>
>
>> ...70..: ....fetch: [tid:0] Waking up from cache miss.
>> ...70..: ....fetch: Running stage.
>> ...70..: ....fetch: Attempting to fetch from [tid:0]
>> ...70..: ....fetch: [tid:0]: Icache miss is complete.
>> ...70..: ....fetch: [tid:0]: Adding instructions to queue to decode.
>> ...70..: ....fetch: [tid:0]: Instruction PC 0x400abf (0) created [sn:5054].
>> ...70..: ....fetch: [tid:0]: Instruction PC 0x400ac1 (0) created [sn:5055].
>> ...70..: ....fetch: [tid:0]: Instruction PC 0x400ac3 (0) created [sn:5056].
>> ...70..: ....fetch: [tid:0]: Instruction PC 0x400ac5 (0) created [sn:5057].
>> ...70..: ....fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.
>>
>>   When entering cycle 65, the previous cache line has been consumed,
>> so the fetch unit launches a pipelined icache access. However, this
>> access has a latency of 1 cycle, so the fetch unit needs to wait until
>> cycle 70 to start fetching again. This creates a one-cycle stall. If I
>> understand correctly, this latency could be hidden if the pipelined
>> icache access were launched one cycle earlier (in cycle 60). Can I
>> configure that in gem5?
>>
>
> One cycle earlier would mean cycle 64 and not cycle 60. You have
> completely removed the trace for the in-between cycles, which is required
> for understanding what was going on in the fetch unit during those cycles.
>
>
>
>>  I am not sure whether the Fetch flag is enough to study this
>> phenomenon. If not, please tell me which other flags I should use.
>>
>> BTW, the O3CPUALL debug flag does not seem to work. I get the error
>> "invalid debug flag 'O3CPUALL' ".
>>
>>
> It is not working because you are using the wrong flag. The correct flag
> is O3CPUAll.
>
> --
> Nilay
>
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
