Ideally, you would want to have queues between Fetch and Decode and
between Decode and Rename, but instead we have skidBuffers that are
dimensioned to absorb the instructions in flight between stages in case
of a blocking event.
So, as you point out, even while there is still space in the decode
skidBuffer, Fetch considers Decode as blocked and will not send
instructions until the skidBuffer is drained.
In particular, when Rename unblocks, there will be pipeline bubbles
equal to decodeToRenameDelay (time between when Rename tells Decode that
it can begin sending instructions again and when the first instruction
arrives at Rename) and decodeToRenameDelay + fetchToDecodeDelay (time
between when Decode tells Fetch that it can begin sending instructions
again and when the first instruction arrives at Rename). With perfect
queuing, these bubbles would probably not be present. In addition, the
impact of the bubbles only grows with the delays between frontend stages
(and is probably okay if both delays are set to 1).
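To make the arithmetic above concrete, here is a minimal sketch that only
encodes the two bubble estimates from this message. It is not gem5 code and
measures nothing: the struct and both function names are hypothetical, and
only the two delay names are taken from the O3 parameters.

```cpp
#include <cassert>

// Hypothetical container for the two frontend delays discussed above.
// The field names mirror the gem5 parameters; the struct is illustrative.
struct FrontendDelays {
    unsigned fetchToDecodeDelay;
    unsigned decodeToRenameDelay;
};

// Estimated bubble (in cycles) seen at Rename after it tells Decode to
// resume: the time for the first instruction to travel Decode -> Rename.
constexpr unsigned decodeRestartBubble(const FrontendDelays &d)
{
    return d.decodeToRenameDelay;
}

// Estimated bubble (in cycles) seen at Rename after Decode tells Fetch to
// resume: the first instruction must travel Fetch -> Decode -> Rename.
constexpr unsigned fetchRestartBubble(const FrontendDelays &d)
{
    return d.decodeToRenameDelay + d.fetchToDecodeDelay;
}

// With both delays set to 1, the estimates are 1 and 2 cycles.
static_assert(decodeRestartBubble(FrontendDelays{1, 1}) == 1,
              "one-cycle bubble expected");
static_assert(fetchRestartBubble(FrontendDelays{1, 1}) == 2,
              "two-cycle bubble expected");
```

With fetchToDecodeDelay = 2 and decodeToRenameDelay = 3, the same estimates
give 3 and 5 cycles, which is the sense in which the bubbles grow with the
frontend delays.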
So if you want better queuing, you have to modify how the skidBuffer
behaves. I can't comment on how realistic this would be, but I'll point
out that Haswell has a decode queue between Decode and Rename/Allocation
and 2 instruction queues (because SMT) between the fetch buffer and the
decode queue.
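As a rough illustration of what "modifying how the skidBuffer behaves" could
mean, the sketch below contrasts the drain-before-unblock policy with a
headroom-based one. It is a toy model and not the DefaultDecode
implementation: ToySkidBuffer, its members and the requiredFreeSlots
parameter are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy stand-in for a per-thread skid buffer (hypothetical, not gem5's type).
struct ToySkidBuffer {
    std::deque<int> insts;   // buffered instructions (ints as placeholders)
    std::size_t capacity;    // plays the role of skidBufferMax

    // Policy discussed in this thread: unblock only once fully drained.
    bool unblockWhenEmpty() const
    {
        return insts.empty();
    }

    // Hypothetical alternative: unblock as soon as at least
    // requiredFreeSlots entries are free. Choosing that value is left to
    // the caller, because a safe value depends on how many instructions
    // can still be in flight if the stage blocks again.
    bool unblockWithHeadroom(std::size_t requiredFreeSlots) const
    {
        if (insts.size() > capacity)
            return false;  // over capacity: never report free room
        return capacity - insts.size() >= requiredFreeSlots;
    }
};
```

With the numbers from the question below (capacity 16, 4 instructions left),
unblockWhenEmpty() is false while unblockWithHeadroom(8) is true. Note that
asking for 16 free slots, i.e. the full skidBufferMax, is only satisfied by
an empty buffer, so it gives the same answer as the first policy.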
On 05/11/2015 13:31, Virendra Kumar Pathak wrote:
Hi Arthur Perais,
Thanks for your response. I have a question regarding blocking &
unblocking of decode stage.
Could you please help me in understanding them.
I am working on an O3 CPU model with the configuration below:
fetchToDecodeDelay = 1
fetchWidth = 8
decodeToFetchDelay=1
decodeWidth = 4
Skid buffer between fetch and decode:
skidBufferMax = (fetchToDecodeDelay + 1) * params->fetchWidth;
=> skidBufferMax= (1+1) * 8 = 16
All my experiments are based on 1 CPU with 1 thread.
decode_impl.hh:
In decodeInsts(), assuming decode was not in the unblocking state, it
will get 8 instructions from insts[tid] (from the fetch stage).
However, since decodeWidth = 4, only 4 instructions will be decoded. Thus
4 instructions are still left.
In the same function, there is logic to block the decode stage and
inform the fetch stage about it if not all of the instructions were
decoded. My question is: why is decode blocked when there is still
space left in the skid buffer (i.e. 12)? Fetch could still send
instructions to the skid buffer while decode is busy with 4 instructions.
if (!insts_to_decode.empty()) { block(tid); }
The unblock() in the decode stage unblocks it only if the skid buffer is
empty. Can't we unblock the decode stage as soon as we have enough space
for storing the fetched instructions (here 8)?
DefaultDecode<Impl>::unblock(ThreadID tid)
{
    if (skidBuffer[tid].empty()) { toFetch->decodeUnblock[tid] = true; }
}
Please shed some light on this. Am I missing something or interpreting
it wrong?
Thanks.
Thanks for your time in advance.
--
with regards,
Virendra Kumar Pathak
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
--
Arthur Perais
INRIA Bretagne Atlantique
Bâtiment 12E, Bureau E303, Campus de Beaulieu
35042 Rennes, France