Ideally, you would want queues between Fetch and Decode and between Decode and Rename, but instead we have skidBuffers that are sized to absorb the instructions in flight between stages when a blocking event occurs. So, as you point out, even while there is still space in the Decode skidBuffer, Fetch considers Decode blocked and will not send instructions until the skidBuffer is drained.

In particular, when Rename unblocks, there will be pipeline bubbles of decodeToRenameDelay cycles (the time between when Rename tells Decode that it can begin sending instructions again and when the first instruction arrives at Rename) and of decodeToRenameDelay + fetchToDecodeDelay cycles (the time between when Decode tells Fetch that it can begin sending instructions again and when the first instruction arrives at Rename). With perfect queuing, these bubbles would probably not be present. In addition, the impact of the bubbles grows with the delays between frontend stages (and is probably okay if both delays are set to 1).
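
To make the arithmetic concrete, here is a minimal sketch of the two bubble sizes described above. It is a toy model, not gem5 code: the struct and function names are made up for illustration, and it only restates the two sums from the paragraph above.

```cpp
// Toy model (hypothetical names, not part of gem5): front-end delays in cycles.
struct FrontendDelays {
    unsigned fetchToDecodeDelay;   // Fetch -> Decode latency
    unsigned decodeToRenameDelay;  // Decode -> Rename latency
};

// Cycles between Rename telling Decode it may send again and the first
// instruction arriving at Rename.
constexpr unsigned bubbleAfterRenameUnblock(const FrontendDelays &d)
{
    return d.decodeToRenameDelay;
}

// Cycles between Decode telling Fetch it may send again and the first
// instruction arriving at Rename.
constexpr unsigned bubbleAfterDecodeUnblock(const FrontendDelays &d)
{
    return d.decodeToRenameDelay + d.fetchToDecodeDelay;
}
```

With both delays set to 1, this gives bubbles of 1 and 2 cycles; with fetchToDecodeDelay = 3 and decodeToRenameDelay = 2, it gives 2 and 5 cycles.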

So if you want better queuing, you have to modify how the skidBuffer behaves. I can't comment on how realistic this would be, but I'll point out that Haswell has a decode queue between Decode and Rename/Allocation and 2 instruction queues (because SMT) between the fetch buffer and the decode queue.
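
As a rough illustration of what "modifying how the skidBuffer behaves" could mean, the sketch below contrasts the current unblock-when-empty rule with a relaxed rule that unblocks once a chosen number of slots is free. It is a standalone toy class with hypothetical names, not the gem5 DefaultDecode code, and the right value for the reserve (enough room for everything that can still be in flight) is an assumption you would have to work out for your own delays and widths.

```cpp
#include <cstddef>
#include <deque>

// Toy stand-in for a per-thread skid buffer (hypothetical, not gem5 code).
class ToySkidBuffer
{
  public:
    explicit ToySkidBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Store one instruction (represented by an int); fails when full.
    bool push(int inst)
    {
        if (buf_.size() >= capacity_)
            return false;
        buf_.push_back(inst);
        return true;
    }

    // Remove up to `width` instructions, oldest first; returns how many.
    std::size_t drain(std::size_t width)
    {
        std::size_t n = 0;
        while (n < width && !buf_.empty()) {
            buf_.pop_front();
            ++n;
        }
        return n;
    }

    std::size_t freeSlots() const { return capacity_ - buf_.size(); }

    // Current behaviour as described in this thread: unblock only when empty.
    bool canUnblockWhenEmpty() const { return buf_.empty(); }

    // Relaxed policy (assumption): unblock once `reserve` slots are free.
    bool canUnblockWithReserve(std::size_t reserve) const
    {
        return freeSlots() >= reserve;
    }

  private:
    std::size_t capacity_;
    std::deque<int> buf_;
};
```

For example, with a capacity of 16 and a reserve of 8, the relaxed rule would let Fetch resume as soon as occupancy drops to 8, whereas the strict rule waits for occupancy 0. Whether a reserve of one fetch group is actually safe in the simulator is something I have not checked.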

On 05/11/2015 13:31, Virendra Kumar Pathak wrote:
Hi Arthur Perais,

Thanks for your response. I have a question regarding blocking & unblocking of the decode stage.
Could you please help me understand them?

I am working on an O3 CPU model with the configuration below:
fetchToDecodeDelay = 1
fetchWidth = 8
decodeToFetchDelay=1
decodeWidth = 4

Skid buffer between fetch & decode:
skidBufferMax = (fetchToDecodeDelay + 1) * params->fetchWidth;
=> skidBufferMax= (1+1) * 8 = 16

All my experiments are based on 1 CPU with 1 thread.

decode_impl.hh:
In decodeInsts(), assuming decode is not in the unblocking state, it will get 8 instructions from insts[tid] (from the fetch stage). However, since decodeWidth = 4, only 4 instructions will be decoded, so 4 instructions are still left. In the same function, there is logic to block the decode stage and inform the fetch stage about it if not all the instructions were decoded. My question is: why is decode blocked when there is still space left in the skid buffer (i.e. 12)? Fetch could still send instructions into the skid buffer while decode is busy with 4 instructions.

if (!insts_to_decode.empty()) { block(tid); }

The unblock() in the decode stage unblocks it only if the skid buffer is empty. Can't we unblock the decode stage as soon as we have enough space to store the fetched instructions (here 8)?
DefaultDecode<Impl>::unblock(ThreadID tid)
{
    if (skidBuffer[tid].empty()) { toFetch->decodeUnblock[tid] = true; }
}

Please shed some light on this. Am I missing something or interpreting it wrong?

Thanks.

Thanks for your time in advance.

--
with regards,
Virendra Kumar Pathak


_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users


--
Arthur Perais
INRIA Bretagne Atlantique
Bâtiment 12E, Bureau E303, Campus de Beaulieu
35042 Rennes, France
