Re: [gem5-users] CPU Configuration

Arthur Perais Fri, 06 Nov 2015 02:32:41 -0800

Ideally, you would want to have queues between Fetch and Decode and between Decode and Rename, but instead we have skidBuffers that are dimensioned to absorb instructions in flight between stages in case of a blocking event. So, as you point out, while there is still space in the decode skidBuffer, Fetch considers Decode as blocked and will not send instructions until the skidBuffer is drained.

In particular, when Rename unblocks, there will be pipeline bubbles equal to decodeToRenameDelay (the time between when Rename tells Decode that it can begin sending instructions again and when the first instruction arrives at Rename) and decodeToRenameDelay + fetchToDecodeDelay (the time between when Decode tells Fetch that it can begin sending instructions again and when the first instruction arrives at Rename). With perfect queuing, these bubbles would probably not be present. In addition, the impact of the bubbles only grows with the delays between frontend stages (and is probably okay if both delays are set to 1).
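
To make the two bubble sizes concrete, here is a minimal sketch of the arithmetic. The struct and function names are hypothetical and are not part of gem5; the sketch only encodes the two sums given above and, like the description, does not model the delay of the unblock signals travelling backwards.

```cpp
// Hypothetical helper for illustration only; not gem5 code.
// Both delays are expressed in cycles.
struct FrontendDelays {
    unsigned fetchToDecodeDelay;
    unsigned decodeToRenameDelay;
};

// Bubble before the first instruction sent by Decode reaches Rename.
unsigned bubbleFromDecode(const FrontendDelays &d)
{
    return d.decodeToRenameDelay;
}

// Bubble before the first instruction newly sent by Fetch reaches Rename:
// it has to cross both the Fetch->Decode and Decode->Rename links.
unsigned bubbleFromFetch(const FrontendDelays &d)
{
    return d.decodeToRenameDelay + d.fetchToDecodeDelay;
}
```

With both delays set to 1, the two functions give 1 and 2 cycles; with fetchToDecodeDelay = 3 and decodeToRenameDelay = 2, they give 2 and 5 cycles.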

So if you want better queuing, you have to modify how the skidBuffer behaves. I can't comment on how realistic this would be, but I'll point out that Haswell has a decode queue between Decode and Rename/Allocation and 2 instruction queues (because SMT) between the fetch buffer and the decode queue.
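
As a rough illustration of what a modified policy could look like, the toy model below compares "unblock only when drained" with "unblock once enough free space remains to absorb in-flight instructions". This is a sketch under assumptions, not gem5 code: ToySkidBuffer, its fields, and inFlightReserve are hypothetical names, and the reserve simply reuses the (fetchToDecodeDelay + 1) * fetchWidth sizing formula quoted further down in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Toy occupancy model of the Fetch->Decode skid buffer (hypothetical).
struct ToySkidBuffer {
    std::size_t capacity;   // total number of entries
    std::size_t reserve;    // entries kept free for in-flight instructions
    std::size_t occupancy;  // entries currently held

    // Policy discussed in this thread: unblock only once fully drained.
    bool canUnblockWhenEmpty() const { return occupancy == 0; }

    // Assumed alternative: unblock as soon as the free space covers
    // everything that could still arrive after the next block signal.
    bool canUnblockWithReserve() const
    {
        assert(occupancy <= capacity);
        return capacity - occupancy >= reserve;
    }
};

// Worst-case in-flight instructions, using the sizing formula from the thread.
std::size_t inFlightReserve(std::size_t fetchToDecodeDelay,
                            std::size_t fetchWidth)
{
    return (fetchToDecodeDelay + 1) * fetchWidth;
}
```

Note that when capacity equals the reserve (16 entries for fetchToDecodeDelay = 1 and fetchWidth = 8), the two policies coincide, so under this model an earlier unblock only becomes possible if the buffer is made larger than the in-flight reserve.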


On 05/11/2015 at 13:31, Virendra Kumar Pathak wrote:

Hi Arthur Perais,
Thanks for your response. I have a question regarding blocking and unblocking of the decode stage.
Could you please help me understand them?

I am working on an O3 CPU model with the configuration below:
fetchToDecodeDelay = 1
fetchWidth = 8
decodeToFetchDelay=1
decodeWidth = 4

Skid buffer between fetch and decode:
skidBufferMax = (fetchToDecodeDelay + 1) * params->fetchWidth;
=> skidBufferMax= (1+1) * 8 = 16

All my experiments are based on 1 CPU with 1 thread.

decode_impl.hh:
In decodeInsts(), assuming decode is not in the unblocking state, it gets 8 instructions from insts[tid] (from the fetch stage). However, since decodeWidth = 4, only 4 instructions are decoded, so 4 instructions are still left.
In the same function, there is logic to block the decode stage and inform the fetch stage if not all of the instructions were decoded. My question is: why is decode blocked when there is still space left in the skid buffer (i.e. 12 entries)? Fetch could still send instructions to the skid buffer while decode is busy with the 4 remaining instructions.
if (!insts_to_decode.empty()) { block(tid); }
The unblock() function in the decode stage unblocks it only if the skid buffer is empty. Can't we unblock the decode stage as soon as there is enough space to store the instructions sent by fetch (here 8)?
DefaultDecode<Impl>::unblock(ThreadID tid)
{
if (skidBuffer[tid].empty()) { toFetch->decodeUnblock[tid] = true; }
}
Please shed some light on this. Am I missing something or interpreting it wrong?
Thanks.

Thanks for your time in advance.

--
with regards,
Virendra Kumar Pathak


_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users



--
Arthur Perais
INRIA Bretagne Atlantique
Bâtiment 12E, Bureau E303, Campus de Beaulieu
35042 Rennes, France

