Didn't realized we went offline.
Thanks Korey

-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Korey Sewell
Sent: Monday, July 16, 2012 5:36 AM
To: gem5 users mailing list
Subject: Re: [gem5-users] Branch prediction issue in InOrder CPU

Just to keep this answer archived, here is a summary of my response from
Yuval and I's offline convo:

-Yuval-
...cpu.cc code ..
// FETCH
    F.needs(FetchSeq, FetchSeqUnit::AssignNextPC);
    F.needs(ICache, FetchUnit::InitiateFetch);
    F.needs(ICache, FetchUnit::CompleteFetch);
    F.needs(BPred, BranchPredictor::PredictBranch);
    F.needs(FetchSeq, FetchSeqUnit::UpdateTargetPC);


    // DECODE
//    D.needs(ICache, FetchUnit::CompleteFetch);
    D.needs(Decode, DecodeUnit::DecodeInst);
//    D.needs(BPred, BranchPredictor::PredictBranch);
//    D.needs(FetchSeq, FetchSeqUnit::UpdateTargetPC);

-Korey-
The problem is that you are predicting a branch and updating the target
before you even decode it. In the code, there are a number of checks that
can only be done correctly once an instruction has it's decode flags set.
For instance, updateTargetPC would not work unless
inst->isBranch() is true (making up the function name, but think the
function that checks what type of branch it is).

So I could understand why the updates and code would not work in this case.
I'd suggest if you want to do all these things, just move the DecodeInst
command up to Fetch right after the CompleteFetch command.
basically, keep those decode, predict, and update commands in the same
order. Remember, you are just modeling a decode in the 2nd stage, like you
are modeling a long pipeline with unused stages. Functionally, you should be
able to actually decode in Fetch given that you are not requiring any action
that depends on the decode explicitly being in stage 2.

Does that make sense?

On Wed, Jul 11, 2012 at 6:42 AM, Yuval H. Nacson
<[email protected]> wrote:
> Hey Korey,
>
> I saw the thread from back than and changed the architecture according 
> to it using also as reference the 9 stage pipeline_traits.cc.
>
> Yuval.
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of Korey Sewell
> Sent: Monday, July 09, 2012 7:57 PM
> To: gem5 users mailing list
> Subject: Re: [gem5-users] Branch prediction issue in InOrder CPU
>
> Hi,
> When you updated your pipeline to 12 stages, did you also update the 
> per-pipeline instruction schedule? Should be
> createFrontSked()/createBackSked() functions in cpu.cc. There was a 
> thread on this a while back that I replied to.
>
> In the event, that doesnt work, give me a day or two to look at the 
> trace a little more closely and speculate on  fix.
>
> On Sun, Jul 8, 2012 at 10:10 AM, Ali Saidi <[email protected]> wrote:
>> It might be… Korey, any thoughts?
>>
>> I'm not that familiar with the in order cpu, but it either needs to 
>> store the predicted address or one previous to it in all cases. My 
>> only worry with this change is that with the RAS is used you'll end 
>> up unnecessarily incrementing the PC and mispredicting there.
>>
>> Ali
>>
>> On Jul 5, 2012, at 5:48 AM, Yuval H. Nacson wrote:
>>
>> Hello,
>>
>> I tried to run a very simple program in the InOrder environment.
>> I extended the pipe to 12 stages, and also moved the prediction and 
>> fetch update to the first stage.
>>
>> j1: subl $2, 1, $2;
>> addl $1, 4, $1;
>> bgt $2, j1;
>>
>> This loop runs for a million iterations.
>> The cpi for this part is expected to be 1 or very close to, but it is 4.
>>
>> Looking in the debug flags I see the following:
>> Cycle n:
>> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002] requesting 
>> this resource.
>> 9522195: system.cpu.branch_predictor: [tid:0] [sn:1002] bge
>> r18,0x120008220 ... PC (0x120008238=>0x12000823c) doing branch 
>> prediction
>> 9522195: system.cpu.branch_predictor: [tid:0]: [asid:0] Instruction
>> (0x120008238=>0x12000823c) predicted target is
(0x120008238=>0x120008220).
>> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: Setting 
>> Predicted PC to (0x120008238=>0x120008220).
>> 9522195: system.cpu.branch_predictor: [tid:0] [sn:1002] pushed onto 
>> front of predHist ...predHist.size(): 1
>> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: Branch 
>> predicted true.
>> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: bge 
>> Predicted PC is (0x120008238=>0x120008220).
>> 9522195: system.cpu.branch_predictor[slot:0]:: done with request from 
>> [sn:1002] [tid:0].
>> 9522195: system.cpu.stage0: [tid:0]: [sn:1002] request to 
>> system.cpu.branch_predictor completed.
>> 9522195: system.cpu.branch_predictor: Deallocating [slot:0].
>> 9522195: system.cpu.stage0: [tid:0]: [sn:1002]: sending request to 
>> system.cpu.fetch_seq_unit.
>> 9522195: system.cpu.fetch_seq_unit: Allocating [slot:1] for [tid:0]:
>> [sn:1002]
>> 9522195: system.cpu.fetch_seq_unit: [tid:0]: [sn:1002] requesting 
>> this resource.
>> 9522195: system.cpu.fetch_seq_unit: [tid:0] Setting up squash to 
>> start from stage 0, after [sn:0].
>> 9522195: system.cpu.fetch_seq_unit[slot:1]:: done with request from 
>> [sn:1002] [tid:0].
>> 9522195: system.cpu.stage0: [tid:0]: [sn:1002] request to 
>> system.cpu.fetch_seq_unit completed.
>> 9522195: system.cpu.fetch_seq_unit: Deallocating [slot:1].
>> 9522195: system.cpu.stage0: [tid:0]: Attempting to send instructions 
>> to stage 1.
>> 9522195: system.cpu.stage0: [tid:0] 1 slots available in next stage
> buffer.
>> 9522195: system.cpu.stage0: [tid:0]: [sn:1002]: being placed into 
>> index 0 of stage buffer 0 queue.
>> 9522195: system.cpu.stage0: Done Processing [tid:0]
>> 9522195: system.cpu.stage0:
>>
>> Cycle n+1:
>> 9522612: system.cpu.fetch_seq_unit: Allocating [slot:0] for [tid:0]:
>> [sn:1003]
>> 9522612: system.cpu.fetch_seq_unit: [tid:0]: instruction requesting 
>> this resource.
>> 9522612: system.cpu.fetch_seq_unit: [tid:0]: Current PC is
>> (0x120008238=>0x120008220)
>> 9522612: system.cpu.fetch_seq_unit: [tid:0]: Assigning [sn:1003] to 
>> PC
>> (0x120008238=>0x120008220)
>> 9522612: system.cpu.fetch_seq_unit[slot:0]:: done with request from 
>> [sn:1003] [tid:0].
>>
>> I cleaned a lot of unnecessary messeges for clarity. But the Idea is 
>> that the branch command does not update the fetch correctly. The 
>> predictor has problems predicting the second command claiming that 
>> the BTB missing an entry for this command and so the branch is 
>> predicted as not taken
>> (reminder: this is the second consecutive appearance of this branch).
>> Later on stage 10 (out of 12 in my pipe) the second branch is being 
>> flushed because of miss prediction. And it's happens all the time 
>> causing my CPI to be 4 times it's calculated value.
>>
>> I located the problem in the bpred_unit.cc file. And after I receive 
>> a prediction from the predictor I advance it making the fetch unit 
>> not fetching the second branch. Now all is well.
>>
>> I attach my change (It's towards the end):
>>
>> bool
>> BPredUnit::predict(DynInstPtr &inst, TheISA::PCState &predPC, 
>> ThreadID
>> tid) {
>>     // See if branch predictor predicts taken.
>>     // If so, get its target addr either from the BTB or the RAS.
>>     // Save off record of branch stuff so the RAS can be fixed
>>     // up once it's done.
>>
>>     using TheISA::MachInst;
>>
>>     int asid = inst->asid;
>>     bool pred_taken = false;
>>     TheISA::PCState target;
>>
>>     ++lookups;
>>     DPRINTF(InOrderBPred, "[tid:%i] [sn:%i] %s ... PC %s doing branch "
>>             "prediction\n", tid, inst->seqNum,
>>             inst->staticInst->disassemble(inst->instAddr()),
>>             inst->pcState());
>>
>>
>>     void *bp_history = NULL;
>>
>>     if (inst->isUncondCtrl()) {
>>         DPRINTF(InOrderBPred, "[tid:%i] Unconditional control.\n",
>>                 tid);
>>         pred_taken = true;
>>         // Tell the BP there was an unconditional branch.
>>         BPUncond(bp_history);
>>
>>         if (inst->isReturn() && RAS[tid].empty()) {
>>             DPRINTF(InOrderBPred, "[tid:%i] RAS is empty, predicting "
>>                     "false.\n", tid);
>>             pred_taken = false;
>>         }
>>     } else {
>>         ++condPredicted;
>>
>>         pred_taken = BPLookup(predPC.instAddr(), bp_history);
>>     }
>>
>>     PredictorHistory predict_record(inst->seqNum, predPC, pred_taken,
>>                                     bp_history, tid);
>>
>>     // Now lookup in the BTB or RAS.
>>     if (pred_taken) {
>>         if (inst->isReturn()) {
>>             ++usedRAS;
>>
>>             // If it's a function return call, then look up the address
>>             // in the RAS.
>>             TheISA::PCState rasTop = RAS[tid].top();
>>             target = TheISA::buildRetPC(inst->pcState(), rasTop);
>>
>>             // Record the top entry of the RAS, and its index.
>>             predict_record.usedRAS = true;
>>             predict_record.RASIndex = RAS[tid].topIdx();
>>             predict_record.rasTarget = rasTop;
>>
>>             assert(predict_record.RASIndex < 16);
>>
>>             RAS[tid].pop();
>>
>>             DPRINTF(InOrderBPred, "[tid:%i]: Instruction %s is a return,
"
>>                     "RAS predicted target: %s, RAS index: %i.\n",
>>                     tid, inst->pcState(), target,
>>                     predict_record.RASIndex);
>>         } else {
>>             ++BTBLookups;
>>
>>             if (inst->isCall()) {
>>
>>                 RAS[tid].push(inst->pcState());
>>
>>                 // Record that it was a call so that the top RAS entry
can
>>                 // be popped off if the speculation is incorrect.
>>                 predict_record.wasCall = true;
>>
>>                 DPRINTF(InOrderBPred, "[tid:%i]: Instruction %s was a
> call"
>>                         ", adding %s to the RAS index: %i.\n",
>>                         tid, inst->pcState(), predPC,
>>                         RAS[tid].topIdx());
>>             }
>>
>>             if (inst->isCall() &&
>>                 inst->isUncondCtrl() &&
>>                 inst->isDirectCtrl()) {
>>                 target = inst->branchTarget();
>>             } else if (BTB.valid(predPC.instAddr(), asid)) {
>>                 ++BTBHits;
>>
>>                 // If it's not a return, use the BTB to get the 
>> target
> addr.
>>                 target = BTB.lookup(predPC.instAddr(), asid);
>>
>>                 DPRINTF(InOrderBPred, "[tid:%i]: [asid:%i] 
>> Instruction %s
> "
>>                         "predicted target is %s.\n",
>>                         tid, asid, inst->pcState(), target);
>>             } else {
>>                 DPRINTF(InOrderBPred, "[tid:%i]: BTB doesn't have a "
>>                         "valid entry, predicting false.\n",tid);
>>                 pred_taken = false;
>>             }
>>         }
>>     }
>>
>>     if (pred_taken) {
>>         // Set the PC and the instruction's predicted target.
>>         predPC = target;
>>                 TheISA::advancePC(predPC, inst->staticInst);  ß
>>     }
>>     DPRINTF(InOrderBPred, "[tid:%i]: [sn:%i]: Setting Predicted PC to 
>> %s.\n",
>>             tid, inst->seqNum, predPC);
>>
>>     predHist[tid].push_front(predict_record);
>>
>>     DPRINTF(InOrderBPred, "[tid:%i] [sn:%i] pushed onto front of 
>> predHist
> "
>>             "...predHist.size(): %i\n",
>>             tid, inst->seqNum, predHist[tid].size());
>>
>>     return pred_taken;
>> }
>>
>> Does my story seems right and is my correction is in place?
>>
>> Thanks,
>> Yuval.
>> _______________________________________________
>> gem5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>
>>
>>
>> _______________________________________________
>> gem5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>
>
>
> --
> - Korey
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users



--
- Korey
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Reply via email to