Yuval,
either offline or on the mailing list,
can you post what you have in your createFrontSked() and
createBackSked() functions?

Additionally, What changes did you make to extend the pipeline to 12 stages?

On Thu, Jul 5, 2012 at 2:48 AM, Yuval H. Nacson
<[email protected]> wrote:
> Hello,
>
>
>
> I tried to run a very simple program in the InOrder environment.
>
> I extended the pipe to 12 stages, and also moved the prediction and fetch
> update to the first stage.
>
>
>
> j1: subl $2, 1, $2;
>
> addl $1, 4, $1;
>
> bgt $2, j1;
>
>
>
> This loop runs for a million iterations.
>
> The cpi for this part is expected to be 1 or very close to, but it is 4.
>
>
>
> Looking in the debug flags I see the following:
>
> Cycle n:
>
> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002] requesting this
> resource.
>
> 9522195: system.cpu.branch_predictor: [tid:0] [sn:1002] bge
> r18,0x120008220 ... PC (0x120008238=>0x12000823c) doing branch prediction
>
> 9522195: system.cpu.branch_predictor: [tid:0]: [asid:0] Instruction
> (0x120008238=>0x12000823c) predicted target is (0x120008238=>0x120008220).
>
> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: Setting Predicted
> PC to (0x120008238=>0x120008220).
>
> 9522195: system.cpu.branch_predictor: [tid:0] [sn:1002] pushed onto front of
> predHist ...predHist.size(): 1
>
> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: Branch predicted
> true.
>
> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: bge Predicted PC
> is (0x120008238=>0x120008220).
>
> 9522195: system.cpu.branch_predictor[slot:0]:: done with request from
> [sn:1002] [tid:0].
>
> 9522195: system.cpu.stage0: [tid:0]: [sn:1002] request to
> system.cpu.branch_predictor completed.
>
> 9522195: system.cpu.branch_predictor: Deallocating [slot:0].
>
> 9522195: system.cpu.stage0: [tid:0]: [sn:1002]: sending request to
> system.cpu.fetch_seq_unit.
>
> 9522195: system.cpu.fetch_seq_unit: Allocating [slot:1] for [tid:0]:
> [sn:1002]
>
> 9522195: system.cpu.fetch_seq_unit: [tid:0]: [sn:1002] requesting this
> resource.
>
> 9522195: system.cpu.fetch_seq_unit: [tid:0] Setting up squash to start from
> stage 0, after [sn:0].
>
> 9522195: system.cpu.fetch_seq_unit[slot:1]:: done with request from
> [sn:1002] [tid:0].
>
> 9522195: system.cpu.stage0: [tid:0]: [sn:1002] request to
> system.cpu.fetch_seq_unit completed.
>
> 9522195: system.cpu.fetch_seq_unit: Deallocating [slot:1].
>
> 9522195: system.cpu.stage0: [tid:0]: Attempting to send instructions to
> stage 1.
>
> 9522195: system.cpu.stage0: [tid:0] 1 slots available in next stage buffer.
>
> 9522195: system.cpu.stage0: [tid:0]: [sn:1002]: being placed into  index 0
> of stage buffer 0 queue.
>
> 9522195: system.cpu.stage0: Done Processing [tid:0]
>
> 9522195: system.cpu.stage0:
>
>
>
> Cycle n+1:
>
> 9522612: system.cpu.fetch_seq_unit: Allocating [slot:0] for [tid:0]:
> [sn:1003]
>
> 9522612: system.cpu.fetch_seq_unit: [tid:0]: instruction requesting this
> resource.
>
> 9522612: system.cpu.fetch_seq_unit: [tid:0]: Current PC is
> (0x120008238=>0x120008220)
>
> 9522612: system.cpu.fetch_seq_unit: [tid:0]: Assigning [sn:1003] to PC
> (0x120008238=>0x120008220)
>
> 9522612: system.cpu.fetch_seq_unit[slot:0]:: done with request from
> [sn:1003] [tid:0].
>
>
>
> I cleaned a lot of unnecessary messeges for clarity. But the Idea is that
> the branch command does not update the fetch correctly. The predictor has
> problems predicting the second command claiming that the BTB missing an
> entry for this command and so the branch is predicted as not taken
> (reminder: this is the second consecutive appearance of this branch).
>
> Later on stage 10 (out of 12 in my pipe) the second branch is being flushed
> because of miss prediction. And it's happens all the time causing my CPI to
> be 4 times it's calculated value.
>
>
>
> I located the problem in the bpred_unit.cc file. And after I receive a
> prediction from the predictor I advance it making the fetch unit not
> fetching the second branch. Now all is well.
>
>
>
> I attach my change (It's towards the end):
>
>
>
> bool
>
> BPredUnit::predict(DynInstPtr &inst, TheISA::PCState &predPC, ThreadID tid)
>
> {
>
>     // See if branch predictor predicts taken.
>
>     // If so, get its target addr either from the BTB or the RAS.
>
>     // Save off record of branch stuff so the RAS can be fixed
>
>     // up once it's done.
>
>
>
>     using TheISA::MachInst;
>
>
>
>     int asid = inst->asid;
>
>     bool pred_taken = false;
>
>     TheISA::PCState target;
>
>
>
>     ++lookups;
>
>     DPRINTF(InOrderBPred, "[tid:%i] [sn:%i] %s ... PC %s doing branch "
>
>             "prediction\n", tid, inst->seqNum,
>
>             inst->staticInst->disassemble(inst->instAddr()),
>
>             inst->pcState());
>
>
>
>
>
>     void *bp_history = NULL;
>
>
>
>     if (inst->isUncondCtrl()) {
>
>         DPRINTF(InOrderBPred, "[tid:%i] Unconditional control.\n",
>
>                 tid);
>
>         pred_taken = true;
>
>         // Tell the BP there was an unconditional branch.
>
>         BPUncond(bp_history);
>
>
>
>         if (inst->isReturn() && RAS[tid].empty()) {
>
>             DPRINTF(InOrderBPred, "[tid:%i] RAS is empty, predicting "
>
>                     "false.\n", tid);
>
>             pred_taken = false;
>
>         }
>
>     } else {
>
>         ++condPredicted;
>
>
>
>         pred_taken = BPLookup(predPC.instAddr(), bp_history);
>
>     }
>
>
>
>     PredictorHistory predict_record(inst->seqNum, predPC, pred_taken,
>
>                                     bp_history, tid);
>
>
>
>     // Now lookup in the BTB or RAS.
>
>     if (pred_taken) {
>
>         if (inst->isReturn()) {
>
>             ++usedRAS;
>
>
>
>             // If it's a function return call, then look up the address
>
>             // in the RAS.
>
>             TheISA::PCState rasTop = RAS[tid].top();
>
>             target = TheISA::buildRetPC(inst->pcState(), rasTop);
>
>
>
>             // Record the top entry of the RAS, and its index.
>
>             predict_record.usedRAS = true;
>
>             predict_record.RASIndex = RAS[tid].topIdx();
>
>             predict_record.rasTarget = rasTop;
>
>
>
>             assert(predict_record.RASIndex < 16);
>
>
>
>             RAS[tid].pop();
>
>
>
>             DPRINTF(InOrderBPred, "[tid:%i]: Instruction %s is a return, "
>
>                     "RAS predicted target: %s, RAS index: %i.\n",
>
>                     tid, inst->pcState(), target,
>
>                     predict_record.RASIndex);
>
>         } else {
>
>             ++BTBLookups;
>
>
>
>             if (inst->isCall()) {
>
>
>
>                 RAS[tid].push(inst->pcState());
>
>
>
>                 // Record that it was a call so that the top RAS entry can
>
>                 // be popped off if the speculation is incorrect.
>
>                 predict_record.wasCall = true;
>
>
>
>                 DPRINTF(InOrderBPred, "[tid:%i]: Instruction %s was a call"
>
>                         ", adding %s to the RAS index: %i.\n",
>
>                         tid, inst->pcState(), predPC,
>
>                         RAS[tid].topIdx());
>
>             }
>
>
>
>             if (inst->isCall() &&
>
>                 inst->isUncondCtrl() &&
>
>                 inst->isDirectCtrl()) {
>
>                 target = inst->branchTarget();
>
>             } else if (BTB.valid(predPC.instAddr(), asid)) {
>
>                 ++BTBHits;
>
>
>
>                 // If it's not a return, use the BTB to get the target addr.
>
>                 target = BTB.lookup(predPC.instAddr(), asid);
>
>
>
>                 DPRINTF(InOrderBPred, "[tid:%i]: [asid:%i] Instruction %s "
>
>                         "predicted target is %s.\n",
>
>                         tid, asid, inst->pcState(), target);
>
>             } else {
>
>                 DPRINTF(InOrderBPred, "[tid:%i]: BTB doesn't have a "
>
>                         "valid entry, predicting false.\n",tid);
>
>                 pred_taken = false;
>
>             }
>
>         }
>
>     }
>
>
>
>     if (pred_taken) {
>
>         // Set the PC and the instruction's predicted target.
>
>         predPC = target;
>
>                 TheISA::advancePC(predPC, inst->staticInst);  ß
>
>     }
>
>     DPRINTF(InOrderBPred, "[tid:%i]: [sn:%i]: Setting Predicted PC to
> %s.\n",
>
>             tid, inst->seqNum, predPC);
>
>
>
>     predHist[tid].push_front(predict_record);
>
>
>
>     DPRINTF(InOrderBPred, "[tid:%i] [sn:%i] pushed onto front of predHist "
>
>             "...predHist.size(): %i\n",
>
>             tid, inst->seqNum, predHist[tid].size());
>
>
>
>     return pred_taken;
>
> }
>
>
>
> Does my story seems right and is my correction is in place?
>
>
>
> Thanks,
>
> Yuval.
>
>
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users



-- 
- Korey
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Reply via email to