> > It should do the job, at least for EM where the jump takes 2 cycles, and by
> > means of using delay slots we can make all the cycles count. HS has a branch
> > prediction mechanism, hence, filling up the delay slot doesn't have such a big
> > impact like in EM or even earlier cpus.
> No, the alternative is to hide the delay slot, so if the branch is
> predicted properly, the case with different high words should be faster
> without the .d suffix.
>
> I.e., eagerly filling the delay slot like this has a bigger - negative -
> impact on performance.


If we are talking about HS, then we can add another flag, 'T', which should
instruct the branch predictor that we expect this branch to be taken. However,
I haven't seen this flag have any impact on the code, and the compiler does
generate it. In general, the HS branch prediction has some particularities.
Although what you say makes perfect sense, I am almost sure it doesn't apply in
the case of HS because of the way it is implemented. But this is a good point;
I will keep it in mind and ask the hw guys what is best.

//Claudiu
