> > It should do the job, at least for EM where the jump takes 2 cycle, and by
> > means of using delay slots we can make all the cycles count. HS has a branch
> > prediction mechanism, hence, filling up the delay slot doesn't have such a big
> > impact like in EM or even earlier cpus.
>
> No, the alternative is to hide the delay slot, so if the branch is
> predicted properly, the case with
> different high words should be faster without the .d suffix.
>
> I.e., eagerly filling the delay slot like this has a bigger - negative
> - impact on performance.

If we are talking about HS, then we can add another flag, 'T', which tells the
branch predictor that we expect this branch to be taken. However, I haven't
seen this flag have any impact on the code, and the compiler does generate it.

In general, the HS branch prediction has some particularities. Although what
you say makes perfect sense, I am almost sure it doesn't apply in the case of
HS because of the way it is implemented. But this is a good point; I will try
to keep it in mind and ask the hw guys what is best.

//Claudiu