> On Dec 13, 2014, at 5:22 AM, Ajit Kumar Agarwal
> <ajit.kumar.agar...@xilinx.com> wrote:
>
> Hello All:
>
> Since prefetch instructions have no direct consumers in the code
> stream, they give the instruction scheduler considerable freedom.
> They are typically assigned lower priorities than most of the
> instructions in the code stream. This tends to cause all the prefetch
> instructions to be placed together in the final schedule. Placing
> them in clumps, rather than spreading them evenly, degrades
> performance.
>
> Spreading the prefetch instructions evenly gives better speedup
> ratios than placing them in clumps for dirty misses.
I can believe that’s true for some processors; is it true for all of them? I have the impression that some MIPS processors don’t mind clumped prefetches, so long as you don’t exceed the limit on the total number of concurrently pending memory accesses.

paul
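
To make the question about pending accesses concrete, here is a rough Python sketch that counts how many prefetches are in flight at once for a clumped schedule and for an evenly spread one. Everything in it is an assumption for illustration: the function names `max_in_flight` and `spread_evenly`, the fixed 8-cycle latency, and the issue cycles are hypothetical and are not taken from GCC's scheduler or from any particular MIPS core.

```python
def max_in_flight(issue_cycles, latency):
    """Return the peak number of prefetches outstanding at the same time.

    issue_cycles: cycle at which each prefetch issues (hypothetical schedule).
    latency: assumed fixed number of cycles each prefetch stays pending.
    """
    events = []
    for cycle in issue_cycles:
        events.append((cycle, 1))             # prefetch becomes pending
        events.append((cycle + latency, -1))  # prefetch completes
    # At equal cycles, -1 sorts before +1, so completions free a slot
    # before a new prefetch claims one.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def spread_evenly(n_prefetches, n_slots):
    """Pick issue cycles spaced evenly across n_slots cycles."""
    return [i * n_slots // n_prefetches for i in range(n_prefetches)]


clumped = [0, 1, 2, 3]          # four prefetches issued back to back
spread = spread_evenly(4, 40)   # the same four, spaced over 40 cycles

print(max_in_flight(clumped, latency=8))
print(max_in_flight(spread, latency=8))
```

Under these assumptions the clumped schedule peaks at four outstanding prefetches and the spread one at a single prefetch. Whether the clump actually hurts would then depend on whether that peak exceeds the processor's limit on pending accesses, which is the part that varies by target.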