Does any processor besides the PowerPC support variants of the prefetch instruction that let you tell it, at the beginning of the loop, that there are hardware streams with a given stride, rather than issuing a prefetch inside the loop for a future cache entry?
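For contrast, here is a minimal sketch of the "prefetch inside the loop" pattern using GCC's generic `__builtin_prefetch (addr, rw, locality)` builtin. The function name `sum_strided` and the distance `PREFETCH_AHEAD` are hypothetical, chosen only for illustration. It does not show the PowerPC stream-setup forms of `dcbt`. Those would instead be issued once before the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, measured in strides; would need tuning per machine. */
#define PREFETCH_AHEAD 8

/* Sum every `stride`-th element of a[0..n).  Every iteration issues a
   software prefetch for the element PREFETCH_AHEAD strides ahead: this is
   the per-iteration prefetch that a one-time hardware stream setup before
   the loop would replace. */
double sum_strided(const double *a, size_t n, size_t stride)
{
    assert(stride > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i += stride) {
        size_t ahead = i + PREFETCH_AHEAD * stride;
        if (ahead < n)
            /* rw = 0 (read access), locality = 3 (keep in all cache levels). */
            __builtin_prefetch(&a[ahead], 0, 3);
        sum += a[i];
    }
    return sum;
}
```

On targets without a prefetch instruction, GCC turns `__builtin_prefetch` into a no-op, so the loop computes the same result either way. Only the memory traffic differs.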
I'm just starting to look at adding support for the PowerPC's advanced forms of the dcbt instruction that set up these hardware streams, and I was wondering whether other architectures could use this as well. I couldn't find anything similar in the AMD and Intel architecture manuals. The IBM XL compiler supports this, but I would like to add similar support to GCC.

--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com
fax +1 (978) 399-6899