On Sat, May 5, 2012 at 11:52 PM, Maciej W. Rozycki <ma...@linux-mips.org> wrote:
> For the record: MIPS processors that implement CLZ/CLO (for some reason
> CTZ/CTO haven't been added to the architecture, but these operations can
> be cheaply transformed into CLZ/CLO) generally have a dedicated unit that
> causes no pipeline stall for these instructions even in the simplest
> pipeline designs like the M4K -- IOW they are issued at the usual one
> instruction per pipeline clock rate.
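The quoted message says CTZ/CTO can be cheaply transformed into CLZ/CLO without spelling out how. Below is a minimal C sketch of one common transformation, not necessarily the sequence GCC emits for MIPS. It assumes 32-bit operands and a CLZ that returns 32 for a zero input, as the MIPS32 CLZ instruction does. The names clz32, clo32, ctz32, cto32 and ctz_selftest are hypothetical, and clz32 is only a software stand-in for the hardware instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of a 32-bit CLZ instruction:
   number of leading zero bits, and 32 when x == 0 (MIPS CLZ semantics). */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* CLO (count leading ones) is CLZ of the complement. */
unsigned clo32(uint32_t x)
{
    return clz32(~x);
}

/* CTZ via CLZ: ~x & (x - 1) turns exactly the trailing zero bits of x
   into a right-aligned run of ones, so 32 minus the leading-zero count
   of that value is the trailing-zero count.  x == 0 gives a mask of all
   ones, clz32 returns 0, and the result is 32 without any branch. */
unsigned ctz32(uint32_t x)
{
    return 32u - clz32(~x & (x - 1u));
}

/* CTO (count trailing ones) is CTZ of the complement. */
unsigned cto32(uint32_t x)
{
    return ctz32(~x);
}

/* Small sanity check of the identities on a few hand-picked values. */
void ctz_selftest(void)
{
    assert(ctz32(12u) == 2);          /* 0b1100 */
    assert(ctz32(0x80000000u) == 31);
    assert(ctz32(0u) == 32);
    assert(cto32(7u) == 3);           /* 0b0111 */
    assert(clo32(0xF0000000u) == 4);
}
```

On real MIPS hardware, ctz32 would map to roughly an ADDIU, a NOR, an AND, a CLZ and a SUBU, all of which are single-cycle under the pipeline behaviour described in the quote.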
Even on Octeon this is true, though Octeon also has seq/sneq, so it does not matter in the end.

Note that I was originally the one who proposed this optimization for PowerPC, even before I saw what XLC did. See PR 10588 (which I filed 9 years ago); it looks like we are about to fix it.

Thanks,
Andrew Pinski