On Fri, 27 Apr 2012, Paolo Bonzini wrote:

> > What about cost considerations? We only seem to have the general
> > "branches are expensive" metric - but ctz/clz may be prohibitively
> > expensive themselves, no?
>
> Yeah, that's a general problem with this kind of tricks. In general
> however clz/ctz is getting less and less expensive, so I don't think
> it is worrisome (at least at the beginning of stage 1). We can add
> rtx_costs checks later.
>
> Among architectures I know, only i386 has an expensive bsf/bsr but
> it also has sete/setne which GCC will use instead of this trick.
>
> Looking at rtx_costs, nothing seems to mark clz/ctz as prohibitively
> expensive (Xtensa does, but only in the case when the optab handler
> will not exist). I realize though that this is not a particularly
> good statistic, since the compiler would not generate them out of
> its hat until now.
 For the record: MIPS processors that implement CLZ/CLO (for some reason
CTZ/CTO haven't been added to the architecture, but these operations can
be cheaply transformed into CLZ/CLO) generally have a dedicated unit that
causes no pipeline stall for these instructions even in the simplest
pipeline designs like the M4K -- IOW they are issued at the usual one
instruction per pipeline clock rate. Of course all MIPS processors have
SLT too, so perhaps they won't benefit from your change either.

  Maciej
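
To illustrate the remark that CTZ can be cheaply transformed into CLZ, here is a minimal C sketch of one well-known identity. It is not taken from the patch under discussion, and the instruction sequence a compiler would actually emit may differ. The helper names (`clz32`, `ctz32_via_clz`, `ctz32_naive`, `ctz32_self_test`) are hypothetical. `clz32` is a portable software model that returns 32 for a zero input; that convention is an assumption of the sketch, and note that GCC's `__builtin_clz` is undefined for zero.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a count-leading-zeros operation on 32 bits.
   Assumption: a zero input yields 32. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros using only clz plus a few ALU operations.
   ~x & (x - 1) keeps exactly the bits below the lowest set bit of x,
   so the number of trailing zeros is 32 minus its leading-zero count.
   For x == 0 the mask is all ones and the result is 32. */
unsigned ctz32_via_clz(uint32_t x)
{
    uint32_t below = (uint32_t)(~x & (x - 1));

    return 32 - clz32(below);
}

/* Naive bit-by-bit reference, used only for checking. */
unsigned ctz32_naive(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Compare the clz-based version against the reference on a few inputs. */
void ctz32_self_test(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 8u, 0x80000000u, 0xFFFFFFFFu, 0x00F0F000u
    };
    unsigned i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(ctz32_via_clz(samples[i]) == ctz32_naive(samples[i]));
}
```

Under these assumptions the conversion costs a NOT, a subtract, an AND, one clz and one more subtract, with no branches, which is the sense in which the transformation is cheap when clz itself issues at one instruction per clock.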