On 24.05.2024 22:03, Andrew Cooper wrote: > The asm in arch_ffs() is safe but inefficient. > > CMOV would be an improvement over a conditional branch, but for 64bit CPUs > both Intel and AMD have provided enough details about the behaviour for a zero > input. It is safe to pre-load the destination register with -1 and drop the > conditional logic. > > However, it is common to find ffs() in a context where the optimiser knows > that x in nonzero even if it the value isn't known precisely, and in that case > it's safe to drop the preload of -1 too. > > There are only a handful of uses of ffs() in the x86 build, and all of them > improve as a result of this: > > add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31) > Function old new delta > mask_write 114 107 -7 > xmem_pool_alloc 1063 1039 -24 > > Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com> with one suggestion: > --- a/xen/arch/x86/include/asm/bitops.h > +++ b/xen/arch/x86/include/asm/bitops.h > @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x) > > static always_inline unsigned int arch_ffs(unsigned int x) > { > - int r; > + unsigned int r; > + > + if ( __builtin_constant_p(x > 0) && x > 0 ) __builtin_constant_p(x) surely will do. In fact even the other "> 0" could in principle be left out here. Jan