It turns out that GCC 4.9.2 and 6.3.0 each instantiate __scanbit() in three translation units, but never reference the result. All real uses of __scanbit() are already suitably inlined.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
---
CC: Jan Beulich <jbeul...@suse.com>

Forcing __scanbit() to be always_inline appears to cause GCC to reorder some of
its basic blocks, so there is a moderately large perturbance to functions.  As
far as I can see, even the register scheduling is the same, and the delta is
just changes in the nops used to align the basic blocks.
---
 xen/include/asm-x86/bitops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/asm-x86/bitops.h b/xen/include/asm-x86/bitops.h
index fd494e8..0f18645 100644
--- a/xen/include/asm-x86/bitops.h
+++ b/xen/include/asm-x86/bitops.h
@@ -334,7 +334,7 @@ extern unsigned int __find_first_zero_bit(
 extern unsigned int __find_next_zero_bit(
     const unsigned long *addr, unsigned int size, unsigned int offset);
 
-static inline unsigned int __scanbit(unsigned long val, unsigned int max)
+static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
 {
     if ( __builtin_constant_p(max) && max == BITS_PER_LONG )
         alternative_io("bsf %[in],%[out]; cmovz %[max],%k[out]",
-- 
2.1.4
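
For illustration only, here is a minimal standalone sketch of the pattern the patch relies on. It is not the Xen implementation: scanbit_sketch() is a hypothetical name, the always_inline definition is assumed to resemble Xen's macro (plain `inline` plus GCC's always_inline attribute), and __builtin_ctzl() stands in for the bsf/cmovz alternative used in the real function.

```c
#include <limits.h>

/* Number of bits in an unsigned long on the build target. */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * ASSUMPTION: modelled on Xen's always_inline macro.  With GCC (and Clang),
 * the attribute requests inlining at every call site, so the compiler has
 * no reason to emit a separate out-of-line copy of the function.
 */
#define always_inline inline __attribute__((__always_inline__))

/*
 * Hypothetical stand-in for __scanbit(): return the index of the lowest
 * set bit in val, or max if val is zero.
 */
static always_inline unsigned int scanbit_sketch(unsigned long val,
                                                 unsigned int max)
{
    /* __builtin_ctzl() is undefined for 0, so handle that case explicitly. */
    return val ? (unsigned int)__builtin_ctzl(val) : max;
}
```

One way to check whether an out-of-line copy was emitted is to look for a local text symbol with the function's name in the object file, e.g. with nm. Whether a plain `static inline` version produces such a copy depends on the compiler version and optimisation level.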