On 14/03/2024 2:16 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/arch/arm/include/asm/bitops.h
>> +++ b/xen/arch/arm/include/asm/bitops.h
>> @@ -157,7 +157,7 @@ static inline int fls(unsigned int x)
>>  }
>>
>>
>> -#define ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
>> +#define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
>
> The way the macro is invoked, I don't think the helper local variable
> is then needed anymore?
I strongly suspect it is still needed.  ISOLATE_LSB() double-expands its
parameter.  Either way, I'm not reopening that can of worms that led to
this form.

>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -430,16 +430,23 @@ static inline int ffsl(unsigned long x)
>>      return (int)r+1;
>>  }
>>
>> -static inline int ffs(unsigned int x)
>> +static inline unsigned int arch_ffs(unsigned int x)
>>  {
>> -    int r;
>> +    int r = -1;
>> +
>> +    /*
>> +     * The AMD manual states that BSF won't modify the destination
>> +     * register if x=0.  The Intel manual states that the result is
>> +     * undefined, but the architects have said that the register is
>> +     * written back with its old value, possibly zero extended above
>> +     * 32 bits.
>> +     */
>> +    asm ( "bsf %[val], %[res]"
>> +          : [res] "+r" (r)
>> +          : [val] "rm" (x) );
>
> And this isn't what the compiler would be doing anyway?

No.  The builtin avoids all undefined behaviour, and is quite a lot of
asm as a result.

With some help from the gcc mailing list
https://gcc.gnu.org/pipermail/gcc/2024-March/243465.html I've found a
solution which improves things in the common case.

> Also, just to mention it: I take it that you/we are sure that
> disallowing both operands to be the same register is still better
> than ...
>
>> -    asm ( "bsf %1,%0\n\t"
>> -          "jnz 1f\n\t"
>> -          "mov $-1,%0\n"
>> -          "1:" : "=r" (r) : "rm" (x));
>
> ... the original form?

Yes.  Without any doubt, on a 64bit CPU.

This transformation isn't safe on a 486, but I expect even the later
32bit CPUs lacking register renaming would still be better with the
non-branch form.

>> --- a/xen/include/xen/bitops.h
>> +++ b/xen/include/xen/bitops.h
>> @@ -110,6 +110,21 @@ static inline int generic_flsl(unsigned long x)
>>
>>  #include <asm/bitops.h>
>>
>> +/*
>> + * Find First Set bit.  Bits are labelled from 1.
>> + */
>> +static always_inline __pure unsigned int ffs(unsigned int x)
>
> Why always_inline?
For all the normal reasons, to counter Clang and GCC doing stupid things
with inlines that contain assembly.

>> +{
>> +    if ( __builtin_constant_p(x) )
>> +        return __builtin_ffs(x);
>> +
>> +#ifndef arch_ffs
>> +#define arch_ffs __builtin_ffs
>> +#endif
>> +
>> +    return arch_ffs(x);
>> +}
>
> Just to mention it: __builtin_ffs() takes and returns plain int.  I'm
> happy about our own helper being unsigned-correct, but anything like
> this has a Misra angle too.

I did note that, and decided it could wait until some other point.

~Andrew