On 14/03/2024 2:16 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/arch/arm/include/asm/bitops.h
>> +++ b/xen/arch/arm/include/asm/bitops.h
>> @@ -157,7 +157,7 @@ static inline int fls(unsigned int x)
>>  }
>>  
>>  
>> -#define ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
>> +#define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
> The way the macro is invoked, I don't think the helper local variable
> is then needed anymore?

I strongly suspect It is still needed.  ISOLATE_LSB() double-expands its
parameter.

Either way, I'm not reopening that can of worms that lead to this form.
>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -430,16 +430,23 @@ static inline int ffsl(unsigned long x)
>>      return (int)r+1;
>>  }
>>  
>> -static inline int ffs(unsigned int x)
>> +static inline unsigned int arch_ffs(unsigned int x)
>>  {
>> -    int r;
>> +    int r = -1;
>> +
>> +    /*
>> +     * The AMD manual states that BSF won't modify the destination register 
>> if
>> +     * x=0.  The Intel manual states that the result is undefined, but the
>> +     * architects have said that the register is written back with it's old
>> +     * value, possibly zero extended above 32 bits.
>> +     */
>> +    asm ( "bsf %[val], %[res]"
>> +          : [res] "+r" (r)
>> +          : [val] "rm" (x) );
> And this isn't what the compiler would be doing anyway?

No.  The builtin avoids all undefined behaviour, and is quite a lot of
asm as a result.

With some help from the gcc mailing list
https://gcc.gnu.org/pipermail/gcc/2024-March/243465.html I've found a
solution which improves things in the common case.

> Also, just to mention it: I take it that you/we are sure that disallowing
> both operands to be the same register is still better than ...
>
>> -    asm ( "bsf %1,%0\n\t"
>> -          "jnz 1f\n\t"
>> -          "mov $-1,%0\n"
>> -          "1:" : "=r" (r) : "rm" (x));
> ... the original form?

Yes.  Without any doubt, on a 64bit CPU.

This transformation isn't safe on a 486, but I expect even the later
32bit CPUs lacking register renaming would still be better with the
non-branch form.


>> --- a/xen/include/xen/bitops.h
>> +++ b/xen/include/xen/bitops.h
>> @@ -110,6 +110,21 @@ static inline int generic_flsl(unsigned long x)
>>  
>>  #include <asm/bitops.h>
>>  
>> +/*
>> + * Find First Set bit.  Bits are labelled from 1.
>> + */
>> +static always_inline __pure unsigned int ffs(unsigned int x)
> Why always_inline?

For all the normal reasons to counter Clang and GCC doing stupid things
with inlines that contain assembly.

>
>> +{
>> +    if ( __builtin_constant_p(x) )
>> +        return __builtin_ffs(x);
>> +
>> +#ifndef arch_ffs
>> +#define arch_ffs __builtin_ffs
>> +#endif
>> +
>> +    return arch_ffs(x);
>> +}
> Just to mention it: __builtin_ffs() takes and returns plain int. I'm
> happy about our own helper being unsigned-correct, but anything like
> this has a Misra angle too.

I did note that, and decided it could wait until some other point.

~Andrew

Reply via email to