On 14/03/2024 5:14 pm, Andrew Cooper wrote: > On 14/03/2024 3:59 pm, Jan Beulich wrote: >> On 13.03.2024 18:27, Andrew Cooper wrote: >>> --- a/xen/arch/x86/include/asm/bitops.h >>> +++ b/xen/arch/x86/include/asm/bitops.h >>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned >>> long val, unsigned int max) >>> r__; >>> \ >>> }) >>> >>> -/** >>> - * find_first_set_bit - find the first set bit in @word >>> - * @word: the word to search >>> - * >>> - * Returns the bit-number of the first set bit. The input must *not* be >>> zero. >>> - */ >>> -static inline unsigned int find_first_set_bit(unsigned long word) >>> -{ >>> - asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) ); >>> - return (unsigned int)word; >>> -} >> And you think it's okay to no longer use TZCNT like this when available, >> where the output doesn't have to have its value set up front? > This is a particularly evil piece of inline asm. > > It is interpreted as BSF or TZCNT depending on the BMI instruction set > (Haswell/Piledriver era). Furthermore there are errata on some Intel > systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated. > > Which means this piece of asm suffers from all of an undefined output > register, undefined CF behaviour, and differing ZF behaviour (I believe) > depending on which hardware you're running on. > > The only thing the REP prefix is getting you is a deterministic 0 in the > destination register,
No, it doesn't. For a zero input, TZCNT yields the operand size, so you get 16/32/64; 64 in this case. It also means there's no chance of coming up with a useful alternative for ffs() to use TZCNT when available. ~Andrew