On Tue, Mar 02, 2021 at 12:57:37PM -0800, Josh Don wrote:
> On gcc, the asm versions of `fls` are about the same speed as the
> builtin. On clang, the versions that use fls (fls,fls64) are more than
> twice as slow as the builtin. This is because the way the `fls` function
> is written, clang puts the value in memory:
> https://godbolt.org/z/EfMbYe. This can be fixed in a separate patch.

Is this because clang gets the asm constraints wrong? ISTR that happening before, surely the right thing is to fix clang?
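
For reference, a rough sketch of the two variants being compared. This is not the actual kernel source: the names fls_asm and fls_builtin are made up here, and the asm variant is only modelled on the x86-64 approach as I remember it. The "rm" input constraint allows the compiler to choose either a register or a memory operand; which one it picks is the point in question.

```c
#include <limits.h>

/*
 * Hypothetical name: builtin-based variant.
 * Returns 0 for x == 0, otherwise the 1-based index of the
 * most significant set bit.
 */
static inline int fls_builtin(unsigned int x)
{
	/* __builtin_clz(0) is undefined, so handle zero explicitly. */
	return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x) : 0;
}

#if defined(__x86_64__)
/* Hypothetical name: asm variant, x86-64 only. */
static inline int fls_asm(unsigned int x)
{
	int r;

	/*
	 * r is preloaded with -1 through the "0" matching constraint.
	 * This assumes BSR leaves the destination untouched when the
	 * source is 0, so that the result for x == 0 is -1 + 1 = 0.
	 * "rm" lets the compiler place x in a register or in memory.
	 */
	__asm__("bsrl %1,%0"
		: "=r" (r)
		: "rm" (x), "0" (-1));
	return r + 1;
}
#else
/* Assumption for this sketch: other targets just use the builtin. */
static inline int fls_asm(unsigned int x)
{
	return fls_builtin(x);
}
#endif
```

Changing "rm" to plain "r" in the sketch would force a register operand, which is one way to check whether the constraint choice explains the difference.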