On Sat, 11 May 2019, Doug Moore wrote:

Log:
 When bitpos can't be implemented with an inline ffs* instruction,
 change the binary search so that it does not depend on a single bit
 only being set in the bitmask. Use bitpos more generally, and avoid
 some clearing of bits to accommodate its current behavior.

Inline ffs*() is badly implemented in general (it is mostly not implemented
in the kernel).  This makes it almost useless here, and (non-inline) ffs*()
(not used here) often slower than direct methods.

Modified: head/sys/kern/subr_blist.c
==============================================================================
--- head/sys/kern/subr_blist.c  Sat May 11 04:18:06 2019        (r347483)
+++ head/sys/kern/subr_blist.c  Sat May 11 09:09:10 2019        (r347484)
...
+static inline int
+bitpos(u_daddr_t mask)
+{
+
        switch (sizeof(mask)) {
#ifdef HAVE_INLINE_FFSLL
        case sizeof(long long):
                return (ffsll(mask) - 1);
#endif

This uses the long long abomination.  Only amd64 has inline ffsll().

+#ifdef HAVE_INLINE_FFS
+       case sizeof(int):
+               return (ffs(mask) - 1);
+#endif

This is unreachable, since sizeof(int) is 4 on all supported arches, and
sizeof(mask) is 8 on all arches.

        default:
-               lo = 0;
-               hi = BLIST_BMAP_RADIX;
-               while (lo + 1 < hi) {
-                       mid = (lo + hi) >> 1;
-                       if ((mask >> mid) != 0)
-                               lo = mid;
-                       else
-                               hi = mid;
-               }
-               return (lo);
+               return (generic_bitpos(mask));
        }
}

So the default cause is used except on amd64.  I think the direct code
for it is not as slow as the generic (libkern) ffs*(), except the generic
ffs*() is simpler so the compiler might inline it internals to
__builtin_ffs*().  Not that -ffreestanding prevents inlining extern ffs*().

Since 64/8 is 2 times 4 and sizeof(int) >= 4 in POSIX, it is trivial to
implement ffs64() using 2 inline ffs()'s, and that is exactly what clang
does on i386 to implement __builtin_ffsl().  Unfortunately, most arches
also don't implement inline ffs().  Only amd64, i386 and sparc64 implement
it.

subr_blist.c is also complilable outside of the kernel.  I don't like that.
The kernel macros HAVE_INLINE_* don't exist outside of the kernel, but
ffs*() would work better outside of the kernel if it is used blindly since
inlining of it isn't prevented by -ffreestanding.

Bruce
_______________________________________________
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to