On 05/25/2013 10:27 PM, Lee Thomas wrote: > Hello FreeBSD devs, > > I have found a performance improvement to libc's strnlen(). > lib/libc/string/strnlen.c is a trivial byte-by-byte implementation, > where strlen.c has a smarter word-by-word implementation. I have > implemented strnlen similarly to strlen. It runs about 4x as fast, at > the cost of a binary codesize increase from 30 bytes to 221. > > In writing this I needed a test, and there isn't one in > tools/regression/lib/libc/string, so I wrote a unit test for strnlen. > This really only makes sense for a word-by-word implementation, as it > tests each combination of string and limit length, overread characters, > and starting alignment. > > Could someone please review these patches? I am not very experienced in > C, and am even less experienced with FreeBSD's style guidelines, so they > likely have a bunch of style and idiom issues. Even if one or both of > them prove not worth committing, it would still be useful for my learning. > > If/When these patches prove worthy of submitting, what is the next step > after that? Should I submit a PR, or is there some other process? This > code is quite similar to the existing strlen.c, and doesn't do anything > fancy with e.g. endianness, but I haven't tested it for correctness or > performance on anything other than x86... > > And finally, there is some other low-hanging fruit in the other strn* > functions. Would it be worth it for me to give those the same treatment? > > Thanks, > Lee Thomas > > Test platform: > $uname -a > FreeBSD LeeDesktop 9.1-STABLE FreeBSD 9.1-STABLE #15 r250831: > Mon May 20 17:31:28 EDT 2013 > lthomas@LeeDesktop:/usr/obj/usr/src/sys/NOSOUND amd64 > $dmesg | grep CPU: > CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz > (2666.81-MHz K8-class CPU) > $clang --version > FreeBSD clang version 3.2 (tags/RELEASE_32/final 170710) 20121221 > Target: x86_64-unknown-freebsd9.1 > Thread model: posix > > My timing test was a file of 10000 strings, of uniformly random length, > 50% between 0 and 20 bytes long, and 50% between 21 and 1000 bytes long. > Each was filled with random generated bytes in the range [1, 255]. The > test executables run their strlen on each string in the file 1000 times, > and a shell script ran the test programs alternately 100 times. Here are > the clang results; gcc results are roughly the same. I will share the > actual timing code if someone wants it, but it is pretty ugly C++ and > shell and I don't guarantee it working :-). > > Results: > > x ./times_bsd_strnlen.txt > + ./times_my_strnlen.txt > +--------------------------------------------------------------------------+ > > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |+ x| > |A A| > +--------------------------------------------------------------------------+ > > N Min Max Median Avg Stddev > x 101 1.8685486 1.8693889 1.8687506 1.8687894 0.0001484903 > + 101 0.42704683 0.42779486 0.42712813 0.42715597 0.00010681494 > Difference at 95.0% confidence > -1.44163 +/- 3.56739e-05 > -77.1426% +/- 0.00190893% > (Student's t, pooled s = 0.000129342) > > Note: I manually shortened the histogram, as it was 100 lines long of > exactly the same. > > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org" >
> Index: strnlen.c > =================================================================== > diff --git a/head/lib/libc/string/strnlen.c b/head/lib/libc/string/strnlen.c > --- a/head/lib/libc/string/strnlen.c (revision 250951) > +++ b/head/lib/libc/string/strnlen.c (working copy) > @@ -1,5 +1,6 @@ > /*- > - * Copyright (c) 2009 David Schultz <d...@freebsd.org> > + * Copyright (c) 2009, 2010 Xin LI <delp...@freebsd.org> > + * Copyright (c) 2013 Lee Thomas <lee_tho...@aslantools.com> > * All rights reserved. > * > * Redistribution and use in source and binary forms, with or without > @@ -27,16 +28,91 @@ > #include <sys/cdefs.h> > __FBSDID("$FreeBSD$"); > > +#include <sys/limits.h> > +#include <sys/types.h> > #include <string.h> > > +/* > + * Portable strnlen() for 32-bit and 64-bit systems. > + * > + * Rationale: it is generally much more efficient to do word length > + * operations and avoid branches on modern computer systems, as > + * compared to byte-length operations with a lot of branches. > + * > + * The expression: > + * > + * ((x - 0x01....01) & ~x & 0x80....80) > + * > + * would evaluate to a non-zero value iff any of the bytes in the > + * original word is zero. > + * > + * On multi-issue processors, we can divide the above expression into: > + * a) (x - 0x01....01) > + * b) (~x & 0x80....80) > + * c) a & b > + * > + * Where, a) and b) can be partially computed in parallel. > + * > + * The algorithm above is found on "Hacker's Delight" by > + * Henry S. Warren, Jr. > + */ > + > +/* Magic numbers for the algorithm */ > +#if LONG_BIT == 32 > +static const unsigned long mask01 = 0x01010101; > +static const unsigned long mask80 = 0x80808080; > +#elif LONG_BIT == 64 > +static const unsigned long mask01 = 0x0101010101010101; > +static const unsigned long mask80 = 0x8080808080808080; > +#else > +#error Unsupported word size > +#endif > + > +#define LONGPTR_MASK (sizeof(long) - 1) > + > size_t > -strnlen(const char *s, size_t maxlen) > +strnlen(const char *str, size_t maxlen) > { > - size_t len; > + const char *stop, *short_stop; > + const char *p; > + const unsigned long *lp; > + long va, vb; > > - for (len = 0; len < maxlen; len++, s++) { > - if (!*s) > - break; > + if (maxlen==0) return 0; > + > + stop=str+maxlen; > + /* > + * Before trying the hard (unaligned byte-by-byte access) way > + * to figure out whether there is a nul character, try to see > + * if there is a nul character is within this accessible word > + * first. > + * > + * p and (p & ~LONGPTR_MASK) must be equally accessible since > + * they always fall in the same memory page, as long as page > + * boundaries is integral multiple of word size. > + */ > + lp = (const unsigned long *)((uintptr_t)str & ~LONGPTR_MASK); > + va = (*lp - mask01); > + vb = ((~*lp) & mask80); I do not think that this correct C. This is type punning violating the rules of the language. One way you can rewrite this is to use memcpy() into a temporary. Do not be afraid of the memcpy(), modern GCC, and I hope even Clang, can optimize it into a single move instruction: { long tmp; memcpy(&tmp, lp, sizeof(tmp)); va = (tmp - mask01); vb = ((~tmp) & mask80); } Another way could be using the union trick. > + lp++; > + if (va & vb) { > + /* Check if we have \0 in the first part */ > + short_stop=(const char *)lp; > + if (stop<short_stop) short_stop=stop; > + for (p=str; p != short_stop; p++) > + if (*p == '\0') > + return (p-str); > } > - return (len); > + /* Scan the rest of the string using word sized operation */ > + for (; (const char *)lp < stop; lp++) { > + va = (*lp - mask01); > + vb = ((~*lp) & mask80); Ditto. > + if (va & vb) { > + for (p=(const char *)lp; p != stop; p++) > + if (*p == '\0') > + break; > + return (p-str); > + } > + } > + return (maxlen); > } -- VZ
signature.asc
Description: OpenPGP digital signature