Hi,

A quick benchmark shows it's faster up to about 10 bytes, but beyond that it becomes extremely slow. At 16 bytes it's already 2.5 times slower, and for larger sizes it's over 13 times slower than the GLIBC implementation...
> The implementation falls back to the library call if the
> string is not aligned.

If it did that for larger sizes then it would be fine. However, a byte loop is unacceptably slow. Also, given the large amount of inlined code, it would make sense to handle sizes larger than 8.

It may be worth comparing a loop doing 8 bytes per iteration with the GLIBC strlen, or just inlining the first 16 bytes and then falling back to strlen. Also, if you have statistics showing that tiny strlen sizes are much more common, then the strlen implementation could be further tuned for that.

Wilco
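
For reference, a loop doing 8 bytes per iteration could look roughly like the sketch below. This is not the code from the patch or from GLIBC; the names strlen_words and has_zero_byte are made up for illustration, and it assumes a 64-bit target where an aligned 8-byte load that runs past the terminator is harmless (it never crosses a page boundary, but it is outside what ISO C strictly guarantees).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns nonzero iff at least one byte of w is zero. */
static inline uint64_t has_zero_byte(uint64_t w)
{
    return (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
}

/* Hypothetical strlen variant that scans 8 bytes per iteration. */
size_t strlen_words(const char *s)
{
    const char *p = s;

    /* Byte loop only until p is 8-byte aligned (at most 7 iterations). */
    while ((uintptr_t)p % 8 != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Main loop: load one aligned 8-byte word per iteration.
       Assumption: reading the rest of the aligned word that holds
       the terminator is safe on the target. */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w))
            break;
        p += 8;
    }

    /* The terminator is somewhere in this word; find its exact position. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Timing something like this against both the inlined byte loop and the GLIBC strlen over a range of sizes would show where the crossover point is.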