On Thu, Jul 20, 2017 at 11:31:36AM -0700, Linus Torvalds wrote:

> How did this two-year old thread get resurrected?
I was looking for the original thread with that 'optimization' Davidlohr
did, but found this one first.

> And the *most* important question is that first one:
>
>    "Why does this matter, and what is the range it matters for?"

I was looking to do some work on the idle estimator. Parts of that keep
an online avg and variance for normal distributions. I wanted to bias
the avg downwards, and the way to do that is to subtract a scaled stdev
from it. Computing the stdev from a variance requires a sqrt (a sketch
of what I mean is at the end of this mail).

Thomas rightly asked how expensive our sqrt is. I found Davidlohr's
commit message and got confused by the numbers, so I reran them and
found the 'optimization' did the reverse: it made things worse. By then
I was prodding at it... 'fun' problem :-)

In any case, I suppose the range of values would be on the order of
TICK_NSEC, so the variance would be a number of orders below that. So
we're looking at fairly small numbers, <1e5.

> Also, since this is a generic library routine, no way can we depend on
> fls being fast.

Which is why I also tested the software fls, but you're right in that
the microbench primes the branch predictor. Still, the software fls is
6 branches, whereas the 'missing' loop:

	while (m > x)
		m >>= 2;

would need up to 30 or so cycles worst case. So even in that respect it
makes sense that it's a 'win', especially so for small numbers.
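For reference, the software fls I mean is the generic binary-search
one; something like the below, sketched from memory (see
include/asm-generic/bitops/fls.h for the real thing). The 6 branches
are the zero check plus the five halving steps:

	/*
	 * Generic software fls(): 1-based position of the most
	 * significant set bit, 0 if no bits are set.
	 */
	static inline int generic_fls(unsigned int x)
	{
		int r = 32;

		if (!x)
			return 0;
		if (!(x & 0xffff0000u)) {
			x <<= 16;
			r -= 16;
		}
		if (!(x & 0xff000000u)) {
			x <<= 8;
			r -= 8;
		}
		if (!(x & 0xf0000000u)) {
			x <<= 4;
			r -= 4;
		}
		if (!(x & 0xc0000000u)) {
			x <<= 2;
			r -= 2;
		}
		if (!(x & 0x80000000u)) {
			x <<= 1;
			r -= 1;
		}
		return r;
	}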
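And to make the comparison concrete, here is the shape of the
shift-and-subtract int_sqrt() with both initial-guess strategies side
by side. This is a simplified illustration, not the exact lib/int_sqrt.c
source; __builtin_clzl() stands in for the kernel's __fls():

	#include <limits.h>

	#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

	unsigned long int_sqrt(unsigned long x)
	{
		unsigned long b, m, y = 0;

		if (x <= 1)
			return x;

	#ifdef USE_FLS_SEED
		/*
		 * Seed m from the highest set bit, rounded down to an
		 * even bit position: a handful of instructions, but it
		 * relies on fls being cheap.
		 */
		m = 1UL << ((BITS_PER_LONG - 1 - __builtin_clzl(x)) & ~1);
	#else
		/*
		 * The 'missing' loop: walks m down two bits per
		 * iteration, so up to BITS_PER_LONG/2 iterations
		 * (~30 cycles worst case), but very few for the
		 * small inputs we care about here.
		 */
		m = 1UL << (BITS_PER_LONG - 2);
		while (m > x)
			m >>= 2;
	#endif

		while (m != 0) {
			b = y + m;
			y >>= 1;

			if (x >= b) {
				x -= b;
				y += m;
			}
			m >>= 2;
		}

		return y;
	}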
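Finally, the estimator bias I mentioned at the top; this is purely
illustrative, the struct, the 1/8 EWMA weight, keeping the variance as
an EWMA of the squared deviation, and using one whole stdev as the
bias are all made up for the example:

	#include <stdint.h>

	/* int_sqrt() as in the sketch above */
	extern unsigned long int_sqrt(unsigned long x);

	struct idle_est {
		uint64_t avg;	/* running average of idle duration, ns */
		uint64_t var;	/* running variance, ns^2 */
	};

	/*
	 * Weight-1/8 EWMA updates; crude but cheap.  Assumes samples
	 * are bounded (order TICK_NSEC) so delta^2 cannot overflow.
	 */
	static void idle_est_update(struct idle_est *e, uint64_t sample)
	{
		int64_t delta = (int64_t)(sample - e->avg);

		e->avg += delta / 8;
		e->var += ((int64_t)(delta * delta) - (int64_t)e->var) / 8;
	}

	/*
	 * The biased-down estimate: avg minus one stdev, clamped at
	 * zero.  This subtraction is where the sqrt of the variance
	 * comes in.
	 */
	static uint64_t idle_est_value(const struct idle_est *e)
	{
		uint64_t stdev = int_sqrt(e->var);

		return e->avg > stdev ? e->avg - stdev : 0;
	}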