On Tue, Feb 3, 2015 at 10:17 AM, Davidlohr Bueso <d...@stgolabs.net> wrote: > On Mon, 2015-02-02 at 11:00 -0800, Linus Torvalds wrote: >> Hmm. I don't disagree, but would like some more feedback. >> >> Davidlohr - you were the person to touch this function last (commit >> 30493cc9dddb: "lib/int_sqrt.c: optimize square root algorithm"), and >> you did so for performance reasons. And in fact, when you did that, >> you removed that initial loop: >> >> - one = 1UL << (BITS_PER_LONG - 2); >> - while (one > op) >> - one >>= 2; >> >> but I'm not sure that was actually all that conscious, I think the >> real optimization was the changes inside the loop to make the final >> real loop faster and simpler. > > I missed that. And yes, the real optimization should be in the loop. > >> >> Also, you had performance numbers, so presumably a test harness for it >> all. It probably depends a lot on the actual distribution of argument >> values, of course, but it would be good to accompany the patch with >> actual real numbers like lasty time. > > Aha. In my case I recall I ran a usersapce program using each function > from 1 to a million, and throwing perf at it for 10 times. > I have done profiling of int_sqrt function using perf tool for 10 times. For this purpose i have created a userspace program which uses sqrt function from 1 to a million.
int_sqrt_old -> current algorithm version int_sqrt_new -> with proposed change these results are for BITS_PER_LONG=64. Performance counter stats for './int_sqrt_old' (10 runs): 460.944061 task-clock (msec) # 0.969 CPUs utilized ( +- 1.72% ) 64 context-switches # 0.139 K/sec ( +- 2.27% ) 0 cpu-migrations # 0.000 K/sec 132 page-faults # 0.286 K/sec <not supported> cycles <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend <not supported> instructions <not supported> branches <not supported> branch-misses 0.475795341 seconds time elapsed( +- 3.20% ) Performance counter stats for './int_sqrt_new' (10 runs): 401.782119 task-clock (msec) # 0.974 CPUs utilized( +- 1.55% ) 57 context-switches # 0.141 K/sec( +- 1.92% ) 0 cpu-migrations # 0.000 K/sec 132 page-faults # 0.329 K/sec <not supported> cycles <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend <not supported> instructions <not supported> branches <not supported> branch-misses 0.412593296 seconds time elapsed( +- 2.03% ) As per profiling definitely there is improvement in algorithm timing. >> (I'm also not entirely sure what uses int_sqrt() that ends up being so >> performance-critical, so it would be good to document that too, since >> that probably also matters for the "what's the normal argument range" >> question..) > > It's not a big deal afaik. > > Thanks, > Davidlohr > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/