On Thu, Dec 31, 2015 at 8:46 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote: > Hi, > > On Thu, Dec 31, 2015 at 11:39 AM, Ganesh Ajjanagadde > <gajjanaga...@gmail.com> wrote: >> >> This patch does not seem to have measurable impact, at least on x86-64, >> though >> there could be benefits for less than stellar branch predictors. > > [..] >> >> - for (i = 0; i < 1<<13; i++) { >> - if (!(i & 7)) >> - cbrt_tab[i].f = 16 * cbrt_tab[i>>3].f; >> - else >> - cbrt_tab[i].f = i * cbrt(i); >> + for (i = 0; i < 1<<13; i+=8) { >> + cbrt_tab[i].f = 16 * cbrt_tab[i>>3].f; >> + cbrt_tab[i+1].f = (i+1) * cbrt(i+1); >> + cbrt_tab[i+2].f = (i+2) * cbrt(i+2); >> + cbrt_tab[i+3].f = (i+3) * cbrt(i+3); >> + cbrt_tab[i+4].f = (i+4) * cbrt(i+4); >> + cbrt_tab[i+5].f = (i+5) * cbrt(i+5); >> + cbrt_tab[i+6].f = (i+6) * cbrt(i+6); >> + cbrt_tab[i+7].f = (i+7) * cbrt(i+7); > > > gcc (and most other compilers) will unroll the loop automatically, I > suspect. Check disassembly to confirm?
checked, it does not on gcc at least. I would in fact suspect the opposite; this increases the binary size non-negligibly (unlike an unrolling by two for instance), and is slightly nontrivial for a compiler due to the i&7 business. > > (That doesn't mean the patch shouldn't go in, I'm just trying to help you > explain the result. I have no comment on the patch itself.) I think the branch predictor explanation is reasonable, these are very predictable due to the relatively long length of the loop, and simple, periodic branches. On the other hand, this varies considerably across even generations of intel CPU's; hence I posted the patch. Thanks for the idea though. > > Ronald _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel