On Thu, Dec 31, 2015 at 3:53 PM, Ganesh Ajjanagadde
<gajjanaga...@gmail.com> wrote:
> On Thu, Dec 31, 2015 at 8:46 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>> Hi,
>>
>> On Thu, Dec 31, 2015 at 11:39 AM, Ganesh Ajjanagadde
>> <gajjanaga...@gmail.com> wrote:
>>>
>>> This patch does not seem to have measurable impact, at least on x86-64,
>>> though there could be benefits for less than stellar branch predictors.
>>
>> [..]
>>>
>>> -    for (i = 0; i < 1<<13; i++) {
>>> -        if (!(i & 7))
>>> -            cbrt_tab[i].f = 16 * cbrt_tab[i>>3].f;
>>> -        else
>>> -            cbrt_tab[i].f = i * cbrt(i);
>>> +    for (i = 0; i < 1<<13; i+=8) {
>>> +        cbrt_tab[i].f = 16 * cbrt_tab[i>>3].f;
>>> +        cbrt_tab[i+1].f = (i+1) * cbrt(i+1);
>>> +        cbrt_tab[i+2].f = (i+2) * cbrt(i+2);
>>> +        cbrt_tab[i+3].f = (i+3) * cbrt(i+3);
>>> +        cbrt_tab[i+4].f = (i+4) * cbrt(i+4);
>>> +        cbrt_tab[i+5].f = (i+5) * cbrt(i+5);
>>> +        cbrt_tab[i+6].f = (i+6) * cbrt(i+6);
>>> +        cbrt_tab[i+7].f = (i+7) * cbrt(i+7);
>>
>>
>> gcc (and most other compilers) will unroll the loop automatically, I
>> suspect. Check disassembly to confirm?
>>
>> (That doesn't mean the patch shouldn't go in, I'm just trying to help you
>> explain the result. I have no comment on the patch itself.)
>
> Patch series dropped, I have superior approach that brings down to ~
> 400k cycles (as opposed to original 750k, proposed 660k). Currently at
> work seeing if there is anything I can easily squeeze further.

Sorry, actually 300k cycles.

[...]
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel