On Tue, Dec 8, 2015 at 1:49 PM, Michael Niedermayer <michae...@gmx.at> wrote: > On Mon, Dec 07, 2015 at 09:50:49PM -0500, Ganesh Ajjanagadde wrote: >> pow is a very wasteful function for this purpose. A low hanging fruit >> would be simply to replace with exp2f, and that does yield some speedup. >> However, there are 2 drawbacks of this: >> 1. It does not exploit the integer nature of the argument. >> 2. (minor) Some platforms lack a proper exp2f routine, making benefits >> available >> only to non broken libm. >> 3. exp2f does not solve the same issue that plagues pow, namely terrible >> worst case performance. This is a fundamental issue known as the >> "table-maker's dilemma" recognized by Prof. Kahan himself and >> subsequently elaborated and researched by many others. All this is clear >> from benchmarks below. >> >> This exploits the IEEE-754 format to get very good performance even in >> the worst case for integer powers of 2. This solves all the issues noted >> above. Function tested with clang usan over [-1000, 1000] (beyond range of >> relevance for this, which is [-255, 255]), patch itself with FATE. >> >> Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of >> the pow call, START/STOP, and command ffplay >> ~/samples/jpeg2000/chiens_dcinema2K.mxf. >> Low number of runs also given to prove the point about worst case: >> >> pow: >> 216270 decicycles in pow, 1 runs, 0 skips >> 110175 decicycles in pow, 2 runs, 0 skips >> 56085 decicycles in pow, 4 runs, 0 skips >> 29013 decicycles in pow, 8 runs, 0 skips >> 15472 decicycles in pow, 16 runs, 0 skips >> 8689 decicycles in pow, 32 runs, 0 skips >> 5295 decicycles in pow, 64 runs, 0 skips >> 3599 decicycles in pow, 128 runs, 0 skips >> 2748 decicycles in pow, 256 runs, 0 skips >> 2304 decicycles in pow, 511 runs, 1 skips >> 2072 decicycles in pow, 1022 runs, 2 skips >> 1963 decicycles in pow, 2044 runs, 4 skips >> 1894 decicycles in pow, 4091 runs, 5 skips >> 1860 decicycles in pow, 8184 runs, 8 skips >> >> exp2f: >> 134140 decicycles in pow, 1 runs, 0 skips >> 68110 decicycles in pow, 2 runs, 0 skips >> 34530 decicycles in pow, 4 runs, 0 skips >> 17677 decicycles in pow, 8 runs, 0 skips >> 9175 decicycles in pow, 16 runs, 0 skips >> 4931 decicycles in pow, 32 runs, 0 skips >> 2808 decicycles in pow, 64 runs, 0 skips >> 1747 decicycles in pow, 128 runs, 0 skips >> 1208 decicycles in pow, 256 runs, 0 skips >> 952 decicycles in pow, 512 runs, 0 skips >> 822 decicycles in pow, 1024 runs, 0 skips >> 765 decicycles in pow, 2047 runs, 1 skips >> 722 decicycles in pow, 4094 runs, 2 skips >> 693 decicycles in pow, 8190 runs, 2 skips >> >> exp2fi: >> 2740 decicycles in pow, 1 runs, 0 skips >> 1530 decicycles in pow, 2 runs, 0 skips >> 955 decicycles in pow, 4 runs, 0 skips >> 622 decicycles in pow, 8 runs, 0 skips >> 477 decicycles in pow, 16 runs, 0 skips >> 368 decicycles in pow, 32 runs, 0 skips >> 317 decicycles in pow, 64 runs, 0 skips >> 291 decicycles in pow, 128 runs, 0 skips >> 277 decicycles in pow, 256 runs, 0 skips >> 268 decicycles in pow, 512 runs, 0 skips >> 265 decicycles in pow, 1024 runs, 0 skips >> 263 decicycles in pow, 2048 runs, 0 skips >> 263 decicycles in pow, 4095 runs, 1 skips >> 260 decicycles in pow, 8191 runs, 1 skips >> >> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> > > probably ok > > thx
pushed, thanks > > [...] > -- > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB > > Asymptotically faster algorithms should always be preferred if you have > asymptotical amounts of data > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel