On Sun, Oct 11, 2015 at 01:14:27AM +0200, Michael Niedermayer wrote: > On Sat, Oct 10, 2015 at 11:32:06PM +0200, Henrik Gramner wrote: > > On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde > > <gajjanaga...@gmail.com> wrote: > > > This uses Stein's binary GCD algorithm: > > > https://en.wikipedia.org/wiki/Binary_GCD_algorithm > > > to get a reported 1.7-2.1x speedup over Euclidean GCD on standard > > > architectures. > > > Have not benchmarked, so I can't comment > > > > Before you submit a patch that's supposed to make something faster, > > you should benchmark it to verify that it is in fact faster. Do this > > with inputs of various sizes on both 32- and 64-bit architectures and > > both with and without compilers that support __builtin_ctzll(v). > > without __builtin_ctzll() the old code seems faster in a simple test > like: > make -j12 && ./ffmpeg -i matrixbench_mpeg2.mpg test.mov > > also it seems a simple > while (u != v) { > if (u > v) > FFSWAP(int64_t, v, u); > v -= u; > do { > v >>= 1; > } while(!(v & 1)); > } > is faster than the non builtin ctzll in its place but still not as > fast as the current code
heres a ff_ctzll_c with LUTs, if someone wants to play more with it (its almost as fast as the old code when theres no __builtin_ctzll()) for the matrixbench_mpeg2 -> mov testcase static av_always_inline int ff_ctzll_c(long long v) +{ + int c; + static const uint8_t lut[256] = + { + 8,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 7,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, + }; + c = lut[v&255]; + + if (c<8) + return c; + + c = 0; + if (!(v & 0xffffffff)) { + v >>= 32; + c += 32; + } + if (!(v & 0xffff)) { + v >>= 16; + c += 16; + } + if (!(v & 0xff)) { + v >>= 8; + c += 8; + } + return c + lut[v&255]; +} [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB He who knows, does not speak. He who speaks, does not know. -- Lao Tsu
signature.asc
Description: Digital signature
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel