On Sun, Oct 11, 2015 at 01:14:27AM +0200, Michael Niedermayer wrote: > On Sat, Oct 10, 2015 at 11:32:06PM +0200, Henrik Gramner wrote: > > On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde > > <[email protected]> wrote: > > > This uses Stein's binary GCD algorithm: > > > https://en.wikipedia.org/wiki/Binary_GCD_algorithm > > > to get a reported 1.7-2.1x speedup over Euclidean GCD on standard > > > architectures. > > > Have not benchmarked, so I can't comment > > > > Before you submit a patch that's supposed to make something faster, > > you should benchmark it to verify that it is in fact faster. Do this > > with inputs of various sizes on both 32- and 64-bit architectures and > > both with and without compilers that support __builtin_ctzll(v). > > without __builtin_ctzll() the old code seems faster in a simple test > like: > make -j12 && ./ffmpeg -i matrixbench_mpeg2.mpg test.mov > > also it seems a simple > while (u != v) { > if (u > v) > FFSWAP(int64_t, v, u); > v -= u; > do { > v >>= 1; > } while(!(v & 1)); > } > is faster than the non builtin ctzll in its place but still not as > fast as the current code
heres a ff_ctzll_c with LUTs, if someone wants to play more with
it (its almost as fast as the old code when theres no __builtin_ctzll())
for the matrixbench_mpeg2 -> mov testcase
static av_always_inline int ff_ctzll_c(long long v)
+{
+ int c;
+ static const uint8_t lut[256] =
+ {
+ 8,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 7,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ };
+ c = lut[v&255];
+
+ if (c<8)
+ return c;
+
+ c = 0;
+ if (!(v & 0xffffffff)) {
+ v >>= 32;
+ c += 32;
+ }
+ if (!(v & 0xffff)) {
+ v >>= 16;
+ c += 16;
+ }
+ if (!(v & 0xff)) {
+ v >>= 8;
+ c += 8;
+ }
+ return c + lut[v&255];
+}
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
He who knows, does not speak. He who speaks, does not know. -- Lao Tsu
signature.asc
Description: Digital signature
_______________________________________________ ffmpeg-devel mailing list [email protected] http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
