Hi, On Sat, Oct 10, 2015 at 6:07 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> On Sat, Oct 10, 2015 at 5:45 PM, Ganesh Ajjanagadde <gajja...@mit.edu> > wrote: > > On Sat, Oct 10, 2015 at 5:32 PM, Henrik Gramner <hen...@gramner.com> > wrote: > >> On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde > >> <gajjanaga...@gmail.com> wrote: > >>> This uses Stein's binary GCD algorithm: > >>> https://en.wikipedia.org/wiki/Binary_GCD_algorithm > >>> to get a reported 1.7-2.1x speedup over Euclidean GCD on standard > architectures. > >>> Have not benchmarked, so I can't comment > >> > >> Before you submit a patch that's supposed to make something faster, > >> you should benchmark it to verify that it is in fact faster. Do this > >> with inputs of various sizes on both 32- and 64-bit architectures and > >> both with and without compilers that support __builtin_ctzll(v). > > > > I did not do this because I don't know how to write a benchmark in C > > easily: for instance, printing out can lead to I/O time loss, and on > > O3, unless there is I/O, a function may not be evaluated at all due to > > the "as if" optimization rules. Another problem is that the function > > is small, so I can't just time a single call, there are cache issues > > with arrays, etc etc. Yes, I can do it - but it will take some effort. > > > > Also, the only compilers I have are clang and gcc, both of which have > > the builtin. I figured someone with a better array of tools and more > > knowledgable in writing benchmarks can trivially do this. Furthermore, > > at the moment, comparisons are incomplete: I don't care about Windows > > myself but one could trivially add the lines if one so desired (see my > > commit message). > > > > Lastly, I can't find any links on how to do benchmarks cleanly, > > efficiently, and accurately. I hoped that FFmpeg has some clean way of > > doing this, but all the methods I see are terribly ad-hoc. > > So I hacked up a quick and dirty benchmark. I fill up two input arrays > with 10^8 entries each via random() calls. I then start a timer, and > compute using the old method the gcd, inputs drawn from the arrays. > Then I used the new method, repeating the above. > Results (on my Intel Haswell, GCC 5.2): > Old method: 22.755636 > New method: 6.495941 Search for usages of START_TIMER and STOP_TIMER, it's how we typically measure these things :). Nice numbers! Ronald _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel