On Tue, Oct 20, 2015 at 12:52 AM, James Almer <jamr...@gmail.com> wrote: > On 10/11/2015 12:45 AM, Michael Niedermayer wrote: >> On Sat, Oct 10, 2015 at 09:58:47PM -0400, Ganesh Ajjanagadde wrote: >>> This uses Stein's binary GCD algorithm: >>> https://en.wikipedia.org/wiki/Binary_GCD_algorithm >>> to get a roughly 4x speedup over Euclidean GCD on standard architectures >>> with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise. >>> At the moment, the compiler intrinsic is used on GCC and Clang due to >>> its easy availability. >>> >>> Quick note regarding overflow: yes, subtractions on int64_t can, but the >>> llabs takes care of that. The llabs is also guaranteed to be safe, with >>> no annoying INT64_MIN business since INT64_MIN being a power of 2, is >>> shifted down before being sent to llabs. >>> >>> The binary GCD needs ff_ctzll, an extension of ff_ctz for long long >>> (int64_t). On >>> GCC, this is provided by a built-in. On Microsoft, there is a >>> BitScanForward64 analog of BitScanForward that should work; but I can't >>> confirm. >>> Apparently it is not available on 32 bit builds; so this may or may not >>> work correctly. On Intel, per the documentation there is only an >>> intrinsic for _bit_scan_forward and people have posted on forums >>> regarding _bit_scan_forward64, but often their documentation is >>> woeful. Again, I don't have it, so I can't test. >>> >>> As such, to be safe, for now only the GCC/Clang intrinsic is added, the rest >>> use a compiled version based on the De-Bruijn method of Leiserson et al: >>> http://supertech.csail.mit.edu/papers/debruijn.pdf. >>> >>> Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell) >>> with a START_TIMER and STOP_TIMER in libavutil/rationsl.c, followed by a >>> make fate. >>> >>> aac-am00_88.err: >>> builtin: >>> 714 decicycles in av_gcd, 4095 runs, 1 skips >>> >>> de-bruijn: >>> 1440 decicycles in av_gcd, 4096 runs, 0 skips >>> >>> previous: >>> 2889 decicycles in av_gcd, 4096 runs, 0 skips >>> >>> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> >>> --- >>> libavutil/intmath.h | 19 +++++++++++++++++++ >>> libavutil/mathematics.c | 26 +++++++++++++++++++++----- >>> 2 files changed, 40 insertions(+), 5 deletions(-) >> >> applied >> >> thanks > > This broke fate with -ftrapv > http://fate.ffmpeg.org/history.cgi?slot=x86_64-freebsd10-clang33-ftrapv > http://fate.ffmpeg.org/history.cgi?slot=x86_64-openbsd5.6-gcc4.2-ftrapv > I can reproduce this with GCC 5 on x86_64 ArchLinux as well. > > The problem seems to be in av_gcd() and not the De-Bruijn version of > ff_ctzll since the gcc builtin is being used.
Don't have the time to investigate right now - revert for now unless someone can fix it quickly. > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel