On Sat, Oct 10, 2015 at 11:32:06PM +0200, Henrik Gramner wrote: > On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde > <gajjanaga...@gmail.com> wrote: > > This uses Stein's binary GCD algorithm: > > https://en.wikipedia.org/wiki/Binary_GCD_algorithm > > to get a reported 1.7-2.1x speedup over Euclidean GCD on standard > > architectures. > > Have not benchmarked, so I can't comment > > Before you submit a patch that's supposed to make something faster, > you should benchmark it to verify that it is in fact faster. Do this > with inputs of various sizes on both 32- and 64-bit architectures and > both with and without compilers that support __builtin_ctzll(v).
a quick benchmark with --- a/tests/fate-run.sh +++ b/tests/fate-run.sh @@ -296,7 +296,7 @@ if test $err != 0 && test $gen != "no" ; then fi if test $err = 0; then - rm -f $outfile $errfile $cmpfile $cleanfiles + rm -f $outfile $cmpfile $cleanfiles and START/STOP_TIMER and running make fate -j12 -k shows that the new code appears significantly faster (x86-64 sandybridge + gcc) old: aac-al18_44.err: 1748 decicycles in gcd, 4096 runs, 0 skips aac-am00_88.err: 2496 decicycles in gcd, 4096 runs, 0 skips aac-am00_88.err: 2498 decicycles in gcd, 8192 runs, 0 skips aac-er_eld1001np_44_ep0.err: 2433 decicycles in gcd, 1023 runs, 1 skips aac-er_eld1001np_44_ep0.err: 2467 decicycles in gcd, 2047 runs, 1 skips aac-er_eld1001np_44_ep0.err: 2352 decicycles in gcd, 4095 runs, 1 skips aac-er_eld2000np_48_ep0.err: 1737 decicycles in gcd, 4096 runs, 0 skips aac-er_eld2100np_48_ep0.err: 1052 decicycles in gcd, 4096 runs, 0 skips aac-fixed-al18_44.err: 1764 decicycles in gcd, 4096 runs, 0 skips aac-fixed-er_eld1001np_44_ep0.err: 2484 decicycles in gcd, 4096 runs, 0 skips aac-fixed-er_eld2000np_48_ep0.err: 1749 decicycles in gcd, 4096 runs, 0 skips acodec-adpcm-adx.err: 1411 decicycles in gcd, 4096 runs, 0 skips acodec-adpcm-adx.err: 1392 decicycles in gcd, 8192 runs, 0 skips acodec-adpcm-adx.err: 1403 decicycles in gcd, 16384 runs, 0 skips acodec-adpcm-adx-trellis.err: 1414 decicycles in gcd, 4096 runs, 0 skips acodec-adpcm-adx-trellis.err: 1416 decicycles in gcd, 8192 runs, 0 skips acodec-adpcm-adx-trellis.err: 1417 decicycles in gcd, 16384 runs, 0 skips acodec-adpcm-ima_qt.err: 1399 decicycles in gcd, 4096 runs, 0 skips acodec-adpcm-ima_qt.err: 1406 decicycles in gcd, 8192 runs, 0 skips acodec-adpcm-ima_qt-trellis.err: 1487 decicycles in gcd, 4096 runs, 0 skips acodec-adpcm-ima_qt-trellis.err: 1489 decicycles in gcd, 8192 runs, 0 skips adpcm_ms-stereo.err: 2751 decicycles in gcd, 4096 runs, 0 skips alac-24-level-0.err: 684 decicycles in gcd, 4096 runs, 0 skips alac-24-level-1.err: 684 decicycles in gcd, 4096 runs, 0 skips alac-24-level-2.err: 696 decicycles in gcd, 4096 runs, 0 skips alac-24-lpc-orders.err: 680 decicycles in gcd, 4096 runs, 0 skips amrwb-8k85.err: 1052 decicycles in gcd, 1023 runs, 1 skips dca-core.err: 1800 decicycles in gcd, 511 runs, 1 skips dca-core.err: 1850 decicycles in gcd, 1023 runs, 1 skips dca-xll.err: 1439 decicycles in gcd, 4096 runs, 0 skips filter-metadata-scenedetect.err: 794 decicycles in gcd, 4096 runs, 0 skips filter-pixdesc-gbrp14le.err: 660 decicycles in gcd, 127 runs, 1 skips filter-pixfmts-hflip.err: 761 decicycles in gcd, 63 runs, 1 skips filter-pixfmts-hflip.err: 731 decicycles in gcd, 127 runs, 1 skips flac-24-comp-8.err: 665 decicycles in gcd, 4096 runs, 0 skips h264-conformance-canl3_sony_c.err: 824 decicycles in gcd, 1023 runs, 1 skips h264-conformance-canl3_sony_c.err: 772 decicycles in gcd, 2047 runs, 1 skips h264-conformance-frext-hpcafl_bcrm_c.err: 767 decicycles in gcd, 4096 runs, 0 skips h264-conformance-frext-hpcaflnl_bcrm_c.err: 724 decicycles in gcd, 4096 runs, 0 skips h264-conformance-frext-hpcamapalq_bcrm_b.err: 812 decicycles in gcd, 4096 runs, 0 skips h264-conformance-frext-hpcvfl_bcrm_a.err: 761 decicycles in gcd, 4096 runs, 0 skips h264-conformance-frext-hpcvflnl_bcrm_a.err: 764 decicycles in gcd, 4096 runs, 0 skips h264-conformance-ls_sva_d.err: 815 decicycles in gcd, 4096 runs, 0 skips h264-conformance-ls_sva_d.err: 820 decicycles in gcd, 8192 runs, 0 skips h264-conformance-ls_sva_d.err: 808 decicycles in gcd, 16383 runs, 1 skips lagarith-yuy2.err: 789 decicycles in gcd, 15 runs, 1 skips lagarith-yuy2.err: 726 decicycles in gcd, 31 runs, 1 skips lossless-meridianaudio.err: 1268 decicycles in gcd, 4096 runs, 0 skips lossless-meridianaudio.err: 1279 decicycles in gcd, 8189 runs, 3 skips lossless-meridianaudio.err: 1305 decicycles in gcd, 16381 runs, 3 skips lossless-truehd-5.1-downmix-2.0.err: 1036 decicycles in gcd, 4096 runs, 0 skips lossless-truehd-5.1.err: 1016 decicycles in gcd, 4096 runs, 0 skips opus-testvector01.err: 1246 decicycles in gcd, 4096 runs, 0 skips opus-testvector05.err: 1025 decicycles in gcd, 4096 runs, 0 skips opus-testvector07.err: 1002 decicycles in gcd, 4096 runs, 0 skips opus-testvector07.err: 1008 decicycles in gcd, 8191 runs, 1 skips opus-testvector10.err: 1241 decicycles in gcd, 2047 runs, 1 skips ra-288.err: 656 decicycles in gcd, 4096 runs, 0 skips sipr-16k.err: 681 decicycles in gcd, 4096 runs, 0 skips swr-resample-fltp-44100-2626.err: 704 decicycles in gcd, 127 runs, 1 skips twinvq.err: 2410 decicycles in gcd, 4096 runs, 0 skips vqf-demux.err: 2441 decicycles in gcd, 511 runs, 1 skips vqf-demux.err: 2506 decicycles in gcd, 1023 runs, 1 skips vqf-demux.err: 2478 decicycles in gcd, 2047 runs, 1 skips vqf-demux.err: 2405 decicycles in gcd, 4095 runs, 1 skips vsynth2-r210.err: 697 decicycles in gcd, 127 runs, 1 skips vsynth2-r210.err: 698 decicycles in gcd, 255 runs, 1 skips vsynth_lena-ffv1-v0.err: 723 decicycles in gcd, 127 runs, 1 skips vsynth_lena-mpeg2-ilace.err: 1655 decicycles in gcd, 2047 runs, 1 skips wavpack-cuesheet.err: 1199 decicycles in gcd, 4096 runs, 0 skips wavpack-cuesheet.err: 1173 decicycles in gcd, 8192 runs, 0 skips new: aac-al18_44.err: 742 decicycles in gcd, 4096 runs, 0 skips aac-am00_88.err: 743 decicycles in gcd, 4096 runs, 0 skips aac-am00_88.err: 742 decicycles in gcd, 8192 runs, 0 skips aac-er_eld1001np_44_ep0.err: 757 decicycles in gcd, 4096 runs, 0 skips aac-er_eld2000np_48_ep0.err: 768 decicycles in gcd, 4096 runs, 0 skips aac-er_eld2100np_48_ep0.err: 598 decicycles in gcd, 4096 runs, 0 skips aac-fixed-al18_44.err: 745 decicycles in gcd, 4096 runs, 0 skips aac-fixed-er_eld1001np_44_ep0.err: 881 decicycles in gcd, 4096 runs, 0 skips aac-fixed-er_eld2000np_48_ep0.err: 813 decicycles in gcd, 4096 runs, 0 skips ac3-fixed-2.0.err: 462 decicycles in gcd, 31 runs, 1 skips ac3-fixed-2.0.err: 548 decicycles in gcd, 63 runs, 1 skips ac3-fixed-2.0.err: 586 decicycles in gcd, 127 runs, 1 skips ac3-fixed-2.0.err: 590 decicycles in gcd, 255 runs, 1 skips acodec-adpcm-adx.err: 736 decicycles in gcd, 4096 runs, 0 skips acodec-adpcm-adx.err: 733 decicycles in gcd, 8192 runs, 0 skips acodec-adpcm-adx.err: 739 decicycles in gcd, 16384 runs, 0 skips acodec-adpcm-adx-trellis.err: 878 decicycles in gcd, 4096 runs, 0 skips acodec-adpcm-adx-trellis.err: 877 decicycles in gcd, 8192 runs, 0 skips acodec-adpcm-adx-trellis.err: 877 decicycles in gcd, 16384 runs, 0 skips acodec-adpcm-ima_qt.err: 724 decicycles in gcd, 4096 runs, 0 skips acodec-adpcm-ima_qt.err: 724 decicycles in gcd, 8192 runs, 0 skips acodec-adpcm-ima_qt-trellis.err: 702 decicycles in gcd, 4096 runs, 0 skips acodec-adpcm-ima_qt-trellis.err: 724 decicycles in gcd, 8192 runs, 0 skips adpcm_ms-stereo.err: 736 decicycles in gcd, 4096 runs, 0 skips alac-24-level-0.err: 406 decicycles in gcd, 4096 runs, 0 skips alac-24-level-1.err: 415 decicycles in gcd, 4096 runs, 0 skips alac-24-level-2.err: 418 decicycles in gcd, 4096 runs, 0 skips alac-24-lpc-orders.err: 434 decicycles in gcd, 4096 runs, 0 skips dca-xll.err: 953 decicycles in gcd, 4096 runs, 0 skips delphine-cin-video.err: 786 decicycles in gcd, 511 runs, 1 skips filter-metadata-scenedetect.err: 655 decicycles in gcd, 4096 runs, 0 skips filter-pixfmts-il.err: 584 decicycles in gcd, 127 runs, 1 skips filter-pixfmts-null.err: 866 decicycles in gcd, 3 runs, 1 skips filter-pixfmts-null.err: 548 decicycles in gcd, 7 runs, 1 skips filter-pixfmts-null.err: 671 decicycles in gcd, 15 runs, 1 skips filter-pixfmts-null.err: 548 decicycles in gcd, 31 runs, 1 skips filter-pixfmts-null.err: 504 decicycles in gcd, 63 runs, 1 skips filter-pixfmts-null.err: 497 decicycles in gcd, 127 runs, 1 skips filter-pixfmts-scale.err: 537 decicycles in gcd, 63 runs, 1 skips filter-pixfmts-scale.err: 555 decicycles in gcd, 127 runs, 1 skips flac-16-chmode-left_side.err: 899 decicycles in gcd, 510 runs, 2 skips flac-24-comp-8.err: 366 decicycles in gcd, 4096 runs, 0 skips h264-conformance-frext-hpcafl_bcrm_c.err: 583 decicycles in gcd, 4096 runs, 0 skips h264-conformance-frext-hpcaflnl_bcrm_c.err: 691 decicycles in gcd, 4096 runs, 0 skips h264-conformance-frext-hpcamapalq_bcrm_b.err: 735 decicycles in gcd, 4096 runs, 0 skips h264-conformance-frext-hpcvfl_bcrm_a.err: 699 decicycles in gcd, 4096 runs, 0 skips h264-conformance-frext-hpcvflnl_bcrm_a.err: 679 decicycles in gcd, 4096 runs, 0 skips h264-conformance-ls_sva_d.err: 590 decicycles in gcd, 4096 runs, 0 skips h264-conformance-ls_sva_d.err: 589 decicycles in gcd, 8192 runs, 0 skips h264-conformance-ls_sva_d.err: 585 decicycles in gcd, 16383 runs, 1 skips interplay-mve-16bit.err: 695 decicycles in gcd, 511 runs, 1 skips lavf-ffm.err: 918 decicycles in gcd, 2047 runs, 1 skips lossless-meridianaudio.err: 668 decicycles in gcd, 4096 runs, 0 skips lossless-meridianaudio.err: 635 decicycles in gcd, 8192 runs, 0 skips lossless-meridianaudio.err: 670 decicycles in gcd, 16384 runs, 0 skips lossless-truehd-5.1-downmix-2.0.err: 511 decicycles in gcd, 4096 runs, 0 skips lossless-truehd-5.1.err: 632 decicycles in gcd, 4096 runs, 0 skips opus-testvector01.err: 849 decicycles in gcd, 4096 runs, 0 skips opus-testvector05.err: 579 decicycles in gcd, 4096 runs, 0 skips opus-testvector07.err: 669 decicycles in gcd, 4096 runs, 0 skips opus-testvector07.err: 661 decicycles in gcd, 8192 runs, 0 skips ra-288.err: 422 decicycles in gcd, 4096 runs, 0 skips sipr-16k.err: 436 decicycles in gcd, 4096 runs, 0 skips twinvq.err: 775 decicycles in gcd, 4096 runs, 0 skips vqf-demux.err: 708 decicycles in gcd, 4096 runs, 0 skips -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB He who knows, does not speak. He who speaks, does not know. -- Lao Tsu
signature.asc
Description: Digital signature
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel