On Mon, Oct 12, 2015 at 7:59 AM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> On Mon, Oct 12, 2015 at 7:46 AM, Carl Eugen Hoyos <ceho...@ag.or.at> wrote:
>> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>
>>> It is well known that fabs and fabsf are at least as fast and usually
>>> faster than the FFABS macro, at least on the gcc+glibc combination.
>>
>> I wasn't aware of this.
>> And I believe we support other compilers and other
>> libc implementations.
>
> Indeed, which is why performance comparisons are welcome. I argue
> below why any sane configuration should not regress performance-wise.
> This is also "relevant information" in my view.
>
>>
>>> For instance, see the reference:
>>> http://patchwork.sourceware.org/patch/6735/.
>>> This was a patch to glibc in order to remove their usages. Given their
>>> general performance obsession (more than FFmpeg in many cases), they
>>> have ensured that fabs and fabsf never perform worse than FFABS.
>>
>> Ok but is this really related?
>
> The reference is; the comment may not be. I was slightly annoyed at
> FFABS usage when libc provides fabs/fabsf on all our platforms, and
> wanted a justification for moving away from FFABS that would appeal to
> the FFmpeg crowd, namely performance.
>
>>
>>> I have tested on x86-64 Haswell with GCC 5.2 - even with no strict IEEE
>>> mode enabled, and just the standard -O3 optimizations, there is a
>>> performance benefit.
>>
>> This is the only relevant information imo.
>> Please provide (very, very short) information
>> on what you tested.
>
> Random integers, same style as before. I have not posted numbers,
> since my numbers are meaningless anyway: I lack non
> x86-64+(gcc/clang)+glibc configurations.
> As for that being the only relevant information, I do intend to shorten
> the message. The long stuff was simply my own personal motivation, to
> make people understand why I did this. Otherwise, I would have
> sent a separate message anyway in the patch thread; let me know which
> style you prefer.
>
>>
>> Since you mention libc so often: Does the patch
>> work on win*, aix and other strange platforms?
>
> Why not? Any standard, conformant fabs/fabsf should. Again, I lack the
> configurations and am just a university student with a single laptop.
> fabs and fabsf are already being used elsewhere. If anything, they
> are far better specified under IEEE 754 than FFABS - behavior with NaN,
> Inf, etc.
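To make the NaN/signed-zero point concrete, here is a minimal C sketch. It assumes FFABS expands to the ternary ((a) >= 0 ? (a) : (-(a))), which is my reading of libavutil/common.h; TERNARY_ABS below is a hypothetical local stand-in, not FFmpeg's header. It also assumes a default IEEE 754 build (e.g. x86-64 with gcc and glibc) without -ffast-math, which would let the compiler ignore signed zeros and NaNs altogether.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical stand-in for FFABS, assuming the ternary definition
 * ((a) >= 0 ? (a) : (-(a))). This is not FFmpeg's actual header. */
#define TERNARY_ABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Returns 1 if the ternary macro leaves the sign bit set for x, else 0.
 * For -0.0, the comparison -0.0 >= 0 is true, so x comes back unchanged
 * and keeps its sign bit. For a NaN the comparison is false, so the
 * macro returns -(x), which only flips the sign bit. */
int ternary_abs_signbit(double x)
{
    return signbit(TERNARY_ABS(x)) != 0;
}

/* Returns 1 if fabs(x) has its sign bit set, else 0. On IEEE 754
 * targets fabs clears the sign bit, including for -0.0, -Inf and NaNs. */
int fabs_signbit(double x)
{
    return signbit(fabs(x)) != 0;
}

/* Prints both results side by side for a single input value. */
void print_abs_comparison(const char *label, double x)
{
    printf("%-5s ternary: %g (sign bit %d)   fabs: %g (sign bit %d)\n",
           label, TERNARY_ABS(x), ternary_abs_signbit(x),
           fabs(x), fabs_signbit(x));
}
```

For example, calling print_abs_comparison("-0.0", -0.0) from a main() should report the sign bit as set for the ternary and clear for fabs. Both agree on -Inf, and both return a NaN for a NaN input; they differ only in the NaN's sign bit.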
Benchmark from libavfilter/astats on a 15-minute clip. Of course the
difference is slight, but it nonetheless exists. The best case is the
same, but look at the difference in the worst cases (as mentioned in the
glibc link I gave, I suspect some trickery for subnormal floats/Inf/0.0).

By the way, I can show results skewing even more heavily in favor of fabs
by using "random" floating-point numbers, random in the sense of being a
random 64-bit pattern (same style as my old crude bench: fill a large
array, and test). There, believe it or not, I was getting roughly a
1.5-2x improvement.

Anyway, here it is:

old:
4230 decicycles in abs, 1 runs, 0 skips
2520 decicycles in abs, 2 runs, 0 skips
1635 decicycles in abs, 4 runs, 0 skips
967 decicycles in abs, 8 runs, 0 skips
635 decicycles in abs, 16 runs, 0 skips
473 decicycles in abs, 32 runs, 0 skips
389 decicycles in abs, 64 runs, 0 skips
350 decicycles in abs, 128 runs, 0 skips
331 decicycles in abs, 256 runs, 0 skips
321 decicycles in abs, 512 runs, 0 skips
319 decicycles in abs, 1024 runs, 0 skips
318 decicycles in abs, 2048 runs, 0 skips
315 decicycles in abs, 4096 runs, 0 skips
317 decicycles in abs, 8192 runs, 0 skips
335 decicycles in abs, 16384 runs, 0 skips
335 decicycles in abs, 32768 runs, 0 skips
333 decicycles in abs, 65536 runs, 0 skips
342 decicycles in abs, 131072 runs, 0 skips
340 decicycles in abs, 262144 runs, 0 skips
345 decicycles in abs, 524285 runs, 3 skips
348 decicycles in abs, 1048565 runs, 11 skips
351 decicycles in abs, 2097129 runs, 23 skips
352 decicycles in abs, 4194252 runs, 52 skips
350 decicycles in abs, 8388498 runs, 110 skips
351 decicycles in abs, 16776993 runs, 223 skips
352 decicycles in abs, 33553999 runs, 433 skips
351 decicycles in abs, 67108036 runs, 828 skips

new:
3540 decicycles in abs, 1 runs, 0 skips
2160 decicycles in abs, 2 runs, 0 skips
1447 decicycles in abs, 4 runs, 0 skips
881 decicycles in abs, 8 runs, 0 skips
594 decicycles in abs, 16 runs, 0 skips
455 decicycles in abs, 32 runs, 0 skips
382 decicycles in abs, 64 runs, 0 skips
361 decicycles in abs, 128 runs, 0 skips
356 decicycles in abs, 256 runs, 0 skips
334 decicycles in abs, 512 runs, 0 skips
322 decicycles in abs, 1024 runs, 0 skips
317 decicycles in abs, 2048 runs, 0 skips
315 decicycles in abs, 4096 runs, 0 skips
341 decicycles in abs, 8192 runs, 0 skips
363 decicycles in abs, 16383 runs, 1 skips
342 decicycles in abs, 32767 runs, 1 skips
354 decicycles in abs, 65532 runs, 4 skips
348 decicycles in abs, 131068 runs, 4 skips
354 decicycles in abs, 262138 runs, 6 skips
356 decicycles in abs, 524277 runs, 11 skips
356 decicycles in abs, 1048560 runs, 16 skips
354 decicycles in abs, 2097120 runs, 32 skips
354 decicycles in abs, 4194251 runs, 53 skips
353 decicycles in abs, 8388504 runs, 104 skips
353 decicycles in abs, 16777006 runs, 210 skips
353 decicycles in abs, 33553993 runs, 439 skips
352 decicycles in abs, 67107951 runs, 913 skips

>
>>
>> Carl Eugen
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel