On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote: > On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote: > > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos <ceho...@ag.or.at> wrote: > >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes: > >> > >>> Bench from libavfilter/astats on a 15 min clip. > >> > >> I believe that your test would indicate that the > >> old variant is faster or that no result can be > >> given which is what my tests show. > > Also, how you can possibly believe that the old variant is faster is > beyond me given the astonishing amount of work by Intel, Red Hat, and > others to create the absolutely best performing libc. > > Just have a look at > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281, > it gives an idea of the extreme lengths they go to. >
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD [/tmp]☭ cat a.c #include <math.h> #include <stdlib.h> #define FFABS(a) ((a) >= 0 ? (a) : (-(a))) double f1d(double x) { return fabs(x); } double f2d(double x) { return FFABS(x); } int f1i(int x) { return abs(x); } int f2i(int x) { return FFABS(x); } [/tmp]☭ gcc -O2 -c a.c && objdump -d -Mintel a.o a.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <f1d>: 0: f2 0f 10 0d 00 00 00 movsd xmm1,QWORD PTR [rip+0x0] # 8 <f1d+0x8> 7: 00 8: 66 0f 54 c1 andpd xmm0,xmm1 c: c3 ret d: 0f 1f 00 nop DWORD PTR [rax] 0000000000000010 <f2d>: 10: 66 0f 2e 05 00 00 00 ucomisd xmm0,QWORD PTR [rip+0x0] # 18 <f2d+0x8> 17: 00 18: 72 06 jb 20 <f2d+0x10> 1a: f3 c3 repz ret 1c: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 20: f2 0f 10 0d 00 00 00 movsd xmm1,QWORD PTR [rip+0x0] # 28 <f2d+0x18> 27: 00 28: 66 0f 57 c1 xorpd xmm0,xmm1 2c: c3 ret 2d: 0f 1f 00 nop DWORD PTR [rax] 0000000000000030 <f1i>: 30: 89 fa mov edx,edi 32: 89 f8 mov eax,edi 34: c1 fa 1f sar edx,0x1f 37: 31 d0 xor eax,edx 39: 29 d0 sub eax,edx 3b: c3 ret 3c: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 0000000000000040 <f2i>: 40: 89 fa mov edx,edi 42: 89 f8 mov eax,edi 44: c1 fa 1f sar edx,0x1f 47: 31 d0 xor eax,edx 49: 29 d0 sub eax,edx 4b: c3 ret [/tmp]☭ So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is essentially identical to FFABS(). I have similar results with clang (3.7.0). Conclusion: using fabs() looks better with at least recent versions of clang and GCC on x86-64 (but may introduce slight behaviour changes?) To be more rigorous, it would be interesting to compare on different arch & compilers, but changing FFABS() with fabs() sounds OK to me. -- Clément B.
signature.asc
Description: PGP signature
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel