Hi, On Tue, Oct 13, 2015 at 8:09 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> On Tue, Oct 13, 2015 at 2:45 AM, Clément Bœsch <u...@pkh.me> wrote: > > On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote: > >> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde <gajja...@mit.edu> > wrote: > >> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos <ceho...@ag.or.at> > wrote: > >> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes: > >> >> > >> >>> Bench from libavfilter/astats on a 15 min clip. > >> >> > >> >> I believe that your test would indicate that the > >> >> old variant is faster or that no result can be > >> >> given which is what my tests show. > >> > >> Also, how you can possibly believe that the old variant is faster is > >> beyond me given the astonishing amount of work by Intel, Red Hat, and > >> others to create the absolutely best performing libc. > >> > >> Just have a look at > >> > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281 > , > >> it gives an idea of the extreme lengths they go to. > >> > > > > > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD > > > > [/tmp]☭ cat a.c > > #include <math.h> > > #include <stdlib.h> > > > > #define FFABS(a) ((a) >= 0 ? (a) : (-(a))) > > > > double f1d(double x) { return fabs(x); } > > double f2d(double x) { return FFABS(x); } > > > > int f1i(int x) { return abs(x); } > > int f2i(int x) { return FFABS(x); } > > [/tmp]☭ gcc -O2 -c a.c && objdump -d -Mintel a.o > > > > a.o: file format elf64-x86-64 > > > > > > Disassembly of section .text: > > > > 0000000000000000 <f1d>: > > 0: f2 0f 10 0d 00 00 00 movsd xmm1,QWORD PTR [rip+0x0] # > 8 <f1d+0x8> > > 7: 00 > > 8: 66 0f 54 c1 andpd xmm0,xmm1 > > c: c3 ret > > d: 0f 1f 00 nop DWORD PTR [rax] > > > > 0000000000000010 <f2d>: > > 10: 66 0f 2e 05 00 00 00 ucomisd xmm0,QWORD PTR [rip+0x0] > # 18 <f2d+0x8> > > 17: 00 > > 18: 72 06 jb 20 <f2d+0x10> > > 1a: f3 c3 repz ret > > 1c: 0f 1f 40 00 nop DWORD PTR [rax+0x0] > > 20: f2 0f 10 0d 00 00 00 movsd xmm1,QWORD PTR [rip+0x0] # > 28 <f2d+0x18> > > 27: 00 > > 28: 66 0f 57 c1 xorpd xmm0,xmm1 > > 2c: c3 ret > > 2d: 0f 1f 00 nop DWORD PTR [rax] > > > > 0000000000000030 <f1i>: > > 30: 89 fa mov edx,edi > > 32: 89 f8 mov eax,edi > > 34: c1 fa 1f sar edx,0x1f > > 37: 31 d0 xor eax,edx > > 39: 29 d0 sub eax,edx > > 3b: c3 ret > > 3c: 0f 1f 40 00 nop DWORD PTR [rax+0x0] > > > > 0000000000000040 <f2i>: > > 40: 89 fa mov edx,edi > > 42: 89 f8 mov eax,edi > > 44: c1 fa 1f sar edx,0x1f > > 47: 31 d0 xor eax,edx > > 49: 29 d0 sub eax,edx > > 4b: c3 ret > > [/tmp]☭ > > > > So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is > > essentially identical to FFABS(). > > > > I have similar results with clang (3.7.0). > > > > Conclusion: using fabs() looks better with at least recent versions of > clang > > and GCC on x86-64 (but may introduce slight behaviour changes?) > > > > To be more rigorous, it would be interesting to compare on different > arch & > > compilers, but changing FFABS() with fabs() sounds OK to me. > > I noticed that is being applied piecemeal, and some of it has been > pushed. Does that mean I am free to push (with the reduced commit > message) as well? You'll notice that Paul did it for the filters he maintains. I'm fine with you doing this for any code I maintain (no further review required). You can find maintainers for each piece of code in git log or MAINTAINERS. It sounds like Paul is fine with this also. I think the general case, it'd be nice to figure out why Carl's results are slightly different from yours (or maybe it's noise?). If we can resolve that, I don't think there's any further outstanding objections, right? Also, is a single push preferred, or one for each > file (like the way it is being done)? Whatever you prefer, really. Small commits have the advantage that they're easier to bisect/revert if something breaks, but it's obviously a lot more work to create N small patches than 1 big patch. So both approaches have advantages and disadvantages and we just do whatever works best for each of us. (Not very consistent, I admit.) Ronald _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel