On Fri, Oct 16, 2015 at 7:30 AM, Michael Niedermayer <mich...@niedermayer.cc> wrote:
> On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote:
>> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes <h.lepp...@gmail.com> wrote:
>> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos <ceho...@ag.or.at>
>> > wrote:
>> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>> >>
>> >>> What? My numbers actually show that the new code may be faster -
>> >>
>> >> No, you are misunderstanding the numbers you posted.
>> >> (Or I misunderstand them but nobody said so yet.)
>> >>
>> >> Highest runs are most relevant, skips have to be
>> >> avoided (afaik).
>> >>
>> >> [...]
>> >>
>> >>> If you continue to post such stuff that has no basis, I might actually
>> >>> get tempted into finding out for which floating point values the new
>> >>> code is significantly faster, craft a relevant audio file, and post it
>> >>> showing a huge performance difference - my random numbers benchmark
>> >>> shows there must exist such values.
>> >>
>> >> Please do so!
>> >>
>> >>> > The more important question is if you can see the same
>> >>> > changes in the disassembly of af_astats.o as what
>> >>> > ubitux posted here for a short test function?
>> >>>
>> >>> I do. He uses clang/gcc, so do I.
>> >>
>> >> Sorry, my understanding fails here (I am not a native speaker):
>> >> You did look at the disassembly of af_astats.o and there is
>> >> inlined code instead of a function call?
>> >>
>> >>> The reason (irrelevant) is that both
>> >>> of us run Arch.
>> >>>
>> >>> What is "more relevant" is if _you_ can see the changes
>> >>> on some non Linux platform.
>> >>
>> >> If you could show that it is faster on any platform
>> >> I would already be happy!
>> >>
>> >
>> > A more important check would be that it's not significantly slower on
>> > any other platform.
>> > Just because one compiler/glibc combination
>> > manages to produce an efficient inlined function doesn't necessarily
>> > mean that some other compiler or libc couldn't produce a full function
>> > call with all the overhead that comes with it, becoming significantly
>> > slower.
>>
>> As I pointed out, all a libc implementer needs to do to be on par with
>> the macro is to add the inline keyword. This was added in C99. If said
>> libc does not, then it is fundamentally broken from a performance
>> perspective. A beginning programmer can do that in a couple of
>> minutes. Fix upstream and complain to them if it does not inline.
>
> I don't know how the latest compilers handle "inline" but a few years
> ago gcc was rather dumb about inlining, and I think it's not easy for
> a compiler to actually not be "dumb"
>
> A compiler cannot inline everything that has the inline keyword,
> it would lead (for some source code) to an explosion in size and
> compile time.
> and a good compiler will want to inline some functions even if they
> do not have the inline keyword
> Also it's not easy for a compiler to know what to
> inline and what not, there could be 10 functions a1(), a2(), a3(), ...
> each calling the previous 10 times ...
> the way gcc handled this (in the past and AFAIK at least) is to have
> various complicated thresholds that limit the amount of inlining.
> The big annoyance with this (years ago at least) was that if you
> forced a function to be inlined by "force" gcc would then stop
> inlining something else and you ended up either forcing every single
> function you needed inlined or would have had to tune the thresholds
>
> it would be interesting to check if replacing FFABS by fabs causes
> any big changes to inlining behavior (maybe that can be done by
> comparing the list of symbols in the object files as fully inlined
> functions wouldn't show up but maybe there are other ways)
>
> anyway I am not against using fabs() for float/double FFABS()
> I just think some assumptions in this thread are possibly too
> optimistic, but it's quite possible these replacements are all fine
> and the changes in inlining if any have no performance impact
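A minimal way to check the inlining question on a given compiler is to put each variant in its own tiny function and compare the generated code, for example with `gcc -O2 -S` or `gcc -O2 -c` followed by `objdump -d`. The sketch below assumes FFABS is the compare-and-negate macro from libavutil/common.h; it is copied under the hypothetical name FFABS_SKETCH so the file builds on its own, and the two function names are likewise hypothetical. Whether fabs() becomes an inline sign-mask instruction or a real call depends on the compiler and flags, which is what the comparison would show.

```c
#include <math.h>

/* Assumed to match the FFABS() macro in libavutil/common.h; copied under a
 * hypothetical name so this file compiles on its own. */
#define FFABS_SKETCH(a) ((a) >= 0 ? (a) : (-(a)))

/* Hypothetical test function: compare-and-negate via the macro. Depending on
 * the compiler this may become a branch, a conditional select, or a mask. */
double abs_via_macro(double x)
{
    return FFABS_SKETCH(x);
}

/* Hypothetical test function: libm fabs(). GCC and Clang normally expand it
 * as a builtin; if they emit a real call instead, the object file will list
 * fabs as an undefined symbol. */
double abs_via_libm(double x)
{
    return fabs(x);
}
```

Running `nm` on af_astats.o built before and after the patch and diffing the output is a quick form of the symbol-list comparison suggested above: an undefined `fabs` entry would mean an actual call was emitted.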
I myself am not "optimistic": I think that most of the time this change will make no difference at all. All I am saying is that where there is a difference, it will likely be in favor of fabs() etc. rather than the macro, for the reasons I gave in the long commit message I posted.

>
> also if a *abs is implemented by using a branch (as in if it's positive
> jump over a negate instruction) then branch prediction can play a
> significant role in performance, that is random values would be a lot
> slower than the same values ordered

Maybe this is why I get such a large difference between fabs and FFABS in favor of fabs: I use random numbers with no ordering. If true, this is definitely in fabs's favor.

> a good implementation should not use a branch though, abs for floats
> and doubles is just setting the sign bit basically, platforms should
> have a dedicated instruction for that or in some cases an integer
> and/or could maybe even be used

That was the point of the original libc link; I am somewhat annoyed that some dismissed it as "irrelevant" so casually. Basically, what the glibc people observed was that the compiler did not always optimize FFABS as well as it optimized fabs etc. Maybe this leads to a performance difference.

>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
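On the point about avoiding a branch: for IEEE 754 values, taking the absolute value amounts to clearing the sign bit, which a plain integer AND can do. The helpers below are a hypothetical sketch that assumes double and float are IEEE 754 binary64 and binary32; memcpy is used for the type punning because it is well-defined in C, and optimizing compilers usually reduce it to register moves. It only illustrates what a branch-free fabs() boils down to and is not meant as a replacement for it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: |x| for a double by clearing bit 63, the sign bit of
 * an IEEE 754 binary64 value. No data-dependent branch is involved. */
static inline double abs_by_sign_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined type punning */
    bits &= ~(UINT64_C(1) << 63);     /* clear the sign bit */
    memcpy(&x, &bits, sizeof bits);
    return x;
}

/* Float variant: binary32 keeps its sign in bit 31. */
static inline float absf_by_sign_mask(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~(UINT32_C(1) << 31);
    memcpy(&x, &bits, sizeof bits);
    return x;
}
```

Because there is no compare-and-jump, the cost of this version should not depend on whether the inputs are random or sorted, which is the branch-prediction effect described above. It also maps -0.0 to +0.0, as fabs() does.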