On Fri, Oct 16, 2015 at 7:53 AM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> On Fri, Oct 16, 2015 at 7:30 AM, Michael Niedermayer
> <mich...@niedermayer.cc> wrote:
>> On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote:
>>> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes <h.lepp...@gmail.com>
>>> wrote:
>>> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos <ceho...@ag.or.at>
>>> > wrote:
>>> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>> >>
>>> >>> What? My numbers actually show that the new code may be faster -
>>> >>
>>> >> No, you are misunderstanding the numbers you posted.
>>> >> (Or I misunderstand them but nobody said so yet.)
>>> >>
>>> >> Highest runs are most relevant, skips have to be
>>> >> avoided (afaik).
>>> >>
>>> >> [...]
>>> >>
>>> >>> If you continue to post such stuff that has no basis, I might actually
>>> >>> get tempted into finding out for which floating point values the new
>>> >>> code is significantly faster, craft a relevant audio file, and post it
>>> >>> showing a huge performance difference - my random numbers benchmark
>>> >>> shows there must exist such values.
>>> >>
>>> >> Please do so!
>>> >>
>>> >>> > The more important question is if you can see the same
>>> >>> > changes in the disassembly of af_astats.o as what
>>> >>> > ubitux posted here for a short test function?
>>> >>>
>>> >>> I do. He uses clang/gcc, so do I.
>>> >>
>>> >> Sorry, my understanding fails here (I am not a native speaker):
>>> >> You did look at the disassembly of af_astats.o and there is
>>> >> inlined code instead of a function call?
>>> >>
>>> >>> The reason (irrelevant) is that both
>>> >>> of us run Arch.
>>> >>>
>>> >>> What is "more relevant" is if _you_ can see the changes
>>> >>> on some non-Linux platform.
>>> >>
>>> >> If you could show that it is faster on any platform
>>> >> I would already be happy!
>>> >>
>>> >
>>> > A more important check would be that it's not significantly slower on
>>> > any other platform.
>>> > Just because one compiler/glibc combination
>>> > manages to produce an efficient inlined function doesn't necessarily
>>> > mean that some other compiler or libc couldn't produce a full function
>>> > call with all the overhead that comes with it, becoming significantly
>>> > slower.
>>>
>>> As I point out, all a libc implementer needs to do to be on par with
>>> the macro is to add the inline keyword. This was added in C99. If said
>>> libc does not, then it is fundamentally broken from a performance
>>> perspective. A beginning programmer can do that in a couple of
>>> minutes. Fix upstream and complain to them if it does not inline.
>>
>> I don't know how the latest compilers handle "inline" but a few years
>> ago gcc was rather dumb about inlining, and I think it's not easy for
>> a compiler to be actually not "dumb"
>>
>> A compiler cannot inline everything that has the inline keyword,
>> it would lead (for some source code) to an explosion in size and
>> compile time.
>> and a good compiler will want to inline some functions even if they
>> do not have the inline keyword
>> Also it's not easy to know for a compiler what to
>> inline and what not, there could be 10 functions a1(), a2(), a3(), ...
>> each calling the previous 10 times ...
>> the way gcc handled this (in the past and AFAIK at least) is to have
>> various complicated thresholds that limit the amount of inlining.
>> The big annoyance with this (years ago at least) was that if you
>> forced a function to be inlined by "force" gcc would then stop
>> inlining something else and you ended up either forcing every single
>> function you needed inlined or would have had to tune the thresholds
>>
>> it would be interesting to check if replacing FFABS by fabs causes
>> any big changes to inlining behavior (maybe that can be done by
>> comparing the list of symbols in the object files as fully inlined
>> functions wouldn't show up but maybe there are other ways)
>>
>> anyway I am not against using fabs() for float/double FFABS()
>> I just think some assumptions in this thread are possibly too
>> optimistic, but it's quite possible these replacements are all fine
>> and the changes in inlining if any have no performance impact
>
> I myself am not "optimistic" in the sense that I think most of the
> time this will have zero change. All I am saying is that in cases
> where there is a difference, it will likely be in favor of fabs, etc
> and not the macro due to reasons I mentioned in the long commit
> message I posted.
>
>>
>> also if a *abs is implemented by using a branch (as in if it's positive
>> jump over a negate instruction) then branch prediction can play a
>> significant role in performance, that is random values would be a lot
>> slower than the same values ordered
>
> Maybe this is why I get such a large difference between fabs and FFABS
> in favor of fabs - I just keep random numbers with no ordering. If
> true, this is definitely in fabs's favor.
>
>> a good implementation should not use a branch though, abs for floats
>> and doubles is just clearing the sign bit basically, platforms should
>> have a dedicated instruction for that or in some cases an integer
>> and/or could maybe even be used
>
> That was the point of the original libc link - I am somewhat annoyed
> that some dismissed it as "irrelevant" in a cavalier manner.
> Basically, what the glibc people observed was that the compiler was
> not always optimizing FFABS correctly (as compared to fabs etc). Maybe
> this leads to a performance difference.
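
For reference, the three ways of taking a floating-point absolute value
that this thread keeps coming back to can be sketched in C roughly as
below. FFABS is written here as I understand it to be defined in
libavutil/common.h; abs_by_mask and check_abs_variants are hypothetical
names made up for this illustration and are not part of FFmpeg or any
libc. The sketch assumes IEEE 754 binary64 doubles and says nothing
about what a particular compiler actually emits.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The macro under discussion (assumed to match libavutil/common.h).
 * It compares against zero, so a compiler may emit a branch or a select. */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Hypothetical helper: branch-free abs that clears the sign bit with an
 * integer AND. Assumes double is IEEE 754 binary64. */
static inline double abs_by_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined type pun */
    bits &= ~(UINT64_C(1) << 63);     /* clear the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical self-check: all three agree on ordinary values, but the
 * macro keeps the sign of negative zero while fabs() clears it. */
static void check_abs_variants(void)
{
    assert(FFABS(-2.5) == 2.5);
    assert(fabs(-2.5) == 2.5);
    assert(abs_by_mask(-2.5) == 2.5);

    /* -0.0 >= 0 is true, so the macro returns -0.0 unchanged. */
    assert(signbit(FFABS(-0.0)) != 0);
    assert(signbit(fabs(-0.0)) == 0);
    assert(signbit(abs_by_mask(-0.0)) == 0);
}
```

Calling check_abs_variants() from a test program exercises all three.
The negative-zero case is one place where the macro and fabs() are not
interchangeable, independent of any speed difference.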

To put an end to a long and tortuous thread, and due to the lack of
relevant outstanding objections, pushed.

>
>>
>> [...]
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>
>> Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel