On Sat, Mar 12, 2016 at 9:15 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> Hi,
>
> On Sat, Mar 12, 2016 at 9:03 AM, Ganesh Ajjanagadde <gajja...@gmail.com>
> wrote:
>
>> On Fri, Mar 11, 2016 at 9:30 AM, Ronald S. Bultje <rsbul...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > On Thu, Mar 10, 2016 at 9:23 PM, Ganesh Ajjanagadde <gajja...@gmail.com>
>> > wrote:
>> >
>> >> On Thu, Mar 10, 2016 at 8:56 AM, Ronald S. Bultje <rsbul...@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > On Thu, Mar 10, 2016 at 2:37 AM, Reimar Döffinger
>> >> > <reimar.doeffin...@gmx.de> wrote:
>> >> >
>> >> >> On 10.03.2016, at 03:06, Ganesh Ajjanagadde <gajja...@gmail.com> wrote:
>> >> >>
>> >> >> > On Wed, Mar 9, 2016 at 2:16 AM, Reimar Döffinger
>> >> >> > <reimar.doeffin...@gmx.de> wrote:
>> >> >> >> On 08.03.2016, at 04:48, Ganesh Ajjanagadde <gajja...@gmail.com> wrote:
>> >> >> >>
>> >> >> >>> + nzl += expf(logf(s / ethresh) * nzslope);
>> >> >> >>
>> >> >> >> Shouldn't log2f/exp2f be faster?
>> >> >> >> log2f at least has CPU support on x86 AFAICT.
>> >> >> >
>> >> >> > I had tested this, and no, though it is still faster than powf.
>> >> >> >
>> >> >> > It still seems to rely on libm; note that we don't use -ffast-math, and
>> >> >> > a look at https://github.com/lattera/glibc/tree/master/sysdeps/x86_64/fpu
>> >> >> > seems to say no as well. The problem is that GNU people like to
>> >> >> > prioritize "correctly rounded" behavior over fast, reasonably accurate
>> >> >> > code, sometimes to ludicrous degrees.
>> >> >> >
>> >> >> > Personally, I don't know why we don't use -ffast-math; not many seem
>> >> >> > to care that heavily about strict IEEE semantics. Maybe it leads to too
>> >> >> > much variation across platforms?
>> >> >>
>> >> >> You lose some guarantees. In particular, the compiler will assume NaNs
>> >> >> do not happen, and you cannot predict which code path (after a
>> >> >> comparison, for example) they take.
>> >> >> But some code needs them to be handled a certain way, for either
>> >> >> security or correctness reasons.
>> >> >> I guess in theory you could try to make sure isnan is used in all
>> >> >> those cases, but then you need to find them, and I think if you take
>> >> >> the -ffast-math description literally there is no guarantee that even
>> >> >> isnan continues to work... I am also not sure none of the code relies
>> >> >> on the order of operations to get the precision it needs.
>> >> >> So it is simply too dangerous.
>> >> >> Some more specific options might be possible to use, though (but I
>> >> >> think even full -ffast-math gains you almost nothing? Does it even
>> >> >> help here?).
>> >>
>> >> Yes, sorry, I meant some specific things from -ffast-math. I checked
>> >> configure; most of the unambiguously clear ones are already being
>> >> turned on. As such, it seems ok.
>> >>
>> >> >
>> >> >
>> >> > One could also consider writing some customized assembly (calling the
>> >> > relevant instructions instead of C wrappers) in cases where it is
>> >> > speed-sensitive. It's sort of the inverse of what Ganesh is suggesting,
>> >> > I guess; maybe some more effort is involved, but it can't be that much.
>> >> > You could even use av_always_inline functions and inline assembly to
>> >> > call the relevant instruction and otherwise keep things in C. That's
>> >> > identical to what -ffast-math does, but it turns on only when
>> >> > specifically calling the new API function name...
>> >>
>> >> So it seems like everything wrt this patch is fine, right?
>> >
>> >
>> > Not really. Your patch still does two things, and I don't like the
>> > explicit exp(log(a)*b).
>>
>> Well, both are needed for the speedup. Without the 2.0 check, there is
>> a speed regression. I don't understand why it is "two things" in that
>> case.
>> > What I'm thinking is that you should have a static inline
>> > function, let's call it fast_pow(a, b), which can internally (in the C
>> > version) be implemented as exp+log. Just as you found for pow, we might
>> > find that for exp/log the system lib is not very optimized and we can do
>> > it faster ourselves by doing whatever -ffast-math is doing for these
>> > functions. Those would be more specifically optimized, and that would be
>> > part of the fast_pow implementation. This way, the code in aacenc remains
>> > easy to follow and the optimization is accessible to other parts of
>> > ffmpeg as well.
>>
>> Ok, changed locally.
>
>
> Please submit a new patch; this is not a minor cosmetic change.
I definitely will; I was just addressing your first point and making it
clear why I did it that way, so that I don't waste effort.

> Ronald

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel