On Wed, Mar 9, 2016 at 1:32 PM, Reimar Döffinger <reimar.doeffin...@gmx.de> wrote:
> On Wed, Mar 09, 2016 at 01:13:58PM +0100, Michael Niedermayer wrote:
>> On Tue, Mar 08, 2016 at 10:16:50PM -0500, Ganesh Ajjanagadde wrote:
>> > Yields 2x improvement in function performance, and boosts aac encoding
>> > speed by ~ 4% overall. Sample benchmark (Haswell+GCC under -march=native):
>> > after:
>> > ffmpeg -i sin.flac -acodec aac -y sin_new.aac  5.22s user 0.03s system 105% cpu 4.970 total
>> >
>> > before:
>> > ffmpeg -i sin.flac -acodec aac -y sin_new.aac  5.40s user 0.05s system 105% cpu 5.162 total
>> >
>> > Big shame that len-1 is -1 mod 4; 0 mod 4 would have yielded a further 2x
>> > through additional symmetry. Of course, one could approximate with the
>> > 0 mod 4 variant, error would essentially be ~ 1/len in the worst case.
>> >
>> > Signed-off-by: Ganesh Ajjanagadde <gajja...@gmail.com>
>> > ---
>> >  libavcodec/lpc.c | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/libavcodec/lpc.c b/libavcodec/lpc.c
>> > index 3839119..052aeaa 100644
>> > --- a/libavcodec/lpc.c
>> > +++ b/libavcodec/lpc.c
>> > @@ -176,9 +176,10 @@ double ff_lpc_calc_ref_coefs_f(LPCContext *s, const float *samples, int len,
>> >      const double a = 0.5f, b = 1.0f - a;
>> >
>> >      /* Apply windowing */
>> > -    for (i = 0; i < len; i++) {
>> > +    for (i = 0; i <= len / 2; i++) {
>>
>> >          double weight = a - b*cos((2*M_PI*i)/(len - 1));
>>
>> the windows should not be recalcuated for each frame. But either
>> calculated once during init or calculated at first use if that makes
>> sense
>>
>> actually i suspect there should be close to 0 calls to libc math
>> functions after init in the encoder, or is there some use of these
>> functions in aac that cannot be replaced by a simple table?
>
> I don't understand the code, but I assumed that len changes
> regularly here, which is the main problem.
> I never actually checked if there is maybe only a small
> number of common values at least though.

So I did check this; len does change quite regularly. On the other
hand, it does not change too drastically over time, so a cache could
work, but only at a significant increase in code complexity for
minimal gain.

In fact, I personally have no idea why it is absolutely critical that
a Hamming window is used; see 0cfdaf45c4, where Welch was used
previously. At first glance, I think one can find a window that works
and avoids libm, e.g. Parzen or other polynomial windows.

Rostislav, any thoughts?

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel