On Tue, Dec 15, 2015 at 2:23 AM, Michael Niedermayer <michae...@gmx.at> wrote: > On Wed, Dec 09, 2015 at 06:55:25PM -0500, Ganesh Ajjanagadde wrote: >> exp2 suffices here. Some trivial (~ 4x) speedup is done in addition here by >> reusing results. >> > >> This is bit-identical to the old values. > > That may be true with a specific compiler and version but > C makes no such guarantee unless iam missing something
Indeed, fpu behavior is largely uncontrolled. What I meant was that under reasonable settings, there is no accuracy loss (tested with clang and gcc). Basically, a priori it is not obvious that this change is ok, but I tested and the values don't change. For that matter, this is a rather minor point - many pow implementations are anyway broken wrt precise < 1 ulp or better yet < 0.5 ulp rounding. > > >> >> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> >> --- >> libavcodec/nellymoserenc.c | 11 +++++++++-- >> 1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/libavcodec/nellymoserenc.c b/libavcodec/nellymoserenc.c >> index d998dba..e6023e3 100644 >> --- a/libavcodec/nellymoserenc.c >> +++ b/libavcodec/nellymoserenc.c >> @@ -179,8 +179,15 @@ static av_cold int encode_init(AVCodecContext *avctx) >> >> /* Generate overlap window */ >> ff_init_ff_sine_windows(7); >> - for (i = 0; i < POW_TABLE_SIZE; i++) >> - pow_table[i] = pow(2, -i / 2048.0 - 3.0 + POW_TABLE_OFFSET); >> + pow_table[0] = 1; >> + pow_table[1024] = M_SQRT1_2; >> + for (i = 1; i < 513; i++) { >> + double tmp = exp2(-i / 2048.0); >> + pow_table[i] = tmp; >> + pow_table[1024-i] = M_SQRT1_2 / tmp; >> + pow_table[1024+i] = tmp * M_SQRT1_2; >> + pow_table[2048-i] = 0.5 / tmp; > > how much overall init time is gained by this ? > that is time in ffmpeg main() from start to finish when just opening > the file with no decoding aka ./ffmpeg -i somefile Don't know, all I know is cycles are unnecessarily wasted. Will put in cycle numbers. > ? > > iam asking as this makes the intend of the loop harder to > see/understand > if this change is done then please add a comment explaining what the > code does or maybe leave the unoptimized code in a comment Comment will be added, sorry. Note that the trick may be employed to a finer resolution, ie by storing exp2(3/4) etc and utilizing in simple multiplications and divisions. There will come a point where the overhead of the binary size increase will dominate. I chose this as this is reasonably readable and improves speed at minimal binary size impact. > > either way replacing pow -> exp2 LGTM > > [...] > -- > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB > > Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel