On Tue, Dec 29, 2015 at 11:29 AM, Clément Bœsch <u...@pkh.me> wrote:
> On Tue, Dec 29, 2015 at 09:28:34AM -0800, Ganesh Ajjanagadde wrote:
>> The table is highly structured, so pow (or exp2 for that matter) can entirely
>> be avoided, yielding a ~ 40x speedup with no loss of accuracy.
>>
>> sample benchmark (Haswell, GNU/Linux):
>> new:
>> 4449 decicycles in init_pow2table(loop 1000), 254 runs, 2 skips
>> 4411 decicycles in init_pow2table(loop 1000), 510 runs, 2 skips
>> 4391 decicycles in init_pow2table(loop 1000), 1022 runs, 2 skips
>>
>> old:
>> 183673 decicycles in init_pow2table(loop 1000), 256 runs, 0 skips
>> 182142 decicycles in init_pow2table(loop 1000), 512 runs, 0 skips
>> 182104 decicycles in init_pow2table(loop 1000), 1024 runs, 0 skips
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
>> ---
>>  libavcodec/cook.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/libavcodec/cook.c b/libavcodec/cook.c
>> index d8fb736..aa434a2 100644
>> --- a/libavcodec/cook.c
>> +++ b/libavcodec/cook.c
>> @@ -166,10 +166,17 @@ static float rootpow2tab[127];
>>  /* table generator */
>>  static av_cold void init_pow2table(void)
>>  {
>> +    /* fast way of computing 2^i and 2^(0.5*i) for -63 <= i < 64 */
>>      int i;
>> +    static const float exp2_tab[2] = {1, M_SQRT2};
>
>> +    float exp2_val = 1.0842021724855044e-19;  /* 2^(-63) */
>> +    float root_val = 2.3283064365386963e-10;  /* 2^(-32) */
>
> I'm pretty sure you can do
>     float exp2_val = pow(2, -63);
>     float root_val = pow(2, -32);
> and compilers will inline them

Any decent compiler would.

Alternatively, if hexadecimal floating point literals (a C99 feature; "%a"
is the matching printf format) were available on all platforms, it would
look quite clean. Hexadecimal floating point literals are also nice because
they are bit-exact representations of the underlying float, unlike decimal
constants, where one needs to reason about how many digits are required. I
believe "%.17g" works for IEEE-754 doubles; note that, for instance,
"%.17lf" does not on very small inputs, since it prints 17 digits after the
decimal point rather than 17 significant digits.

Unfortunately, MSVC lacks hexadecimal floating point literals.

I really don't mind either way, and since you prefer pow(2, -63), I have
changed it locally.

>
> [...]
>
> --
> Clément B.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
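
For readers who want to check the claims in this thread, here is a minimal C99 sketch. It is not part of the patch: the helper names `roundtrips` and `constants_agree` are made up for illustration, and it assumes a C99 compiler that accepts hexadecimal floating constants and a libc whose `strtod` parses "%a" output. It compares the three spellings of the starting values (decimal constant, hexadecimal literal, `pow`) and tests whether a given printf format survives a print/parse round trip.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format x with fmt, parse the text back with strtod,
 * and report whether the original double is recovered exactly. */
static int roundtrips(const char *fmt, double x)
{
    char buf[512];
    int n = snprintf(buf, sizeof(buf), fmt, x);
    assert(n > 0 && (size_t)n < sizeof(buf)); /* output fit in the buffer */
    return strtod(buf, NULL) == x;
}

/* Hypothetical helper: do the decimal constants from the patch, the C99
 * hexadecimal literals, and pow() all give the same float values? */
static int constants_agree(void)
{
    const float dec63 = 1.0842021724855044e-19; /* 2^(-63), as in the patch */
    const float hex63 = 0x1p-63f;               /* 2^(-63), hex literal */
    const float pow63 = pow(2, -63);            /* 2^(-63), as suggested */
    const float dec32 = 2.3283064365386963e-10; /* 2^(-32), as in the patch */
    const float hex32 = 0x1p-32f;               /* 2^(-32), hex literal */
    const float pow32 = pow(2, -32);            /* 2^(-32), as suggested */

    return dec63 == hex63 && hex63 == pow63 &&
           dec32 == hex32 && hex32 == pow32;
}
```

Calling `roundtrips("%.17g", 0x1p-63)` or `roundtrips("%a", 0x1p-63)` should return 1 on an IEEE-754 platform, while `roundtrips("%.17f", 0x1p-63)` returns 0 because the printed text is all zeros.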