On Tue, Dec 29, 2015 at 09:28:34AM -0800, Ganesh Ajjanagadde wrote:
> The table is highly structured, so pow (or exp2 for that matter) can entirely
> be avoided, yielding a ~ 40x speedup with no loss of accuracy.
>
> sample benchmark (Haswell, GNU/Linux):
> new:
> 4449 decicycles in init_pow2table(loop 1000), 254 runs, 2 skips
> 4411 decicycles in init_pow2table(loop 1000), 510 runs, 2 skips
> 4391 decicycles in init_pow2table(loop 1000), 1022 runs, 2 skips
>
> old:
> 183673 decicycles in init_pow2table(loop 1000), 256 runs, 0 skips
> 182142 decicycles in init_pow2table(loop 1000), 512 runs, 0 skips
> 182104 decicycles in init_pow2table(loop 1000), 1024 runs, 0 skips
>
> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
> ---
>  libavcodec/cook.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/cook.c b/libavcodec/cook.c
> index d8fb736..aa434a2 100644
> --- a/libavcodec/cook.c
> +++ b/libavcodec/cook.c
> @@ -166,10 +166,17 @@ static float rootpow2tab[127];
>  /* table generator */
>  static av_cold void init_pow2table(void)
>  {
> +    /* fast way of computing 2^i and 2^(0.5*i) for -63 <= i < 64 */
>      int i;
> +    static const float exp2_tab[2] = {1, M_SQRT2};
> +    float exp2_val = 1.0842021724855044e-19; /* 2^(-63) */
> +    float root_val = 2.3283064365386963e-10; /* 2^(-32) */

I'm pretty sure you can do

    float exp2_val = pow(2, -63);
    float root_val = pow(2, -32);

and compilers will inline them

[...]

-- 
Clément B.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel