On Thu, Jan 7, 2016 at 5:20 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> On Thu, Jan 7, 2016 at 4:48 PM, Michael Niedermayer
> <mich...@niedermayer.cc> wrote:
>> On Mon, Jan 04, 2016 at 06:33:59PM -0800, Ganesh Ajjanagadde wrote:
>>> This exploits an approach based on the sieve of Eratosthenes, a popular
>>> method for generating prime numbers.
>>>
>>> Tables are identical to previous ones.
>>>
>>> Tested with FATE with/without --enable-hardcoded-tables.
>>>
>>> Sample benchmark (Haswell, GNU/Linux+gcc):
>>> prev:
>>> 7860100 decicycles in cbrt_tableinit, 1 runs, 0 skips
>>> 7777490 decicycles in cbrt_tableinit, 2 runs, 0 skips
>>> [...]
>>> 7582339 decicycles in cbrt_tableinit, 256 runs, 0 skips
>>> 7563556 decicycles in cbrt_tableinit, 512 runs, 0 skips
>>>
>>> new:
>>> 2099480 decicycles in cbrt_tableinit, 1 runs, 0 skips
>>> 2044470 decicycles in cbrt_tableinit, 2 runs, 0 skips
>>> [...]
>>> 1796544 decicycles in cbrt_tableinit, 256 runs, 0 skips
>>> 1791631 decicycles in cbrt_tableinit, 512 runs, 0 skips
>>>
>>> Both small and large run count given as this is called once so small run
>>> count may give a better picture, small numbers are fairly consistent,
>>> and there is a consistent downward trend from small to large runs,
>>> at which point it stabilizes to a new value.
>>>
>>> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
>>> ---
>>>  libavcodec/aacdec_fixed.c           |  4 +--
>>>  libavcodec/aacdec_template.c        |  2 +-
>>>  libavcodec/cbrt_tablegen.h          | 53 ++++++++++++++++++++++++++-----------
>>>  libavcodec/cbrt_tablegen_template.c | 12 ++++++++-
>>>  4 files changed, 51 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/libavcodec/aacdec_fixed.c b/libavcodec/aacdec_fixed.c
>>> index 396a874..f7b882b 100644
>>> --- a/libavcodec/aacdec_fixed.c
>>> +++ b/libavcodec/aacdec_fixed.c
>>> @@ -155,9 +155,9 @@ static void vector_pow43(int *coefs, int len)
>>>      for (i=0; i<len; i++) {
>>>          coef = coefs[i];
>>>          if (coef < 0)
>>> -            coef = -(int)cbrt_tab[-coef];
>>> +            coef = -(int)cbrt_tab[-coef].i;
>>>          else
>>> -            coef = (int)cbrt_tab[coef];
>>> +            coef = (int)cbrt_tab[coef].i;
>>>          coefs[i] = coef;
>>>      }
>>>  }
>>> diff --git a/libavcodec/aacdec_template.c b/libavcodec/aacdec_template.c
>>> index d819958..1380510 100644
>>> --- a/libavcodec/aacdec_template.c
>>> +++ b/libavcodec/aacdec_template.c
>>> @@ -1791,7 +1791,7 @@ static int decode_spectrum_and_dequant(AACContext *ac, INTFLOAT coef[1024],
>>>          v = -v;
>>>          *icf++ = v;
>>>  #else
>>> -        *icf++ = cbrt_tab[n] | (bits & 1U<<31);
>>> +        *icf++ = cbrt_tab[n].i | (bits & 1U<<31);
>>>  #endif /* USE_FIXED */
>>>          bits <<= 1;
>>>      } else {
>>> diff --git a/libavcodec/cbrt_tablegen.h b/libavcodec/cbrt_tablegen.h
>>> index 59b5a1d..e3d6634 100644
>>> --- a/libavcodec/cbrt_tablegen.h
>>> +++ b/libavcodec/cbrt_tablegen.h
>>> @@ -26,14 +26,13 @@
>>>  #include <stdint.h>
>>>  #include <math.h>
>>>  #include "libavutil/attributes.h"
>>> +#include "libavutil/intfloat.h"
>>>  #include "libavcodec/aac_defines.h"
>>>
>>> -#if USE_FIXED
>>> -#define CBRT(x) lrint((x).f * 8192)
>>> -#else
>>> -#define CBRT(x) x.i
>>> -#endif
>>> -
>>
>>> +union ff_int32float64 {
>>> +    uint32_t i;
>>> +    double f;
>>> +};
>>>  #if CONFIG_HARDCODED_TABLES
>>>  #if USE_FIXED
>>>  #define cbrt_tableinit_fixed()
>>> @@ -43,20 +42,42 @@
>>>  #include "libavcodec/cbrt_tables.h"
>>>  #endif
>>>  #else
>>> -static uint32_t cbrt_tab[1 << 13];
>>> +static union ff_int32float64 cbrt_tab[1 << 13];
>>
>> this doubles the size of the cpu cache needed at runtime to store
>> the same number of elements
>
> Yes, it does, and it was a tradeoff I made that I forgot to list. One
> can of course use floats; but this loses accuracy at significant
> levels.
>
> So one could malloc and free a double precision array (for temporary
> storage) at costs of some code complexity, possible heap
> fragmentation, and the problem of possible failure (may be ok since
> anyway aac_decode_init is not guaranteed to succeed; it allocates
> memory for the dsp context). Malloc/free is AFAIK ~ 100's of cycles,
> dwarfed by the table generation cost.
or local static array, once init'ed, this will be handled in a natural way. Superior to the malloc solution, and IMHO is fine.