This speeds up aac_tablegen to a ludicrous degree (~97%), i.e. to the point where it can be argued that runtime initialization can always be done instead of using hard-coded tables. The only cost is essentially a trivial increase in the stack size.
Even if one does not care about this, the patch also improves accuracy as
detailed below.

Performance:
Benchmark obtained by looping 10^4 times over ff_aac_tableinit.

Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
1295292 decicycles in ff_aac_tableinit,     512 runs,      0 skips
1275981 decicycles in ff_aac_tableinit,    1024 runs,      0 skips
1272932 decicycles in ff_aac_tableinit,    2048 runs,      0 skips
1262164 decicycles in ff_aac_tableinit,    4096 runs,      0 skips
1256720 decicycles in ff_aac_tableinit,    8192 runs,      0 skips

new:
25691 decicycles in ff_aac_tableinit,     505 runs,      7 skips
25130 decicycles in ff_aac_tableinit,    1016 runs,      8 skips
25973 decicycles in ff_aac_tableinit,    2036 runs,     12 skips
25911 decicycles in ff_aac_tableinit,    4078 runs,     18 skips
25816 decicycles in ff_aac_tableinit,    8154 runs,     38 skips

Accuracy:
The previous code was resulting in needless loss of accuracy due to the pow
being called in succession. As an illustration of this:
ff_aac_pow34sf_tab[3]
old : 0.000000000007598092294225
new : 0.000000000007598091426864
real: 0.000000000007598091778545
truncated to float
old : 0.000000000007598092294225
new : 0.000000000007598091426864
real: 0.000000000007598091426864

showing that the old value was not correctly rounded. This affects a large
number of elements of the array.

Patch tested with FATE.

Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
---
 libavcodec/aac_tablegen.h | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/libavcodec/aac_tablegen.h b/libavcodec/aac_tablegen.h
index 8b223f9..255723b 100644
--- a/libavcodec/aac_tablegen.h
+++ b/libavcodec/aac_tablegen.h
@@ -35,9 +35,43 @@ float ff_aac_pow34sf_tab[428];
 av_cold void ff_aac_tableinit(void)
 {
     int i;
+
+    /* 2^(i/16) for 0 <= i <= 15 */
+    const double exp2_lut[] = {
+        1.00000000000000000000,
+        1.04427378242741384032,
+        1.09050773266525765921,
+        1.13878863475669165370,
+        1.18920711500272106672,
+        1.24185781207348404859,
+        1.29683955465100966593,
+        1.35425554693689272830,
+        1.41421356237309504880,
+        1.47682614593949931139,
+        1.54221082540794082361,
+        1.61049033194925430818,
+        1.68179283050742908606,
+        1.75625216037329948311,
+        1.83400808640934246349,
+        1.91520656139714729387,
+    };
+    double t1 = 8.8817841970012523233890533447265625e-16; // 2^(-50)
+    double t2 = 3.63797880709171295166015625e-12; // 2^(-38)
+    int t1_inc_cur, t2_inc_cur;
+    int t1_inc_prev = 0;
+    int t2_inc_prev = 8;
+
     for (i = 0; i < 428; i++) {
-        ff_aac_pow2sf_tab[i] = pow(2, (i - POW_SF2_ZERO) / 4.0);
-        ff_aac_pow34sf_tab[i] = pow(ff_aac_pow2sf_tab[i], 3.0/4.0);
+        t1_inc_cur = 4 * (i % 4);
+        t2_inc_cur = (8 + 3*i) % 16;
+        if (t1_inc_cur < t1_inc_prev)
+            t1 *= 2;
+        if (t2_inc_cur < t2_inc_prev)
+            t2 *= 2;
+        ff_aac_pow2sf_tab[i] = t1 * exp2_lut[t1_inc_cur];
+        ff_aac_pow34sf_tab[i] = t2 * exp2_lut[t2_inc_cur];
+        t1_inc_prev = t1_inc_cur;
+        t2_inc_prev = t2_inc_cur;
     }
 }
 #endif /* CONFIG_HARDCODED_TABLES */
-- 
2.6.2