On Sat, Mar 19, 2016 at 2:36 AM, Hendrik Leppkes <h.lepp...@gmail.com> wrote:
> On Sat, Mar 19, 2016 at 3:27 AM, Ganesh Ajjanagadde <gajja...@gmail.com> wrote:
>> Yields speedup in quantize_bands, and non-negligible speedup in aac encoding
>> overall.
>>
>> Sample benchmark (Haswell, -march=native + GCC):
>> new:
>> [...]
>> 553 decicycles in quantize_bands, 2097136 runs, 16 skips9x
>> 554 decicycles in quantize_bands, 4194266 runs, 38 skips8x
>> 559 decicycles in quantize_bands, 8388534 runs, 74 skips7x
>>
>> old:
>> [...]
>> 711 decicycles in quantize_bands, 2097140 runs, 12 skips7x
>> 713 decicycles in quantize_bands, 4194277 runs, 27 skips4x
>> 715 decicycles in quantize_bands, 8388538 runs, 70 skips3x
>>
>> old:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.58s user 0.01s system 99% cpu 4.590 total
>>
>> new:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.54s user 0.02s system 99% cpu 4.566 total
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajja...@gmail.com>
>> ---
>>  libavcodec/aacenc_utils.h | 33 +++++++++++++++++++++++++--------
>>  1 file changed, 25 insertions(+), 8 deletions(-)
>>
>> diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
>> index 38636e5..0203b6e 100644
>> --- a/libavcodec/aacenc_utils.h
>> +++ b/libavcodec/aacenc_utils.h
>> @@ -62,18 +62,35 @@ static inline int quant(float coef, const float Q, const float rounding)
>>      return sqrtf(a * sqrtf(a)) + rounding;
>>  }
>>
>> +static inline float minf(float x, float y) {
>> +    return x < y ? x : y;
>> +}
>> +
>
> Thats exactly what the FFMIN macro expands to, whats the reason for
> introducing this function?

The two compiled differently, and the function version was faster in my
tests. I don't know why; one guess is that the macro form ends up
evaluating qc + rounding twice.

>
>>  static inline void quantize_bands(int *out, const float *in, const float *scaled,
>>                                    int size, float Q34, int is_signed, int maxval,
>>                                    const float rounding)
>>  {
>> -    int i;
>> -    for (i = 0; i < size; i++) {
>> -        float qc = scaled[i] * Q34;
>> -        int tmp = (int)FFMIN(qc + rounding, (float)maxval);
>> -        if (is_signed && in[i] < 0.0f) {
>> -            tmp = -tmp;
>> -        }
>> -        out[i] = tmp;
>> +    for (int i = 0; i < size; i+=4) {
>> +        float qc0 = scaled[i  ] * Q34;
>> +        float qc1 = scaled[i+1] * Q34;
>> +        float qc2 = scaled[i+2] * Q34;
>> +        float qc3 = scaled[i+3] * Q34;
>> +        int tmp0 = minf(qc0 + rounding, maxval);
>> +        int tmp1 = minf(qc1 + rounding, maxval);
>> +        int tmp2 = minf(qc2 + rounding, maxval);
>> +        int tmp3 = minf(qc3 + rounding, maxval);
>> +        if (is_signed && in[i  ] < 0.0f)
>> +            tmp0 = -tmp0;
>> +        if (is_signed && in[i+1] < 0.0f)
>> +            tmp1 = -tmp1;
>> +        if (is_signed && in[i+2] < 0.0f)
>> +            tmp2 = -tmp2;
>> +        if (is_signed && in[i+3] < 0.0f)
>> +            tmp3 = -tmp3;
>> +        out[i  ] = tmp0;
>> +        out[i+1] = tmp1;
>> +        out[i+2] = tmp2;
>> +        out[i+3] = tmp3;
>>      }
>>  }
>>
>
> Is size always a multiple of 4?

As far as I can see, yes: the callers pass a size derived (via
num_coeffs) from swb_offset values, which are all multiples of 4. I also
added an assert locally and ran FATE to confirm. If it helps, I can add
an av_assert2 for this assumption.

>
> - Hendrik
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
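
For anyone who wants to try the two points discussed above outside the tree, here is a minimal standalone sketch. It is not the patch itself. MIN_MACRO is a stand-in written in the style of FFMIN, and quant_one, quantize_bands_ref and quantize_bands_unrolled are hypothetical names used only here. A plain assert stands in for the av_assert2 mentioned in the thread. The sketch only checks that the unrolled loop matches the scalar one when size is a multiple of 4; it makes no claim about speed.

```c
#include <assert.h>

/* Stand-in in the style of FFMIN: each argument appears twice in the
 * expansion, so an expression like (qc + rounding) is written out twice. */
#define MIN_MACRO(a, b) ((a) > (b) ? (b) : (a))

/* Function form: each argument is evaluated once, as a float. */
static inline float minf(float x, float y)
{
    return x < y ? x : y;
}

/* Scalar reference loop, following the pre-patch code. */
static void quantize_bands_ref(int *out, const float *in, const float *scaled,
                               int size, float Q34, int is_signed, int maxval,
                               float rounding)
{
    for (int i = 0; i < size; i++) {
        float qc = scaled[i] * Q34;
        int tmp = (int)MIN_MACRO(qc + rounding, (float)maxval);
        if (is_signed && in[i] < 0.0f)
            tmp = -tmp;
        out[i] = tmp;
    }
}

/* Hypothetical helper: quantize one coefficient using the function form. */
static inline int quant_one(float scaled, float in, float Q34,
                            int is_signed, int maxval, float rounding)
{
    int tmp = (int)minf(scaled * Q34 + rounding, (float)maxval);
    return (is_signed && in < 0.0f) ? -tmp : tmp;
}

/* 4x unrolled loop. Assumes size is a multiple of 4; the assert makes
 * that assumption explicit instead of reading past the end of the arrays. */
static void quantize_bands_unrolled(int *out, const float *in,
                                    const float *scaled, int size, float Q34,
                                    int is_signed, int maxval, float rounding)
{
    assert(size % 4 == 0);
    for (int i = 0; i < size; i += 4) {
        out[i    ] = quant_one(scaled[i    ], in[i    ], Q34, is_signed, maxval, rounding);
        out[i + 1] = quant_one(scaled[i + 1], in[i + 1], Q34, is_signed, maxval, rounding);
        out[i + 2] = quant_one(scaled[i + 2], in[i + 2], Q34, is_signed, maxval, rounding);
        out[i + 3] = quant_one(scaled[i + 3], in[i + 3], Q34, is_signed, maxval, rounding);
    }
}
```

Calling both functions on the same inputs with a size such as 8 and comparing the output arrays element by element is enough to confirm they agree. With a size that is not a multiple of 4, the assert fires before any out-of-bounds access.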