On 19 March 2016 at 05:12, Ganesh Ajjanagadde <gajja...@gmail.com> wrote:
> It seems like in all usages, size is a multiple of 4. This is documented
> as an assert.
>
> Yields speedup in this function, and small speedup for aac encoding
> overall.
>
> Sample benchmark (Haswell, -march=native + GCC):
> old:
> [...]
> 1390 decicycles in abs_pow34_v, 127138 runs, 3934 skips63.1x
> 1385 decicycles in abs_pow34_v, 254191 runs, 7953 skips64.4x
> 1383 decicycles in abs_pow34_v, 508305 runs, 15983 skips65.3x
>
> new:
> [...]
> 1109 decicycles in abs_pow34_v, 127122 runs, 3950 skips61.2x
> 1107 decicycles in abs_pow34_v, 254177 runs, 7967 skips63.5x
> 1106 decicycles in abs_pow34_v, 508292 runs, 15996 skips65.3x
>
> old:
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.55s user 0.03s
> system 99% cpu 4.581 total
> new:
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.50s user 0.04s
> system 99% cpu 4.537 total
>
> Signed-off-by: Ganesh Ajjanagadde <gajja...@gmail.com>
> ---
> libavcodec/aacenc_utils.h | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
>

Are you sure that this speedup (and the one in the other patch you posted)
is real and above the measurement error? Did you do multiple runs to rule
out that it was chance? A 0.04/0.05 second improvement on a run of about
4.5 seconds doesn't seem significant at all, and we have to draw the line
at placebo speedups or watch the whole project fill up with spaghetti code.

Although the decrease in decicycles for the function is nice, what matters
in the end is whether the overall speedup is enough to justify the extra
code, and I have a suspicion that the compiler inlines and unrolls that
function anyway. Try marking the function with __attribute__ ((noinline))
to see if that makes a difference.

I'll have time to test later today.