On Sat, Mar 19, 2016 at 5:35 AM, Rostislav Pehlivanov <atomnu...@gmail.com> wrote:
> On 19 March 2016 at 05:12, Ganesh Ajjanagadde <gajja...@gmail.com> wrote:
>
>> It seems like in all usages, size is a multiple of 4. This is documented
>> as an assert.
>>
>> Yields speedup in this function, and small speedup for aac encoding
>> overall.
>>
>> Sample benchmark (Haswell, -march=native + GCC):
>> old:
>> [...]
>> 1390 decicycles in abs_pow34_v, 127138 runs, 3934 skips63.1x
>> 1385 decicycles in abs_pow34_v, 254191 runs, 7953 skips64.4x
>> 1383 decicycles in abs_pow34_v, 508305 runs, 15983 skips65.3x
>>
>> new:
>> [...]
>> 1109 decicycles in abs_pow34_v, 127122 runs, 3950 skips61.2x
>> 1107 decicycles in abs_pow34_v, 254177 runs, 7967 skips63.5x
>> 1106 decicycles in abs_pow34_v, 508292 runs, 15996 skips65.3x
>>
>> old:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.55s user 0.03s
>> system 99% cpu 4.581 total
>> new:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.50s user 0.04s
>> system 99% cpu 4.537 total
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajja...@gmail.com>
>> ---
>> libavcodec/aacenc_utils.h | 24 +++++++++++++++---------
>> 1 file changed, 15 insertions(+), 9 deletions(-)
>>
>>
> Are you sure that this speedup (and the other patch you posted) is real and
> above the error? Did you do multiple runs to rule out that it was chance?
> 0.04/0.05 second improvement on 5 seconds doesn't seem significant at all,
I am really sorry about these measurements; they were skewed by a very
recent regression on my laptop caused by a package upgrade. Essentially,
after a suspend and resume, the clock frequency/CPU governor downshifts
slightly, from a 2.4 GHz to a 2.2 GHz base; I have no idea how the turbo
frequency is affected. So please ignore these numbers.

However, here is a heuristic calculation of the impact: with between
500,000 and 1,000,000 runs and a speedup of about 30 cycles per run,
roughly 15-30 million cycles are saved overall, out of about
5 * 3 billion = 15 billion cycles (about 5 seconds at about 3 GHz). That
is 0.1-0.2% of the total, so it is near the 0.1% threshold; see below.

> and we have to put the line on placebo speedups or enjoy the whole project
> filling up with spaghetti code.
> Although the decrease in decicycles for the function was nice, what matters
> at the end is whether the speedup is enough to justify the extra code,

Per doc/optimization.txt, aac is a widely used codec, so even a 0.1%
improvement in aac is fair game for optimization, assuming it is a small
code change. Of course, one can debate whether this change is small or
not. I view it as simple and clean; others may disagree.

> and
> I have a suspicion that the compiler inlines and unrolls that function
> anyway. Try putting __attribute__ ((noinline)) as an attribute to see if
> that makes a difference.

I'll have time to test later today.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
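
For anyone who wants to redo the heuristic estimate from the reply with different inputs, here is a minimal sketch of the arithmetic in C. The helper name saved_fraction is hypothetical and not part of FFmpeg, and the inputs (30 cycles saved per run, about 5 seconds at an assumed 3 GHz) are the rough figures quoted in the message, not new measurements.

```c
#include <assert.h>

/*
 * Hypothetical helper (not FFmpeg code): fraction of all cycles saved,
 * given the number of calls, the cycles saved per call, and the total
 * cycle count of the whole encode.
 */
static double saved_fraction(double runs, double cycles_saved_per_run,
                             double total_cycles)
{
    /* A zero or negative total would make the ratio meaningless. */
    assert(total_cycles > 0.0);
    return runs * cycles_saved_per_run / total_cycles;
}

/* Lower end of the estimate: 500,000 runs, about 5 s at an assumed 3 GHz. */
static double estimate_low(void)
{
    return saved_fraction(500000.0, 30.0, 5.0 * 3e9);
}

/* Upper end of the estimate: 1,000,000 runs, same assumed totals. */
static double estimate_high(void)
{
    return saved_fraction(1000000.0, 30.0, 5.0 * 3e9);
}
```

With these inputs the two ends work out to roughly 0.001 and 0.002, that is 0.1% to 0.2% of the total, which is where the "near the 0.1% threshold" remark comes from. The result scales linearly with the per-run saving, so plugging in a better measured value is straightforward.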