On 18 November 2017 at 17:35, Rafal Dabrowa <fatwild...@gmail.com> wrote:

> This is a proposal of performance optimizations for 8-bit
> hevc video decoding on aarch64 platform with neon (simd) extension.
>
> I'm testing my optimizations on NanoPi M3 device. I'm using
> mainly "Big Buck Bunny" video file in format 1280x720 for testing.
> The video file was pulled from libde265.org page, see
> http://www.libde265.org/hevc-bitstreams/bbb-1280x720-cfg06.mkv
> The movie duration is 00:10:34.53.
>
> Overall performance gain is about 2x. Without optimizations the movie
> playback stops in practice after a few seconds. With
> optimizations the file is played smoothly 99% of the time.
>
> For performance testing the following command was used:
>
>     time ./ffmpeg -hide_banner -i ~/bbb-1280x720-cfg06.mkv -f yuv4mpegpipe - >/dev/null
>
> The video file was pre-read before test to minimize disk reads during
> testing.
> Program execution time without optimization was as follows:
>
>     real    11m48.576s
>     user    43m8.111s
>     sys     0m12.469s
>
> Execution time with optimizations:
>
>     real    6m17.046s
>     user    21m19.792s
>     sys     0m14.724s
>
> The patch contains optimizations for most heavily used qpel, epel, sao
> and idct functions. Among the functions provided for optimization there
> are two intensively used, but not optimized in this patch:
> hevc_v_loop_filter_luma_8 and hevc_h_loop_filter_luma_8. I have no idea
> how they could be optimized, hence I left them without optimizations.
>
> Signed-off-by: Rafal Dabrowa <fatwild...@gmail.com>
> ---
>  libavcodec/aarch64/Makefile               |    5 +
>  libavcodec/aarch64/hevcdsp_epel_8.S       | 3949 ++++++++++++++++++++
>  libavcodec/aarch64/hevcdsp_idct_8.S       | 1980 ++++++++++
>  libavcodec/aarch64/hevcdsp_init_aarch64.c |  170 +
>  libavcodec/aarch64/hevcdsp_qpel_8.S       | 5666 +++++++++++++++++++++++++++++
>  libavcodec/aarch64/hevcdsp_sao_8.S        |  166 +
>  libavcodec/hevcdsp.c                      |    2 +
>  libavcodec/hevcdsp.h                      |    1 +
>  8 files changed, 11939 insertions(+)
>  create mode 100644 libavcodec/aarch64/hevcdsp_epel_8.S
>  create mode 100644 libavcodec/aarch64/hevcdsp_idct_8.S
>  create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c
>  create mode 100644 libavcodec/aarch64/hevcdsp_qpel_8.S
>  create mode 100644 libavcodec/aarch64/hevcdsp_sao_8.S

Very nice.

The way we test SIMD is to put START_TIMER; and
STOP_TIMER("function_name"); (both are defined in libavutil/timer.h)
around the place where the function gets called in the C code. Then we
do one run with the C code (no SIMD) and a separate run with whatever
SIMD optimizations we're implementing. We take the last value printed
by each run, and the ratio of the two is the measured speedup.

I don't think there's a need yet to split the patch into separate
patches for each individual version, though; that's usually only done
if some function's C implementation is faster than the SIMD code.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
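
For anyone who wants to see the shape of the timing method described in
the reply above, here is a minimal sketch that builds outside the FFmpeg
tree. Everything in it is an assumption made for illustration:
SKETCH_START_TIMER, SKETCH_STOP_TIMER, SketchTimer and add_residual_c
are hypothetical stand-ins, not the macros from libavutil/timer.h and
not a function from the patch. As far as I know, the real START_TIMER
takes no argument and the label is passed to STOP_TIMER(id); the sketch
follows that shape, but it reads clock_gettime() instead of a cycle
counter and reports nanoseconds.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std */
#endif
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical timer state: total elapsed time and number of timed calls. */
typedef struct SketchTimer {
    int64_t total_ns;
    int64_t runs;
} SketchTimer;

static int64_t sketch_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical stand-ins for START_TIMER / STOP_TIMER(id).
 * A running average is printed whenever the call count is a power of
 * two, so the last printed line covers the most calls. */
#define SKETCH_START_TIMER int64_t sketch_t0 = sketch_now_ns();
#define SKETCH_STOP_TIMER(timer, id)                                     \
    do {                                                                 \
        (timer)->total_ns += sketch_now_ns() - sketch_t0;                \
        (timer)->runs++;                                                 \
        if (((timer)->runs & ((timer)->runs - 1)) == 0)                  \
            printf("%lld ns avg in %s, %lld runs\n",                     \
                   (long long)((timer)->total_ns / (timer)->runs),       \
                   (id), (long long)(timer)->runs);                      \
    } while (0)

/* Hypothetical function under test: add a residual to 8-bit pixels
 * and clip the result to the range 0..255. */
static void add_residual_c(uint8_t *dst, const int16_t *res, int n)
{
    for (int i = 0; i < n; i++) {
        int v = dst[i] + res[i];
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* Wrap the call site with the timer macros; returns the number of
 * timed calls. */
static int64_t timed_demo(int calls)
{
    SketchTimer timer = { 0, 0 };
    uint8_t dst[64];
    int16_t res[64];

    for (int i = 0; i < 64; i++) {
        dst[i] = (uint8_t)(i * 4);
        res[i] = (int16_t)(i - 32);
    }
    for (int c = 0; c < calls; c++) {
        SKETCH_START_TIMER
        add_residual_c(dst, res, 64);
        SKETCH_STOP_TIMER(&timer, "add_residual");
    }
    return timer.runs;
}
```

Calling timed_demo(1000) from main() prints one line per power-of-two
call count. To compare two implementations, run the same call site once
with the C version and once with the SIMD version, then divide the last
printed average of the first run by that of the second.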