Posted an unrolled version in a new thread, alongside a few patches by Christophe.
On 30/01/15 3:50 PM, James Almer wrote:
> Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard
> Lepere.
> 10/12bit yasm ports, refactoring and optimizations by James Almer
>
> Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
>
> width 32
> 40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
> 8585 decicycles in ff_hevc_sao_band_filter_8_sse2, 2048 runs, 0 skips
> 4543 decicycles in ff_hevc_sao_band_filter_8_avx2, 2048 runs, 0 skips
>
> width 64
> 136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
> 29366 decicycles in ff_hevc_sao_band_filter_8_sse2, 16384 runs, 0 skips
> 15357 decicycles in ff_hevc_sao_band_filter_8_avx2, 16383 runs, 1 skips
>
> Signed-off-by: James Almer <jamr...@gmail.com>

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel