Re: [FFmpeg-devel] [PATCH] lavc/aarch64: Add neon implementation for pix_abs16_y2

Martin Storsjö Thu, 04 Aug 2022 01:08:18 -0700

On Mon, 25 Jul 2022, Hubert Mazur wrote:

> Provide optimized implementation of pix_abs16_y2 function for arm64.
>
> Performance comparison tests are shown below.
> pix_abs_0_2_c: 308.5
> pix_abs_0_2_neon: 39.2
>
> Benchmarks and tests run with checkasm tool on AWS Graviton 3.
>
> Signed-off-by: Hubert Mazur <h...@semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c |  3 +
> libavcodec/aarch64/me_cmp_neon.S         | 73 ++++++++++++++++++++++++
> 2 files changed, 76 insertions(+)

Please do the same optimizations as done for pix_abs_xy2 in b46de9aba436dea0cff76f3ed0f7c98448367fd0, 68a03f64240dcbe408c3fd43d1071a105508a588 and 4136405c86162063e45d40d55c9985f348d4ea0a for this function too ("aarch64: me_cmp: Interleave some of the loads in ff_pix_abs16_xy2_neon", "aarch64: me_cmp: Switch from uabd to uabal in ff_pix_abs16_xy2_neon" and "aarch64: me_cmp: Don't do uaddlv once per iteration").
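
To illustrate the last two points, here is a rough scalar C model, not the NEON code and not taken from the patch. It assumes the usual C reference semantics for pix_abs16_y2 (SAD against the rounding average of each pix2 row and the row below it). The names pix_abs16_y2_ref and pix_abs16_y2_lanes are hypothetical. The second function keeps eight 16-bit accumulators, folds both halves of each row into them (the role uabal/uabal2 would play), and does the horizontal add only once after the loop (the role of a single uaddlv). The load interleaving is a scheduling matter and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounding average of two pixels. */
static inline int avg2(int a, int b) { return (a + b + 1) >> 1; }

/* Scalar reference (hypothetical name): SAD between pix1 and the
 * vertical half-pel interpolation of pix2. Reads h + 1 rows of pix2. */
static int pix_abs16_y2_ref(const uint8_t *pix1, const uint8_t *pix2,
                            ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride;
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - avg2(pix2[x], pix3[x]));
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }
    return sum;
}

/* Lane model (hypothetical name): accumulate per lane inside the loop,
 * reduce across lanes once at the end. */
static int pix_abs16_y2_lanes(const uint8_t *pix1, const uint8_t *pix2,
                              ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride;
    uint16_t acc[8] = { 0 };
    int sum = 0;

    /* Each lane gains at most 2 * 255 per row, so 128 rows still fit
     * in 16 bits (128 * 510 = 65280). */
    assert(h >= 0 && h <= 128);

    for (int y = 0; y < h; y++) {
        for (int lane = 0; lane < 8; lane++) {
            /* Low half of the row (what uabal would accumulate). */
            int lo = abs(pix1[lane] - avg2(pix2[lane], pix3[lane]));
            /* High half of the row (what uabal2 would accumulate). */
            int hi = abs(pix1[lane + 8] - avg2(pix2[lane + 8], pix3[lane + 8]));
            acc[lane] = (uint16_t)(acc[lane] + lo + hi);
        }
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }

    /* Single horizontal reduction after the loop. */
    for (int lane = 0; lane < 8; lane++)
        sum += acc[lane];
    return sum;
}
```

Both functions should return the same value for any input within the stated row limit; the point is only that the cross-lane add moves out of the loop.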


// Martin
