aarch64: Add neon implementation for vsad_intra16

Martin Storsjö Sun, 04 Sep 2022 13:58:32 -0700

On Mon, 22 Aug 2022, Hubert Mazur wrote:

Provide an optimized implementation of the vsad_intra16 function for arm64.


Performance comparison tests are shown below.
- vsad_4_c: 177.2
- vsad_4_neon: 24.5

Benchmarks and tests are run with the checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <h...@semihalf.com>
---
libavcodec/aarch64/me_cmp_init_aarch64.c |  3 ++
libavcodec/aarch64/me_cmp_neon.S         | 58 ++++++++++++++++++++++++
2 files changed, 61 insertions(+)

Same thing as for the others; keep the data for the previous row in registers instead of loading it twice.
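
As a rough illustration of that suggestion, here is a scalar C model of the two loop shapes. It is only a sketch and not the NEON code from the patch: the function names and the simplified signatures are made up for this example, and the local arrays stand in for vector registers. The first variant reads both rows of every pair from memory, so each interior row is loaded twice. The second loads each row once and carries it over as the "previous" row for the next iteration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical baseline: both rows of each vertical pair are read from
 * memory, so every interior row is loaded twice over the whole loop. */
static int vsad_intra16_reload(const uint8_t *s, ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score += abs(s[x] - s[x + stride]);
        s += stride;
    }
    return score;
}

/* Hypothetical variant: each row is loaded exactly once. The row loaded
 * in this iteration is kept and reused as the previous row in the next
 * one (prev/cur play the role of two vector registers). */
static int vsad_intra16_carry(const uint8_t *s, ptrdiff_t stride, int h)
{
    uint8_t prev[16], cur[16];
    int score = 0;

    if (h < 2)
        return 0;               /* no vertical pair to compare */

    memcpy(prev, s, 16);        /* load the first row once, up front */
    for (int y = 1; y < h; y++) {
        s += stride;
        memcpy(cur, s, 16);     /* the only load in this iteration */
        for (int x = 0; x < 16; x++)
            score += abs(cur[x] - prev[x]);
        memcpy(prev, cur, 16);  /* current row becomes the previous row */
    }
    return score;
}
```

Both functions return the same sum; the second roughly halves the number of row loads. In assembly the final copy would normally be a register move, or be avoided entirely by unrolling and alternating registers.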


// Martin


Re: [FFmpeg-devel] [PATCH 3/5] lavc/aarch64: Add neon implementation for vsad_intra16
