Hi,

For some time now, we have had pretty complete AArch64 NEON coverage for the HEVC decoder.
However, some of these functions require the I8MM instruction set
extension, and many of them (but not all) lack a plain NEON version.

This patchset fills in a regular NEON version of all functions where
we have an I8MM function.

For context: the I8MM instruction set extension is a mandatory part of
armv8.6-a. E.g. Apple M2 and AWS Graviton 3 have it, but Apple M1 and
Ampere Altra don't.

This patchset takes decoding of a 1080p HEVC clip from 402 fps to
649 fps on an Apple M1.

Patch #2 also fixes a subtle bug in the existing implementation: two
functions relied on the contents of the stack, below the stack pointer,
being untouched within a function. If a signal gets delivered, those
parts of the stack could be clobbered.

// Martin

Martin Storsjö (21):
  aarch64: hevc: Reorder a misplaced function init line
  aarch64: hevc: Don't iterate with sp in ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm
  aarch64: hevc: Merge consecutive stores in put_hevc_\type\()_h16_8_neon
  aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal looping
  aarch64: hevc: Use ld1r instead of ldr+dup in hevc_qpel_uni_w_h
  aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8
  aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8
  aarch64: hevc: Split the epel_*_hv functions into two parts
  aarch64: hevc: Reorder epel_hv functions to prepare for templating
  aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm
  aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
  aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8
  aarch64: hevc: Split the qpel_*_hv functions into two parts
  aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions
  aarch64: hevc: Reorder qpel_hv functions to prepare for templating
  aarch64: hevc: Produce plain neon versions of qpel_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv
  aarch64: hevc: Produce plain neon versions of qpel_bi_hv

 libavcodec/aarch64/hevcdsp_epel_neon.S    | 1529 +++++++++++------
 libavcodec/aarch64/hevcdsp_init_aarch64.c |   96 +-
 libavcodec/aarch64/hevcdsp_qpel_neon.S    | 1804 +++++++++++++--------
 3 files changed, 2291 insertions(+), 1138 deletions(-)

-- 
2.39.3 (Apple Git-146)
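
For readers less familiar with how a plain NEON function and an I8MM
function for the same operation coexist, here is a minimal sketch of the
general runtime dispatch idea. It is not the actual code from
hevcdsp_init_aarch64.c; the flag names (CPU_NEON, CPU_I8MM), the
filter_* stand-ins and select_filter() are all hypothetical and only
illustrate the layering: the most generic version is installed first,
and each more specific extension overrides it when the CPU reports
support.

```c
#include <stddef.h>

/* Hypothetical CPU feature bits, made up for this sketch. */
enum { CPU_NEON = 1 << 0, CPU_I8MM = 1 << 1 };

typedef const char *(*filter_fn)(void);

/* Stand-ins for the real routines; each only reports which variant it is. */
static const char *filter_c(void)    { return "c"; }
static const char *filter_neon(void) { return "neon"; }
static const char *filter_i8mm(void) { return "i8mm"; }

/* Start from the C fallback, then let each available extension override
 * the pointer. A CPU with NEON but without I8MM (Apple M1 in the example
 * above) ends up with the NEON variant instead of falling back to C. */
static filter_fn select_filter(int cpu_flags)
{
    filter_fn fn = filter_c;
    if (cpu_flags & CPU_NEON)
        fn = filter_neon;
    if (cpu_flags & CPU_I8MM)
        fn = filter_i8mm;
    return fn;
}
```

With this arrangement, select_filter(CPU_NEON) yields the NEON variant
and select_filter(CPU_NEON | CPU_I8MM) yields the I8MM one. Without a
plain NEON variant for a given function, the first case would have been
left on the C fallback, which is the gap this patchset closes.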