On Sun, 2 Jul 2023, John Cox wrote:

Also adds a filter_line3 method which, on aarch64 NEON, yields an
approximately 30% speedup over two filter_line calls plus a memcpy.

Differences from v1:
.align 16 corrected to .balign 16
SXTW changed to lowercase
Mac ABI (hopefully) fixed
V register pop/push macroed & prettified

John Cox (15):
 avfilter/vf_bwdif: Add outline for aarch neon functions
 avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
 avfilter/vf_bwdif: Export C filter_intra
 avfilter/vf_bwdif: Add neon for filter_intra
 tests/checkasm: Add test for vf_bwdif filter_intra
 avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
 avfilter/vf_bwdif: Export C filter_edge
 avfilter/vf_bwdif: Add neon for filter_edge
 tests/checkasm: Add test for vf_bwdif filter_edge
 avfilter/vf_bwdif: Export C filter_line
 avfilter/vf_bwdif: Add neon for filter_line
 avfilter/vf_bwdif: Add a filter_line3 method for optimisation
 avfilter/vf_bwdif: Add neon for filter_line3
 tests/checkasm: Add test for vf_bwdif filter_line3
 avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines

Overall, I'd suggest squashing/reordering the patches like this:

- tests/checkasm: Add test for vf_bwdif filter_intra
- avfilter/vf_bwdif: Add neon for filter_intra
  (With the preceding patches squashed. For extra common macros, only add
  the ones you use in this patch here.)
- tests/checkasm: Add test for vf_bwdif filter_edge
- avfilter/vf_bwdif: Add neon for filter_edge (with other dependencies
  squashed)
- avfilter/vf_bwdif: Add neon for filter_line
- avfilter/vf_bwdif: Add a filter_line3 method for optimisation
  + checkasm test squashed
- avfilter/vf_bwdif: Add neon for filter_line3

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
