On Sun, 2 Jul 2023, John Cox wrote:
Also adds a filter_line3 method, which on aarch64 neon yields an approx. 30%
speedup over two filter_line calls and a memcpy.
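
To illustrate the shape of a filter_line3-style call, here is a minimal C sketch. It is not the bwdif filter: toy_filter_line is a hypothetical stand-in that just averages the rows above and below, and all names and signatures are made up for illustration. It only shows how one fused pass over three rows can replace two filter_line calls plus a memcpy; the assumption that the gain comes partly from sharing loads between the output rows is mine, not something stated in the series.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for filter_line: interpolate one missing row as the
 * rounded average of the rows directly above and below it. */
static void toy_filter_line(uint8_t *dst, const uint8_t *cur, int w, int stride)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((cur[x - stride] + cur[x + stride] + 1) >> 1);
}

/* Baseline: two filter calls and a memcpy per group of three output rows. */
static void toy_rows_separate(uint8_t *dst, const uint8_t *cur, int w, int stride)
{
    toy_filter_line(dst, cur, w, stride);                       /* row 0: filtered */
    memcpy(dst + stride, cur + stride, (size_t)w);              /* row 1: copied   */
    toy_filter_line(dst + 2 * stride, cur + 2 * stride, w, stride); /* row 2: filtered */
}

/* Fused variant: one pass produces all three rows, so the middle source row
 * is loaded once and reused for both filtered rows and for the copy. */
static void toy_filter_line3(uint8_t *dst, const uint8_t *cur, int w, int stride)
{
    for (int x = 0; x < w; x++) {
        int above = cur[x - stride];
        int mid   = cur[x + stride];
        int below = cur[x + 3 * stride];

        dst[x]              = (uint8_t)((above + mid + 1) >> 1);
        dst[x + stride]     = (uint8_t)mid;
        dst[x + 2 * stride] = (uint8_t)((mid + below + 1) >> 1);
    }
}
```

Both variants write identical output for the same input; the sketch uses the same stride for source and destination purely to keep it short.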
Differences from v1:
- .align 16 corrected to .balign 16
- SXTW changed to lower case
- Mac ABI (hopefully) fixed
- V register push/pop turned into macros and prettified
John Cox (15):
avfilter/vf_bwdif: Add outline for aarch neon functions
avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
avfilter/vf_bwdif: Export C filter_intra
avfilter/vf_bwdif: Add neon for filter_intra
tests/checkasm: Add test for vf_bwdif filter_intra
avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
avfilter/vf_bwdif: Export C filter_edge
avfilter/vf_bwdif: Add neon for filter_edge
tests/checkasm: Add test for vf_bwdif filter_edge
avfilter/vf_bwdif: Export C filter_line
avfilter/vf_bwdif: Add neon for filter_line
avfilter/vf_bwdif: Add a filter_line3 method for optimisation
avfilter/vf_bwdif: Add neon for filter_line3
tests/checkasm: Add test for vf_bwdif filter_line3
avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines
Overall, I'd suggest squashing/reordering the patches like this:
- tests/checkasm: Add test for vf_bwdif filter_intra
- avfilter/vf_bwdif: Add neon for filter_intra
(With the preceding patches squashed. For extra common macros, only add
the ones you use in this patch here.)
- tests/checkasm: Add test for vf_bwdif filter_edge
- avfilter/vf_bwdif: Add neon for filter_edge (with other dependencies
squashed)
- avfilter/vf_bwdif: Add neon for filter_line
- avfilter/vf_bwdif: Add a filter_line3 method for optimisation (with the
checkasm test squashed in)
- avfilter/vf_bwdif: Add neon for filter_line3
// Martin