On Wed, 25 May 2022, Swinney, Jonathan wrote:
This patch adds code to support specializations of the hscale function and adds
a specialization for filterSize == 4.
ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck here is
loading the data from src, this data is loaded a whole block ahead and stored
back to the stack to be loaded again with ld4. This arranges the data for most
efficient use of the vector instructions and removes the need for completion
adds at the end. The number of C loop iterations covered by each iteration of
the assembly loop is increased from 4 to 8, but because of the prefetching, a
special section without prefetching is needed when dstW < 16.
This improves speed on Graviton 2 (Neoverse N1) dramatically in the case where
previously fs=8 would have been required.
before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8
after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9
Signed-off-by: Jonathan Swinney <jswin...@amazon.com>
---
 libswscale/aarch64/hscale.S  | 172 ++++++++++++++++++++++++++++++++++-
 libswscale/aarch64/swscale.c |  40 ++++++--
 libswscale/utils.c           |   2 +-
 3 files changed, 203 insertions(+), 11 deletions(-)
-void ff_hscale_8_to_15_neon(SwsContext *c, int16_t *dst, int dstW,
- const uint8_t *src, const int16_t *filter,
- const int32_t *filterPos, int filterSize);
+#define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
+void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
+ SwsContext *c, int16_t *data, \
+ int dstW, const uint8_t *src, \
+ const int16_t *filter, \
+ const int32_t *filterPos, int filterSize)
+#define SCALE_FUNCS(filter_n, opt) \
+ SCALE_FUNC(filter_n, 8, 15, opt);
+#define ALL_SCALE_FUNCS(opt) \
+ SCALE_FUNCS(4, opt); \
+ SCALE_FUNCS(8, opt); \
+ SCALE_FUNCS(X8, opt)
Here, you still declare the _8 function (via SCALE_FUNCS(8, opt)), which is no
longer implemented.
Other than that, this patch looks fine I think.
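
As a rough sketch of what the token pasting produces, and of one possible way
to address the stale declaration (simply not declaring the 8 variant), something
like the following could be used. SCALE_FUNC is copied from the patch;
SCALE_NAME, STR, declared_names and is_declared are hypothetical helpers that
exist only for this illustration, and the SwsContext typedef is an opaque
stand-in for the real type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opaque stand-in for the real libswscale type (assumption for this sketch). */
typedef struct SwsContext SwsContext;

/* As in the patch: pastes the pieces into one function name and declares it. */
#define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
    SwsContext *c, int16_t *data, \
    int dstW, const uint8_t *src, \
    const int16_t *filter, \
    const int32_t *filterPos, int filterSize)

/* Hypothetical helpers: build the same name and turn it into a string. */
#define SCALE_NAME(filter_n, from_bpc, to_bpc, opt) \
    ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt
#define STR_(x) #x
#define STR(x) STR_(x)

/* Declare only the variants that have an implementation: 4 and X8.
 * A declaration without a definition is harmless until it is referenced,
 * at which point the link fails. */
SCALE_FUNC(4, 8, 15, neon);
SCALE_FUNC(X8, 8, 15, neon);

static const char *const declared_names[] = {
    STR(SCALE_NAME(4, 8, 15, neon)),
    STR(SCALE_NAME(X8, 8, 15, neon)),
};

/* Returns 1 if name matches one of the names declared above, else 0. */
static int is_declared(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(declared_names) / sizeof(declared_names[0]); i++) {
        if (strcmp(declared_names[i], name) == 0)
            return 1;
    }
    return 0;
}
```

With the 8 entry dropped, the macros declare ff_hscale8to15_4_neon and
ff_hscale8to15_X8_neon only, so nothing refers to a symbol that hscale.S no
longer provides.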
// Martin