Re: [FFmpeg-devel] [PATCH v2] swscale/aarch64: dotprod implementation of rgba32_to_Y

Martin Storsjö Tue, 04 Mar 2025 00:27:58 -0800

On Mon, 3 Mar 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

The idea is to split the 16-bit coefficients into lower and upper halves,
invoke udot for the lower half, shift by 8, and follow with udot for the
upper half.
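
As a rough scalar illustration of the arithmetic described above (not the
code from the patch): a dot product with 16-bit coefficients can be built
from two dot products over 8-bit values, one for each byte of the
coefficients. The helper names (udot4, dot_full, dot_split, dot_split_shr8)
are hypothetical, and udot4 is only a C stand-in for a single 32-bit lane of
the UDOT instruction. The message says only "shift by 8"; the last function
assumes this means shifting the low partial sum right before accumulating
the high half.

```c
#include <stdint.h>

/* Scalar stand-in for one 32-bit lane of UDOT:
 * four unsigned 8-bit products summed into a 32-bit accumulator. */
static uint32_t udot4(const uint8_t a[4], const uint8_t b[4])
{
    uint32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * b[i];
    return acc;
}

/* Reference: direct dot product with unsigned 16-bit coefficients. */
static uint32_t dot_full(const uint16_t coef[4], const uint8_t px[4])
{
    uint32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)coef[i] * px[i];
    return acc;
}

/* Split each coefficient into its low and high byte. */
static void split_coef(const uint16_t coef[4], uint8_t lo[4], uint8_t hi[4])
{
    for (int i = 0; i < 4; i++) {
        lo[i] = (uint8_t)(coef[i] & 0xff);
        hi[i] = (uint8_t)(coef[i] >> 8);
    }
}

/* Exact recombination: sum(c*p) == sum(lo*p) + 256 * sum(hi*p). */
static uint32_t dot_split(const uint16_t coef[4], const uint8_t px[4])
{
    uint8_t lo[4], hi[4];
    split_coef(coef, lo, hi);
    return udot4(lo, px) + (udot4(hi, px) << 8);
}

/* Assumed ordering: low-half dot product, shift right by 8, then add the
 * high-half dot product. The result equals dot_full() >> 8. */
static uint32_t dot_split_shr8(const uint16_t coef[4], const uint8_t px[4])
{
    uint8_t lo[4], hi[4];
    split_coef(coef, lo, hi);
    uint32_t acc = udot4(lo, px); /* low half */
    acc >>= 8;                    /* drop the 8 low bits */
    acc += udot4(hi, px);         /* high half lands 8 bits higher */
    return acc;
}
```

Shifting the low partial sum right by 8 before adding the high half loses
nothing beyond the 8 bits a final right shift would discard anyway, since
floor((L + 256*H) / 256) = floor(L / 256) + H for non-negative L and H.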

Benchmark on A78:
bgra_to_y_128_c:                                       682.0 ( 1.00x)
bgra_to_y_128_neon:                                    181.2 ( 3.76x)
bgra_to_y_128_dotprod:                                 117.8 ( 5.79x)
bgra_to_y_1080_c:                                     5742.5 ( 1.00x)
bgra_to_y_1080_neon:                                  1472.5 ( 3.90x)
bgra_to_y_1080_dotprod:                                906.5 ( 6.33x)
bgra_to_y_1920_c:                                    10194.0 ( 1.00x)
bgra_to_y_1920_neon:                                  2589.8 ( 3.94x)
bgra_to_y_1920_dotprod:                               1573.8 ( 6.48x)
---
libswscale/aarch64/input.S   | 88 ++++++++++++++++++++++++++++++++++++
libswscale/aarch64/swscale.c | 17 +++++++
2 files changed, 105 insertions(+)


LGTM, thanks, I pushed this one now.

// Martin

