Re: [FFmpeg-devel] [PATCH v2] swscale/aarch64: dotprod implementation of rgba32_to_Y
On Mon, 3 Mar 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> The idea is to split the 16-bit coefficients into lower and upper halves, invoke udot for the lower half, shift by 8, and follow it with udot for the upper half.
> Benchmark on A78:
> bgra_to_y_128_c: 682.0
[FFmpeg-devel] [PATCH v2] swscale/aarch64: dotprod implementation of rgba32_to_Y
The idea is to split the 16-bit coefficients into lower and upper halves, invoke udot for the lower half, shift by 8, and follow it with udot for the upper half.

Benchmark on A78:
bgra_to_y_128_c: 682.0 ( 1.00x)
bgra_to_y_128_neon:
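A scalar C model of one reading of the split-coefficient idea, offered as a sketch under stated assumptions rather than as the patch's code. The AArch64 UDOT (vector) instruction multiplies groups of four unsigned bytes and accumulates each group's sum into a 32-bit lane, so one packed 4-byte pixel maps onto one lane. Writing each 16-bit coefficient as c = 256*hi + lo gives sum(p*c) = sum(p*lo) + 256*sum(p*hi), which lets two byte-wide dot products replace one 16-bit one. The sketch assumes "shift by 8" means a right shift of the accumulator, with the final right shift reduced by 8 to compensate. Because nested floor divisions compose (floor(floor(x/256)/2^k) == floor(x/2^(k+8))), the result matches the direct 16-bit computation exactly, provided the rounding constant is added before the first shift. The helper names (udot_lane, dot16_direct, dot16_split) and the rnd/shift parameters are hypothetical and do not come from the patch or from swscale.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of UDOT: four unsigned byte products
 * are summed and added to the 32-bit accumulator. */
static uint32_t udot_lane(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int k = 0; k < 4; k++)
        acc += (uint32_t)a[k] * b[k];
    return acc;
}

/* Reference: apply the 16-bit coefficients directly to one packed pixel
 * (e.g. {R, G, B, A} with coefficient 0 for alpha), add the rounding
 * constant, then shift right. */
static uint32_t dot16_direct(const uint8_t px[4], const uint16_t coef[4],
                             uint32_t rnd, unsigned shift)
{
    uint32_t acc = rnd;
    for (int k = 0; k < 4; k++)
        acc += (uint32_t)px[k] * coef[k];
    return acc >> shift;
}

/* Split version: dot with the low coefficient bytes, shift right by 8,
 * dot with the high coefficient bytes, then apply the remaining shift. */
static uint32_t dot16_split(const uint8_t px[4], const uint16_t coef[4],
                            uint32_t rnd, unsigned shift)
{
    uint8_t lo[4], hi[4];

    assert(shift >= 8); /* the first shift consumes 8 of the total */
    for (int k = 0; k < 4; k++) {
        lo[k] = (uint8_t)(coef[k] & 0xFF);
        hi[k] = (uint8_t)(coef[k] >> 8);
    }

    uint32_t acc = udot_lane(rnd, px, lo); /* rnd + sum(px * lo) */
    acc >>= 8;                             /* floor((rnd + L) / 256) */
    acc = udot_lane(acc, px, hi);          /* ... + sum(px * hi) */
    return acc >> (shift - 8);
}
```

For example, with px = {255, 255, 255, 255}, all four coefficients set to 0xFFFF, rnd = 0 and shift = 8, both dot16_direct and dot16_split return 261116. A side effect of this ordering is that the accumulator after the low-byte pass never exceeds rnd + 4*255*255, so the intermediate stays small.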