Re: [FFmpeg-devel] [PATCH v2] swscale/aarch64: dotprod implementation of rgba32_to_Y

2025-03-04 Thread Martin Storsjö
On Mon, 3 Mar 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> The idea is to split the 16-bit coefficients into lower and upper halves, invoke udot for the lower half, shift by 8, and follow with udot for the upper half. Benchmark on A78: bgra_to_y_128_c: 682.0

[FFmpeg-devel] [PATCH v2] swscale/aarch64: dotprod implementation of rgba32_to_Y

2025-03-03 Thread Krzysztof Pyrkosz via ffmpeg-devel
The idea is to split the 16-bit coefficients into lower and upper halves, invoke udot for the lower half, shift by 8, and follow with udot for the upper half. Benchmark on A78: bgra_to_y_128_c: 682.0 (1.00x) bgra_to_y_128_neon:
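
The arithmetic behind the split can be checked with a scalar model in C. This is only a sketch, not the code from the patch: udot4() is a hypothetical helper that mimics one 32-bit lane of the udot instruction (four u8*u8 products added to a u32 accumulator), the coefficient values used in the self-check are illustrative, and reading "shift by 8" as a right shift of the low-half accumulator is an assumption based on the description above. Rounding constants and the final output shift are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one UDOT lane:
 * acc += a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3], all operands u8. */
static uint32_t udot4(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * b[i];
    return acc;
}

/* Reference: dot product of four 8-bit samples with 16-bit coefficients. */
static uint32_t dot16(const uint16_t coef[4], const uint8_t px[4])
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += (uint32_t)coef[i] * px[i];
    return sum;
}

/* Split form: udot with the low bytes, shift right by 8, then udot with
 * the high bytes into the same accumulator. Returns dot16() >> 8. */
static uint32_t dot16_split(const uint16_t coef[4], const uint8_t px[4])
{
    uint8_t lo[4], hi[4];
    for (int i = 0; i < 4; i++) {
        lo[i] = (uint8_t)(coef[i] & 0xff);
        hi[i] = (uint8_t)(coef[i] >> 8);
    }
    uint32_t acc = udot4(0, lo, px);  /* low-half contribution          */
    acc >>= 8;                        /* align with the high-half scale */
    return udot4(acc, hi, px);        /* add the high-half contribution */
}

/* Self-check with illustrative B, G, R, A coefficients (alpha weight 0). */
static void check_split(void)
{
    const uint16_t coef[4] = { 3208, 16519, 8414, 0 };
    const uint8_t  px[4]   = { 17, 200, 255, 99 };
    assert(dot16_split(coef, px) == dot16(coef, px) >> 8);
}
```

The result is exact rather than approximate: the high-half sum multiplied by 256 has no bits in the low 8 positions, so floor((lo + 256*hi) / 256) equals floor(lo / 256) + hi. The low-half sum also cannot overflow a 32-bit lane, since it is at most 4 * 255 * 255.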