[FFmpeg-cvslog] avcodec/aarch64/vvc: Optimize vvc_avg{8, 10, 12}

2025-03-07 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Mon Mar 3 22:18:23 2025 +0100 | [f9b8f30680b6107fe5c32f3ba5115359368ec234] | committer: Martin Storsjö
This patch replaces integer widening with halving addition, and multi-step "emulated"
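The snippet above is cut off, so the exact instruction sequence of the patch is not visible here. As a rough illustration of why a halving addition can stand in for widening, the sketch below (plain Python, not the NEON code from the commit; all function names are hypothetical) checks that the floor average of two 16-bit values can be formed without ever holding the 17-bit sum. AArch64 NEON offers halving adds such as SHADD for this purpose; whether the patch uses that particular instruction is an assumption.

```python
I16_MIN, I16_MAX = -32768, 32767


def fits_i16(x):
    """True if x is representable as a signed 16-bit integer."""
    return I16_MIN <= x <= I16_MAX


def widened_average(a, b):
    """Reference: form the full (17-bit) sum, then halve with floor."""
    return (a + b) >> 1


def halving_add_i16(a, b):
    """Floor((a + b) / 2) built only from values that stay within int16.

    Hypothetical helper: halve each operand first, then add back the
    carry that is lost when both operands are odd.
    """
    half_sum = (a >> 1) + (b >> 1)   # stays within [-32768, 32766]
    carry = a & b & 1                # 1 only when both low bits are set
    assert fits_i16(half_sum) and fits_i16(half_sum + carry)
    return half_sum + carry


if __name__ == "__main__":
    for a, b in [(3, 2), (-3, -1), (I16_MAX, I16_MAX), (I16_MIN, I16_MIN)]:
        print(a, b, halving_add_i16(a, b), widened_average(a, b))
```

Python's `>>` floors toward negative infinity, which matches an arithmetic shift; a C version would have to make the same assumption about signed right shifts.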

[FFmpeg-cvslog] avcodec/aarch64/vvc: Optimize NEON version of vvc_dmvr

2025-03-04 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Mon Mar 3 22:32:55 2025 +0100 | [71a91485fa05c1ca478de153d8839794606f8edc] | committer: Martin Storsjö
This patch replaces blocks of instructions performing rounding and widening shifts with

[FFmpeg-cvslog] swscale/aarch64: dotprod implementation of rgba32_to_Y

2025-03-04 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Mon Mar 3 22:00:23 2025 +0100 | [d765e5f043d981294303fe210d643c5156efeeb3] | committer: Martin Storsjö
The idea is to split the 16 bit coefficients into lower and upper half, invoke udot for
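The message is truncated after "invoke udot for", so the remainder of the scheme is not shown. One plausible reading, sketched below in plain Python, is that each half gets its own 8-bit dot product and the two partial sums are recombined with a shift by 8. This assumes non-negative 16-bit coefficients; `udot4` is a hypothetical model of a single 32-bit lane of the UDOT instruction, and none of these names come from the patch.

```python
def udot4(a, b):
    """Model of one UDOT lane: dot product of four unsigned 8-bit values."""
    assert len(a) == len(b) == 4
    assert all(0 <= x <= 0xFF for x in a) and all(0 <= y <= 0xFF for y in b)
    return sum(x * y for x, y in zip(a, b))


def dot_u8_by_u16_split(pixels, coeffs):
    """Dot product of 8-bit pixels with 16-bit coefficients via two udot4 calls.

    Assumption: coefficients are unsigned, so c == (hi << 8) | lo holds
    and the two partial sums can simply be added after shifting.
    """
    lo = [c & 0xFF for c in coeffs]   # lower 8 bits of each coefficient
    hi = [c >> 8 for c in coeffs]     # upper 8 bits of each coefficient
    return udot4(pixels, lo) + (udot4(pixels, hi) << 8)


def dot_direct(pixels, coeffs):
    """Reference: ordinary wide multiply-accumulate."""
    return sum(p * c for p, c in zip(pixels, coeffs))


if __name__ == "__main__":
    rgba = [10, 20, 30, 255]                  # one RGBA pixel
    coeffs = [0x1234, 0x00FF, 0xFFFF, 0]      # illustrative values only
    print(dot_u8_by_u16_split(rgba, coeffs), dot_direct(rgba, coeffs))
```

The identity behind it is p·c = p·lo + (p·hi << 8), applied lane by lane and summed.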

[FFmpeg-cvslog] avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon

2025-03-01 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Fri Feb 28 22:21:50 2025 +0100 | [e8d4c559871ef93fc94a8efb8144f1738eba4c62] | committer: Martin Storsjö
Instead of calculating a^2, b^2, (a+b)^2 and (a-b)^2, calculate only
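The sentence above breaks off before naming which terms the patch actually computes, and this listing does not say. Purely as an illustration of how the four squares can be derived from fewer multiplications, the sketch below assumes the products a², b² and a·b and applies (a±b)² = a² + b² ± 2ab. It is a hypothetical Python model, not the NEON routine, and the function names are invented for this example.

```python
def squares_direct(a, b):
    """Reference: four separate squarings."""
    return a * a, b * b, (a + b) ** 2, (a - b) ** 2


def squares_from_three_products(a, b):
    """Same four values from three multiplications (assumed decomposition)."""
    aa = a * a
    bb = b * b
    two_ab = 2 * a * b
    return aa, bb, aa + bb + two_ab, aa + bb - two_ab


def sum_square_butterfly(lefts, rights):
    """Accumulate the four sums over paired samples (hypothetical helper)."""
    totals = [0, 0, 0, 0]
    for a, b in zip(lefts, rights):
        for i, v in enumerate(squares_from_three_products(a, b)):
            totals[i] += v
    return totals


if __name__ == "__main__":
    print(sum_square_butterfly([3, -7, 100], [5, 2, -100]))
```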

[FFmpeg-cvslog] swscale/aarch64: Refactor hscale_16_to_15__fs_4

2025-03-01 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Sat Mar 1 13:59:00 2025 +0100 | [38929b824bcc4b3307af3e0711c5c03b823a83e3] | committer: Martin Storsjö
This patch removes the use of stack for temporary state and replaces interleaved ld4 loads with

[FFmpeg-cvslog] avutil/aarch64/tx_float_neon.S: clean up FFT4_X2

2025-02-28 Thread Krzysztof Pyrkosz via ffmpeg-devel
ffmpeg | branch: master | Krzysztof Pyrkosz via ffmpeg-devel | Tue Feb 25 21:45:56 2025 +0100 | [9993a64d7bcd5baa730d1ff95f6ab4d5a49af369] | committer: Lynne
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commi

[FFmpeg-cvslog] swscale/aarch64/rgb2rgb_neon: Implemented {yuyv, uyvy}toyuv{420, 422}

2025-02-17 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Thu Feb 13 20:02:29 2025 +0100 | [b92577405b40b6eb5ecf0036060e34e0219da1e3] | committer: Martin Storsjö
A78: uyvytoyuv420_neon:6112.5 ( 6.96x

[FFmpeg-cvslog] swscale/aarch64/rgb24toyv12: skip early right shift by 2

2025-02-17 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Tue Feb 11 22:43:11 2025 +0100 | [64107e22f545d3899f9270751531997734d89a3d] | committer: Martin Storsjö
It's a minor improvement that shaves off 5-8% from the execution time. Inste

[FFmpeg-cvslog] avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon

2025-02-10 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Fri Feb 7 20:42:11 2025 +0100 | [9fb97215dfb2f1933cc2b959f29734a0671323eb] | committer: Martin Storsjö
This change removes one extra floating point operation and simplifies load

[FFmpeg-cvslog] swscale/aarch64/rgb2rgb: Implemented NEON shuf routines

2025-02-07 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Tue Jan 28 19:01:33 2025 +0100 | [c85a748979db507d619ac10d74832d3e33635942] | committer: Martin Storsjö
The key idea is to pass the pre-generated tables to the TBL instruction and churn
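The snippet stops mid-sentence, but the table-lookup idea it names can be shown in miniature. The sketch below models a 16-byte TBL lookup in Python and builds an index table for a per-pixel byte permutation. The helper names and the RGBA-to-BGRA permutation are illustrative assumptions, not taken from the patch.

```python
def tbl(table, indices):
    """Model of a single-register TBL: out-of-range indices yield 0."""
    return bytes(table[i] if i < len(table) else 0 for i in indices)


def make_shuffle_table(perm, lanes=16):
    """Pre-generate byte indices applying a 4-byte permutation per pixel.

    perm[k] names the source byte (0..3) that lands in position k of
    every 4-byte pixel. Hypothetical helper for this example.
    """
    assert sorted(perm) == [0, 1, 2, 3] and lanes % 4 == 0
    return [4 * (i // 4) + perm[i % 4] for i in range(lanes)]


def shuffle_pixels(data, perm):
    """Apply the permutation to a buffer, 16 bytes (4 pixels) per lookup."""
    assert len(data) % 16 == 0
    indices = make_shuffle_table(perm)   # built once, reused for every chunk
    out = bytearray()
    for off in range(0, len(data), 16):
        out += tbl(data[off:off + 16], indices)
    return bytes(out)


if __name__ == "__main__":
    rgba = bytes(range(32))                       # 8 dummy pixels
    print(list(shuffle_pixels(rgba, (2, 1, 0, 3))))  # swap R and B
```

Because the index table depends only on the permutation, it is computed once up front and the per-chunk work reduces to a single lookup, which is the property that makes a table-driven shuffle attractive.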

[FFmpeg-cvslog] swscale/aarch64/output.S: refactor ff_yuv2plane1_8_neon

2025-02-07 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Fri Jan 31 22:20:03 2025 +0100 | [e25a19fc7cfff1243bcc12c50a0f2fb026362df2] | committer: Martin Storsjö
The benchmarks (before vs after) were gathered using ./tests/checkasm/checkasm --test

[FFmpeg-cvslog] avcodec/aarch64/aacencdsp: NEON implementation

2025-01-28 Thread Krzysztof Pyrkosz
ffmpeg | branch: master | Krzysztof Pyrkosz | Fri Jan 24 19:58:26 2025 +0100 | [83e4b068d9c49ae8af890c152e9e61320a835681] | committer: Martin Storsjö
This patch supplies handwritten NEON code for AAC. The benchmarks below were collected by