ffmpeg | branch: master | Krzysztof Pyrkosz | Mon Mar 3
22:18:23 2025 +0100| [f9b8f30680b6107fe5c32f3ba5115359368ec234] | committer:
Martin Storsjö
avcodec/aarch64/vvc: Optimize vvc_avg{8, 10, 12}
This patch replaces integer widening with halving addition, and
multi-step "emulated"
ffmpeg | branch: master | Krzysztof Pyrkosz | Mon Mar 3
22:32:55 2025 +0100| [71a91485fa05c1ca478de153d8839794606f8edc] | committer:
Martin Storsjö
avcodec/aarch64/vvc: Optimize NEON version of vvc_dmvr
This patch replaces blocks of instructions performing rounding and
widening shifts with
ffmpeg | branch: master | Krzysztof Pyrkosz | Mon Mar 3
22:00:23 2025 +0100| [d765e5f043d981294303fe210d643c5156efeeb3] | committer:
Martin Storsjö
swscale/aarch64: dotprod implementation of rgba32_to_Y
The idea is to split the 16 bit coefficients into lower and upper half,
invoke udot for
ffmpeg | branch: master | Krzysztof Pyrkosz | Fri Feb 28
22:21:50 2025 +0100| [e8d4c559871ef93fc94a8efb8144f1738eba4c62] | committer:
Martin Storsjö
avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon
Instead of calculating a^2, b^2, (a+b)^2 and (a-b)^2, calculate only
ffmpeg | branch: master | Krzysztof Pyrkosz | Sat Mar 1
13:59:00 2025 +0100| [38929b824bcc4b3307af3e0711c5c03b823a83e3] | committer:
Martin Storsjö
swscale/aarch64: Refactor hscale_16_to_15__fs_4
This patch removes the use of stack for temporary state and replaces
interleaved ld4 loads with
ffmpeg | branch: master | Krzysztof Pyrkosz via ffmpeg-devel
| Tue Feb 25 21:45:56 2025 +0100|
[9993a64d7bcd5baa730d1ff95f6ab4d5a49af369] | committer: Lynne
avutil/aarch64/tx_float_neon.S: clean up FFT4_X2
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commi
ffmpeg | branch: master | Krzysztof Pyrkosz | Thu Feb 13
20:02:29 2025 +0100| [b92577405b40b6eb5ecf0036060e34e0219da1e3] | committer:
Martin Storsjö
swscale/aarch64/rgb2rgb_neon: Implemented {yuyv, uyvy}toyuv{420, 422}
A78:
uyvytoyuv420_neon:6112.5 ( 6.96x
ffmpeg | branch: master | Krzysztof Pyrkosz | Tue Feb 11
22:43:11 2025 +0100| [64107e22f545d3899f9270751531997734d89a3d] | committer:
Martin Storsjö
swscale/aarch64/rgb24toyv12: skip early right shift by 2
It's a minor improvement that shaves off 5-8% from the execution time.
Inste
ffmpeg | branch: master | Krzysztof Pyrkosz | Fri Feb 7
20:42:11 2025 +0100| [9fb97215dfb2f1933cc2b959f29734a0671323eb] | committer:
Martin Storsjö
avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon
This change removes one extra floating point operation and simplifies
load
ffmpeg | branch: master | Krzysztof Pyrkosz | Tue Jan 28
19:01:33 2025 +0100| [c85a748979db507d619ac10d74832d3e33635942] | committer:
Martin Storsjö
swscale/aarch64/rgb2rgb: Implemented NEON shuf routines
The key idea is to pass the pre-generated tables to the TBL instruction
and churn
ffmpeg | branch: master | Krzysztof Pyrkosz | Fri Jan 31
22:20:03 2025 +0100| [e25a19fc7cfff1243bcc12c50a0f2fb026362df2] | committer:
Martin Storsjö
swscale/aarch64/output.S: refactor ff_yuv2plane1_8_neon
The benchmarks (before vs after) were gathered using
./tests/checkasm/checkasm --test
ffmpeg | branch: master | Krzysztof Pyrkosz | Fri Jan 24
19:58:26 2025 +0100| [83e4b068d9c49ae8af890c152e9e61320a835681] | committer:
Martin Storsjö
avcodec/aarch64/aacencdsp: NEON implementation
This patch supplies handwritten NEON code for AAC.
The benchmarks below were collected by
12 matches