On Fri, 1 Apr 2022, Martin Storsjö wrote:

On Thu, 31 Mar 2022, Ben Avison wrote:

The VC1 decoder was missing lots of important fast paths for Arm, especially
for 64-bit Arm. This submission fills in implementations for all functions
where a fast path already existed and the fallback C implementation was
taking 1% or more of the runtime, and adds a new fast path to permit
vc1_unescape_buffer() to be overridden.
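
For readers unfamiliar with what vc1_unescape_buffer() does: VC-1 bitstreams insert a 0x03 byte after two zero bytes whenever the following byte is below 4, so that payload data cannot be mistaken for a start code, and the decoder has to strip those bytes again. The following is only a minimal sketch of that idea in plain C. The name unescape_sketch and its signature are hypothetical, and this is not the code from the patch or from libavcodec.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg's implementation: copy src to dst,
 * dropping every 0x03 byte that follows two zero bytes and is itself
 * followed by a byte below 4. dst must have room for size bytes.
 * Returns the number of bytes written. */
static size_t unescape_sketch(const uint8_t *src, size_t size, uint8_t *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < size; i++) {
        int escaped = i >= 2 && i + 1 < size &&
                      src[i] == 3 && src[i - 1] == 0 && src[i - 2] == 0 &&
                      src[i + 1] < 4;
        if (escaped)
            i++;            /* skip the 0x03, keep the byte after it */
        dst[out++] = src[i];
    }
    return out;
}
```

A byte-at-a-time loop like this is the kind of fallback that an architecture-specific fast path would replace, typically by scanning several bytes per iteration for the rare 00 00 03 pattern.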

I've measured the playback speed on a 1.5 GHz Cortex-A72 (Raspberry Pi 4)
using `ffmpeg -i <bitstream> -f null -` for a couple of example streams:

Architecture:  AArch32    AArch32    AArch64    AArch64
Stream:        1          2          1          2
Before speed:  1.22x      0.82x      1.00x      0.67x
After speed:   1.31x      0.98x      1.39x      1.06x
Improvement:   7.4%       20%        39%        58%
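
The Improvement row appears to follow from the two speed rows as (after / before - 1) * 100, rounded. A minimal sketch of that arithmetic in C, where the helper name improvement_percent is hypothetical:

```c
/* Hypothetical helper: percentage gain in playback speed, where both
 * arguments are speeds expressed as multiples of real time. */
static double improvement_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

For example, improvement_percent(0.67, 1.06) gives roughly 58, in line with the AArch64 stream 2 column.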

`make fate` passes on both AArch32 and AArch64.

Changes in v2:

* Refactor checkasm tests to convert some macros into functions.
* Remove cast-to-void of checked_call.
* Limit 16-bit values in idctdsp checkasm test to +/-0x100.
* Reinstate ff_add_pixels_clamped_arm.
* Adapt vc1 deblocking filters to specify stride as ptrdiff_t.
* Add align specifiers to a few VLD/VST instructions for AArch32 deblocking
 filter, and adapt checkasm test not to test with tighter alignment than is
 encountered in normal use.
* Correct unescape buffer memcmp length.
* Update benchmarks for AArch64 idctdsp.

Thanks! From a quick readthrough, this version of the patchset seems good to me! I'll run it through some more testing, and push it if everything seems to work fine (tomorrow or so).

Pushed now - thanks for your contribution!

// Martin