On Fri, 1 Apr 2022, Martin Storsjö wrote:
On Thu, 31 Mar 2022, Ben Avison wrote:
The VC1 decoder was missing lots of important fast paths for Arm, especially
for 64-bit Arm. This submission fills in implementations for all functions
where a fast path already existed and the fallback C implementation was
taking 1% or more of the runtime, and adds a new fast path to permit
vc1_unescape_buffer() to be overridden.
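For context on what vc1_unescape_buffer() has to do: VC-1 bitstreams (SMPTE 421M) insert an emulation-prevention byte 0x03 after two zero bytes whenever the next byte would be 0x03 or less, and unescaping removes those bytes again. The following is a minimal scalar sketch under that assumption only; the name unescape_sketch is hypothetical, and this is neither FFmpeg's C fallback nor the Arm fast path from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch (not FFmpeg's implementation): removes VC-1
 * emulation-prevention bytes. A 0x03 is dropped when it follows two 0x00
 * bytes of the input and is followed by a byte in the range 0x00..0x03.
 * dst must have room for at least size bytes; the return value is the
 * number of bytes actually written, which is the length a caller should
 * use when comparing outputs (e.g. with memcmp). */
static size_t unescape_sketch(const uint8_t *src, size_t size, uint8_t *dst)
{
    size_t out = 0;

    assert(size == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < size; i++) {
        if (i >= 2 && i + 1 < size &&
            src[i] == 0x03 && src[i - 1] == 0x00 && src[i - 2] == 0x00 &&
            src[i + 1] <= 0x03)
            continue; /* skip the escape byte; src[i + 1] is copied next */
        dst[out++] = src[i];
    }
    return out;
}
```

Because the test looks at the escaped input rather than the output, a sequence such as 00 00 03 00 00 03 01 has both escape bytes removed, yielding 00 00 00 00 01.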
I've measured the playback speed on a 1.5 GHz Cortex-A72 (Raspberry Pi 4)
using `ffmpeg -i <bitstream> -f null -` for a couple of example streams:
Architecture:  AArch32  AArch32  AArch64  AArch64
Stream:        1        2        1        2
Before speed:  1.22x    0.82x    1.00x    0.67x
After speed:   1.31x    0.98x    1.39x    1.06x
Improvement:   7.4%     20%      39%      58%
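The Improvement row appears to be the relative gain in playback speed, (after / before - 1) x 100. A small C sketch (the helper improvement_pct is hypothetical) reproduces it from the Before and After rows of the table.

```c
#include <assert.h>

/* Speeds (multiples of real time) copied from the table above, in column
 * order: AArch32 stream 1, AArch32 stream 2, AArch64 stream 1, AArch64 stream 2. */
static const double before_speed[4] = { 1.22, 0.82, 1.00, 0.67 };
static const double after_speed[4]  = { 1.31, 0.98, 1.39, 1.06 };

/* Relative speed-up in percent, assuming "Improvement" means
 * (after / before - 1) * 100. */
static double improvement_pct(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

For example, AArch64 stream 2 gives 1.06 / 0.67, about 1.582, i.e. roughly 58%, while AArch32 stream 2 gives about 19.5%, which the table rounds to 20%.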
`make fate` passes on both AArch32 and AArch64.
Changes in v2:
* Refactor checkasm tests to convert some macros into functions.
* Remove cast-to-void of checked_call.
* Limit 16-bit values in idctdsp checkasm test to +/-0x100.
* Reinstate ff_add_pixels_clamped_arm.
* Adapt vc1 deblocking filters to specify stride as ptrdiff_t.
* Add align specifiers to a few VLD/VST instructions for the AArch32 deblocking
filter, and adapt the checkasm test not to test with tighter alignment than is
encountered in normal use.
* Correct unescape buffer memcmp length.
* Update benchmarks for AArch64 idctdsp.
Thanks! From a quick readthrough, this version of the patchset seems good to
me! I'll run it through some more testing, and push it if everything seems to
work fine (tomorrow or so).
Pushed now - thanks for your contribution!
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel