On Sat, 26 Apr 2025 19:41:04 +0200 Niklas Haas <ffm...@haasn.xyz> wrote:
> Hi all,
>
> After extensive amounts of refactoring and iteration on the design and API,
> and the implementation of an x86 SIMD backend, I'm happy to present the
> revised version of my ongoing swscale rewrite. Now with 100% less reliance on
> compiler autovectorization.
>
> As before, I recommend (re)reading the design document to understand the
> motivation, structure and implementation details of this rewrite. At this
> point, I expect the major API and internal organization decisions to remain
> stable.
>
> I will preface with some benchmark figures, on my (new) AMD Ryzen 9 9950X3D:
>
> All formats:
> - single thread: Overall speedup=2.109x faster, min=0.018x max=40.309x
> - multi thread: Overall speedup=2.607x faster, min=0.112x max=254.738x
>
> "Common" formats: (referenced >100 times in FFmpeg source code)
> - single thread: Overall speedup=2.797x faster, min=0.408x max=16.514x
> - multi thread: Overall speedup=2.870x faster, min=0.715x max=21.983x

Small update: I noticed that one code path was accidentally not enabled. I also
implemented asm for the remaining bit-packed formats. After those two changes,
the new numbers are:

All formats:
- single thread: Overall speedup=4.247x faster, min=0.177x max=224.809x
- multi thread: Overall speedup=4.000x faster, min=0.256x max=968.725x

"Common" formats:
- single thread: Overall speedup=3.174x faster, min=0.596x max=12.616x
- multi thread: Overall speedup=3.005x faster, min=0.617x max=14.739x

>
> However, the main goal of this rewrite is not to improve performance, but to
> improve the maintainability, extensibility and correctness of the code. Most of
> the slowdowns for "common" formats are due to increased correctness (e.g.
> accurate rounding and dithering), and not the result of a regression per se.
>
> All of the remaining slowdowns (notably, the 0.1x cases) are due to incomplete
> coverage of the x86 SIMD. Notably, this currently affects bit packed formats
> (e.g. rgb8, rgb4). (I also did not yet incorporate any AVX-512 code, which
> some of the existing routines take advantage of)
>
> While I will continue working on this and expanding coverage to all remaining
> operations, I felt that now is a good point in time to get some code review
> and feedback regardless. I would especially appreciate code review of the x86
> SIMD code inside libswscale/x86/ops_*.asm, as this is my first time writing
> x86 assembly code.
>
>  doc/APIchanges                |   3 +
>  doc/scaler.texi               |   3 +
>  doc/swscale-v2.txt            | 344 +++++++++++++++++++++++++++
>  libswscale/Makefile           |   9 +
>  libswscale/format.c           | 945 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  libswscale/format.h           |  29 ++-
>  libswscale/graph.c            | 151 ++++++++----
>  libswscale/graph.h            |  37 ++-
>  libswscale/ops.c              | 850 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/ops.h              | 263 +++++++++++++++++++++
>  libswscale/ops_backend.c      | 101 ++++++++
>  libswscale/ops_backend.h      | 181 ++++++++++++++
>  libswscale/ops_chain.c        | 291 +++++++++++++++++++++++
>  libswscale/ops_chain.h        | 108 +++++++++
>  libswscale/ops_internal.h     | 103 ++++++++
>  libswscale/ops_optimizer.c    | 810 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/ops_tmpl_common.c  | 176 ++++++++++++++
>  libswscale/ops_tmpl_float.c   | 255 ++++++++++++++++++++
>  libswscale/ops_tmpl_int.c     | 609 +++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/options.c          |   1 +
>  libswscale/swscale.h          |   7 +
>  libswscale/tests/swscale.c    |  11 +-
>  libswscale/version.h          |   2 +-
>  libswscale/x86/Makefile       |   3 +
>  libswscale/x86/ops.c          | 735 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libswscale/x86/ops_common.asm | 208 ++++++++++++++++
>  libswscale/x86/ops_float.asm  | 376 +++++++++++++++++++++++++++++
>  libswscale/x86/ops_int.asm    | 882 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/checkasm/Makefile       |   8 +-
>  tests/checkasm/checkasm.c     |   4 +-
>  tests/checkasm/checkasm.h     |  26 +-
>  tests/checkasm/sw_ops.c       | 748 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  32 files changed, 8206 insertions(+), 73 deletions(-)
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".