Hi all,

After a great deal of refactoring and iteration on the design and API, plus the implementation of an x86 SIMD backend, I'm happy to present the revised version of my ongoing swscale rewrite. Now with 100% less reliance on compiler autovectorization.
As before, I recommend (re)reading the design document to understand the motivation, structure and implementation details of this rewrite. At this point, I expect the major API and internal organization decisions to remain stable.

To start, here are some benchmark figures from my (new) AMD Ryzen 9 9950X3D:

All formats:
 - single thread: Overall speedup=2.109x faster, min=0.018x max=40.309x
 - multi thread:  Overall speedup=2.607x faster, min=0.112x max=254.738x

"Common" formats (referenced >100 times in FFmpeg source code):
 - single thread: Overall speedup=2.797x faster, min=0.408x max=16.514x
 - multi thread:  Overall speedup=2.870x faster, min=0.715x max=21.983x

However, the main goal of this rewrite is not to improve performance, but to improve the maintainability, extensibility and correctness of the code. Most of the slowdowns for "common" formats are due to increased correctness (e.g. accurate rounding and dithering), and are not regressions per se. All of the remaining slowdowns (notably, the 0.1x cases) are due to incomplete coverage of the x86 SIMD backend; in particular, this currently affects bit-packed formats (e.g. rgb8, rgb4). I have also not yet incorporated any AVX-512 code, which some of the existing routines take advantage of.

While I will continue working on this and expanding coverage to all remaining operations, I felt that now is a good time to get some code review and feedback regardless. I would especially appreciate review of the x86 SIMD code in libswscale/x86/ops_*.asm, as this is my first time writing x86 assembly.
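As a rough illustration of why accurate rounding can cost throughput, the following C sketch (not code from this patch set; both function names are hypothetical) contrasts a truncating 16-bit to 8-bit reduction with a correctly rounded one. The rounded version needs a multiply and an add on top of the shift, and a SIMD implementation pays that extra work for every sample.

```c
#include <stdint.h>

/* Hypothetical helper: naive 16-bit -> 8-bit reduction that simply keeps
 * the top 8 bits (truncation). */
static inline uint8_t u16_to_u8_trunc(uint16_t x)
{
    return (uint8_t)(x >> 8);
}

/* Hypothetical helper: correctly rounded reduction, i.e.
 * round(x * 255 / 65535) == round(x / 257). The multiply-add-shift form
 * is exact for every 16-bit input, but costs an extra multiply and add
 * per sample compared to the plain shift above. */
static inline uint8_t u16_to_u8_round(uint16_t x)
{
    return (uint8_t)(((uint32_t)x * 255 + 32895) >> 16);
}
```

For example, an input of 255 truncates to 0 but rounds to 1, since 255/257 is about 0.99; a truncating fast path is systematically biased downwards in this way.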
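To illustrate what makes bit-packed formats different, here is a scalar C sketch (illustrative only, not code from this patch set; unpack_rgb4_row is a hypothetical name) that unpacks one row of rgb4, following the layout documented for AV_PIX_FMT_RGB4 in libavutil/pixfmt.h: 4 bits per pixel, (msb) 1 bit R, 2 bits G, 1 bit B (lsb), with two pixels per byte and the first pixel in the high nibble. Unlike byte-aligned formats such as rgb24, every pixel requires selecting a nibble and extracting individual bit fields, which does not map onto plain byte loads and shuffles.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unpack one row of rgb4 into separate 8-bit
 * R, G, B planes. Each 4-bit pixel is, MSB to LSB: 1 bit R, 2 bits G,
 * 1 bit B; each byte holds two pixels, the first in the high nibble. */
static void unpack_rgb4_row(const uint8_t *src, size_t width,
                            uint8_t *r, uint8_t *g, uint8_t *b)
{
    for (size_t i = 0; i < width; i++) {
        uint8_t byte   = src[i >> 1];
        /* even pixels live in the high nibble, odd pixels in the low one */
        uint8_t nibble = (i & 1) ? (byte & 0x0F) : (byte >> 4);

        /* expand each field to 0..255 by bit replication */
        r[i] = (uint8_t)(((nibble >> 3) & 1) * 255);
        g[i] = (uint8_t)(((nibble >> 1) & 3) * 85);
        b[i] = (uint8_t)((nibble & 1) * 255);
    }
}
```

Multiplying by 255 (1-bit fields) or 85 (the 2-bit field) is equivalent to replicating the bits, so the maximum code maps to 255; a real converter might choose a different scaling or fuse this step with later operations.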
 doc/APIchanges                |   3 +
 doc/scaler.texi               |   3 +
 doc/swscale-v2.txt            | 344 +++++++++++++++++++++++++++
 libswscale/Makefile           |   9 +
 libswscale/format.c           | 945 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 libswscale/format.h           |  29 ++-
 libswscale/graph.c            | 151 ++++++++----
 libswscale/graph.h            |  37 ++-
 libswscale/ops.c              | 850 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/ops.h              | 263 +++++++++++++++++++++
 libswscale/ops_backend.c      | 101 ++++++++
 libswscale/ops_backend.h      | 181 ++++++++++++++
 libswscale/ops_chain.c        | 291 +++++++++++++++++++++++
 libswscale/ops_chain.h        | 108 +++++++++
 libswscale/ops_internal.h     | 103 ++++++++
 libswscale/ops_optimizer.c    | 810 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/ops_tmpl_common.c  | 176 ++++++++++++++
 libswscale/ops_tmpl_float.c   | 255 ++++++++++++++++++++
 libswscale/ops_tmpl_int.c     | 609 +++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/options.c          |   1 +
 libswscale/swscale.h          |   7 +
 libswscale/tests/swscale.c    |  11 +-
 libswscale/version.h          |   2 +-
 libswscale/x86/Makefile       |   3 +
 libswscale/x86/ops.c          | 735 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/x86/ops_common.asm | 208 ++++++++++++++++
 libswscale/x86/ops_float.asm  | 376 +++++++++++++++++++++++++++++
 libswscale/x86/ops_int.asm    | 882 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/checkasm/Makefile       |   8 +-
 tests/checkasm/checkasm.c     |   4 +-
 tests/checkasm/checkasm.h     |  26 +-
 tests/checkasm/sw_ops.c       | 748 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 32 files changed, 8206 insertions(+), 73 deletions(-)

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".