Hi all,

After extensive amounts of refactoring and iteration on the design and API,
and the implementation of an x86 SIMD backend, I'm happy to present the
revised version of my ongoing swscale rewrite. Now with 100% less reliance on
compiler autovectorization.

As before, I recommend (re)reading the design document to understand the
motivation, structure and implementation details of this rewrite. At this
point, I expect the major API and internal organization decisions to remain
stable.

I will preface this with some benchmark figures, measured on my (new) AMD
Ryzen 9 9950X3D:

All formats:
  - single thread: Overall speedup=2.109x faster, min=0.018x max=40.309x
  - multi thread:  Overall speedup=2.607x faster, min=0.112x max=254.738x

"Common" formats: (referenced >100 times in FFmpeg source code)
  - single thread: Overall speedup=2.797x faster, min=0.408x max=16.514x
  - multi thread:  Overall speedup=2.870x faster, min=0.715x max=21.983x

However, the main goal of this rewrite is not to improve performance, but to
improve the maintainability, extensibility and correctness of the code. Most of
the slowdowns for "common" formats are due to increased correctness (e.g.
accurate rounding and dithering), and not the result of a regression per se.
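
To illustrate the kind of trade-off meant here, below is a minimal sketch of
reducing a 16-bit sample to 8 bits. The helper names `narrow_truncate` and
`narrow_round` are hypothetical and are not taken from this series or from the
existing swscale code; they only show why a correctly rounded result tends to
cost more than a bare shift.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of swscale. */

/* Cheap path: drop the low 8 bits. This always rounds down. */
static uint8_t narrow_truncate(uint16_t v)
{
    return (uint8_t)(v >> 8);
}

/* Accurate path: rescale the range 0..65535 to 0..255, rounding to the
 * nearest integer. This needs a multiply, an add and a divide instead of
 * a single shift. */
static uint8_t narrow_round(uint16_t v)
{
    return (uint8_t)((v * 255u + 32767u) / 65535u);
}
```

For an input of 255, the exact rescaled value is about 0.992, so
`narrow_round` returns 1 while `narrow_truncate` returns 0. Both agree at the
end points (0 maps to 0, 65535 maps to 255).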

All of the remaining slowdowns (notably, the 0.1x cases) are due to incomplete
coverage in the x86 SIMD backend. In particular, this currently affects
bit-packed formats (e.g. rgb8, rgb4). I also have not yet incorporated any
AVX-512 code, which some of the existing routines take advantage of.

While I will continue working on this and expanding coverage to all remaining
operations, I felt that now is a good point in time to get some code review
and feedback regardless. I would especially appreciate code review of the x86
SIMD code inside libswscale/x86/ops_*.asm, as this is my first time writing
x86 assembly code.

 doc/APIchanges                |   3 +
 doc/scaler.texi               |   3 +
 doc/swscale-v2.txt            | 344 +++++++++++++++++++++++++++
 libswscale/Makefile           |   9 +
 libswscale/format.c           | 945 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 libswscale/format.h           |  29 ++-
 libswscale/graph.c            | 151 ++++++++----
 libswscale/graph.h            |  37 ++-
 libswscale/ops.c              | 850 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/ops.h              | 263 +++++++++++++++++++++
 libswscale/ops_backend.c      | 101 ++++++++
 libswscale/ops_backend.h      | 181 ++++++++++++++
 libswscale/ops_chain.c        | 291 +++++++++++++++++++++++
 libswscale/ops_chain.h        | 108 +++++++++
 libswscale/ops_internal.h     | 103 ++++++++
 libswscale/ops_optimizer.c    | 810 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/ops_tmpl_common.c  | 176 ++++++++++++++
 libswscale/ops_tmpl_float.c   | 255 ++++++++++++++++++++
 libswscale/ops_tmpl_int.c     | 609 +++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/options.c          |   1 +
 libswscale/swscale.h          |   7 +
 libswscale/tests/swscale.c    |  11 +-
 libswscale/version.h          |   2 +-
 libswscale/x86/Makefile       |   3 +
 libswscale/x86/ops.c          | 735 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libswscale/x86/ops_common.asm | 208 ++++++++++++++++
 libswscale/x86/ops_float.asm  | 376 +++++++++++++++++++++++++++++
 libswscale/x86/ops_int.asm    | 882 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/checkasm/Makefile       |   8 +-
 tests/checkasm/checkasm.c     |   4 +-
 tests/checkasm/checkasm.h     |  26 +-
 tests/checkasm/sw_ops.c       | 748 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 32 files changed, 8206 insertions(+), 73 deletions(-)

