On 8/17/2021 4:25 PM, Niklas Haas wrote:
From: Niklas Haas <g...@haasn.dev>

This could arguably also be a vf, but I decided to put it here since
decoders are technically required to apply film grain during the output
step, and I would rather avoid requiring users to insert the
correct film grain synthesis filter on their own.

The code, while in C, is written in a way that unrolls/vectorizes fairly
well under -O3, and is reasonably cache friendly. On my CPU, a single
thread pushes about 400 FPS at 1080p.

Apart from hand-written assembly, one possible avenue of improvement
would be to change the access order to compute the grain row-by-row
rather than in 8x8 blocks. This requires some redundant PRNG calls, but
would make the algorithm more cache-oblivious.

The implementation has been written to the wording of SMPTE RDD 5-2006
as faithfully as I can manage. However, apart from passing a visual
inspection, no guarantee of correctness can be made due to the lack of
any publicly available reference implementation against which to
compare it.

Signed-off-by: Niklas Haas <g...@haasn.dev>
---
  libavcodec/Makefile |   1 +
  libavcodec/h274.c   | 811 ++++++++++++++++++++++++++++++++++++++++++++
  libavcodec/h274.h   |  52 +++
  3 files changed, 864 insertions(+)
  create mode 100644 libavcodec/h274.c
  create mode 100644 libavcodec/h274.h

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 9a6adb9903..21739b4064 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -42,6 +42,7 @@ OBJS = ac3_parser.o                                           \
         dirac.o                                                          \
         dv_profile.o                                                     \
         encode.o                                                         \
+       h274.o                                                           \
         imgconvert.o                                                     \
         jni.o                                                            \
         mathtables.o                                                     \
diff --git a/libavcodec/h274.c b/libavcodec/h274.c
new file mode 100644
index 0000000000..0efc00ca1d
--- /dev/null
+++ b/libavcodec/h274.c
@@ -0,0 +1,811 @@
+/*
+ * H.274 film grain synthesis
+ * Copyright (c) 2021 Niklas Haas <ffm...@haasn.xyz>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * H.274 film grain synthesis.
+ * @author Niklas Haas <ffm...@haasn.xyz>
+ */
+
+#include "libavutil/avassert.h"
+#include "libavutil/imgutils.h"
+
+#include "h274.h"
+
+// The code in this file has a lot of loops that vectorize very well, this is
+// about a 40% speedup for no obvious downside.
+#pragma GCC optimize("tree-vectorize")

Will this not break compilation with MSVC and such?

Also, tree vectorization is known to cause issues in old GCC versions, and even in recent ones. I don't know if this is worth the potential problems it could introduce, but I guess it can be done until someone writes SIMD.
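
If the pragma stays, one option might be to guard it so that only GCC proper ever sees it. The sketch below is an assumption on my part, not something from the patch: the GRAIN_VECTORIZE_HINT macro and the add_grain_row() helper are hypothetical names used only to give the pragma a loop to act on. The C standard says unrecognized pragmas are ignored, so other compilers should not fail outright, but they typically warn; the guard mostly keeps those builds quiet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* clang also defines __GNUC__ but, as far as I know, does not implement
 * "#pragma GCC optimize", so it is excluded here along with MSVC etc. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC optimize("tree-vectorize")
#define GRAIN_VECTORIZE_HINT 1 /* hypothetical macro, illustration only */
#else
#define GRAIN_VECTORIZE_HINT 0
#endif

/* Hypothetical helper (not from the patch): add signed grain to one row of
 * 8-bit samples and clamp the result to [0, 255]. A simple counted loop
 * like this is the kind of code the vectorizer can usually handle. */
void add_grain_row(uint8_t *dst, const uint8_t *src,
                   const int8_t *grain, size_t n)
{
    assert(dst && src && grain);
    for (size_t i = 0; i < n; i++) {
        int v = src[i] + grain[i];
        dst[i] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
    }
}
```

This does nothing about the miscompilation concern with tree vectorization itself; a GCC version check could be added to the same condition if specific versions turn out to be problematic.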