On 8/17/2021 4:25 PM, Niklas Haas wrote:
From: Niklas Haas <g...@haasn.dev>
This could arguably also be a vf, but I decided to put it here since
decoders are technically required to apply film grain during the output
step, and I would rather avoid requiring users to insert the correct
film grain synthesis filter on their own.
The code, while in C, is written in a way that unrolls/vectorizes fairly
well under -O3, and is reasonably cache-friendly. On my CPU, a single
thread pushes about 400 FPS at 1080p.
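To illustrate what "vectorizes fairly well under -O3" usually asks of plain C, here is a minimal sketch. The function `add_grain_row` is hypothetical and is not taken from h274.c. It assumes 8-bit samples and a precomputed signed grain row. The loop has no loop-carried dependency, the clamp is branch-free min/max logic, and the `restrict` qualifiers tell the compiler the buffers do not alias. These are the properties that typically let GCC or Clang auto-vectorize such a loop at -O3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): add one row of signed grain
 * to 8-bit pixels and clamp the result to [0, 255].
 * - restrict: promises src, grain and dst do not overlap
 * - no loop-carried dependency: each x is computed independently
 * - the clamp is expressed as min/max, which maps to SIMD instructions */
static void add_grain_row(uint8_t *restrict dst, const uint8_t *restrict src,
                          const int8_t *restrict grain, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        int v = src[x] + grain[x]; /* promoted to int, so no wraparound */
        v = v < 0 ? 0 : v;
        v = v > 255 ? 255 : v;
        dst[x] = (uint8_t)v;
    }
}
```

For example, with `src = {0, 10, 250, 128}` and `grain = {-5, 3, 10, 0}`, `dst` becomes `{0, 13, 255, 128}`.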
Apart from hand-written assembly, one possible avenue of improvement
would be to change the access order to compute the grain row-by-row
rather than in 8x8 blocks. This requires some redundant PRNG calls, but
would make the algorithm more cache-oblivious.
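The trade-off between 8x8 block order and row-by-row order can be sketched as follows. This is a hypothetical model, not the patch's code. It assumes that each 8x8 block draws one random offset from a single sequential PRNG in raster block order. That assumption may not match the exact seeding scheme of SMPTE RDD 5-2006 or h274.c. The LFSR polynomial is also an assumption chosen for illustration. The sketch counts PRNG calls to show where the redundancy comes from when whole pixel rows are processed without a per-block cache.

```c
#include <stdint.h>

/* Hypothetical PRNG: a shift-register step using taps for x^31 + x^3 + 1.
 * Chosen for illustration only; not claimed to match RDD 5-2006. */
static uint32_t prng_step(uint32_t *state, unsigned long *calls)
{
    uint32_t x = *state;
    uint32_t feedback = ((x >> 2) ^ (x >> 30)) & 1u;
    *state = (x << 1) | feedback;
    (*calls)++;
    return *state;
}

/* Block order: visit each 8x8 block once and draw its offset once.
 * Returns a checksum of all block offsets so both orders can be compared. */
static uint32_t traverse_blocks(uint32_t seed, int w, int h, unsigned long *calls)
{
    uint32_t state = seed, sum = 0;
    for (int by = 0; by < h / 8; by++)
        for (int bx = 0; bx < w / 8; bx++)
            sum += prng_step(&state, calls) >> 24; /* this block's offset */
    return sum;
}

/* Row order: walk whole pixel rows. Without caching offsets, every row
 * replays the PRNG from the state saved at the start of its block row,
 * so each block offset is regenerated 8 times (the redundant calls). */
static uint32_t traverse_rows(uint32_t seed, int w, int h, unsigned long *calls)
{
    uint32_t row_start = seed, sum = 0;
    for (int y = 0; y < h; y++) {
        uint32_t state = row_start;
        for (int bx = 0; bx < w / 8; bx++) {
            uint32_t off = prng_step(&state, calls) >> 24;
            if (y % 8 == 0)
                sum += off; /* count each block once in the checksum */
        }
        if (y % 8 == 7)
            row_start = state; /* move on to the next block row */
    }
    return sum;
}
```

For a 16x16 area, block order makes 4 PRNG calls and this row order makes 32, while both produce the same offsets. Caching one block row of offsets would avoid the replay at the cost of a small buffer.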
The implementation has been written to the wording of SMPTE RDD 5-2006
as faithfully as I can manage. However, apart from passing a visual
inspection, no guarantee of correctness can be made due to the lack of
any publicly available reference implementation against which to
compare it.
Signed-off-by: Niklas Haas <g...@haasn.dev>
---
libavcodec/Makefile | 1 +
libavcodec/h274.c | 811 ++++++++++++++++++++++++++++++++++++++++++++
libavcodec/h274.h | 52 +++
3 files changed, 864 insertions(+)
create mode 100644 libavcodec/h274.c
create mode 100644 libavcodec/h274.h
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 9a6adb9903..21739b4064 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -42,6 +42,7 @@ OBJS = ac3_parser.o \
dirac.o \
dv_profile.o \
encode.o \
+ h274.o \
imgconvert.o \
jni.o \
mathtables.o \
diff --git a/libavcodec/h274.c b/libavcodec/h274.c
new file mode 100644
index 0000000000..0efc00ca1d
--- /dev/null
+++ b/libavcodec/h274.c
@@ -0,0 +1,811 @@
+/*
+ * H.274 film grain synthesis
+ * Copyright (c) 2021 Niklas Haas <ffm...@haasn.xyz>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * H.274 film grain synthesis.
+ * @author Niklas Haas <ffm...@haasn.xyz>
+ */
+
+#include "libavutil/avassert.h"
+#include "libavutil/imgutils.h"
+
+#include "h274.h"
+
+// The code in this file has a lot of loops that vectorize very well; this is
+// about a 40% speedup for no obvious downside.
+#pragma GCC optimize("tree-vectorize")
Will this not break compilation with MSVC and such?
Also, tree vectorization is known to cause issues in old GCC versions,
and even in recent ones. I don't know if this is worth the potential
problems it could introduce, but I guess it can be done until someone
writes SIMD.
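One way to address the portability concern is to guard the pragma so only GCC sees it. The following is a minimal sketch under that assumption, not code from the patch. Clang also defines `__GNUC__` but does not implement `#pragma GCC optimize`, so it is excluded explicitly. MSVC defines neither macro. MSVC normally reports an unknown pragma as warning C4068 rather than an error, so the guard mainly keeps such builds clean. The macro `H274_TREE_VECTORIZE` and the helper `tree_vectorize_enabled` are hypothetical names used only to make the chosen branch observable.

```c
/* Hypothetical guard (not from the patch): enable GCC's tree vectorizer
 * only when compiling with GCC itself. Clang defines __GNUC__ too, so it
 * is excluded explicitly; MSVC defines neither and skips the pragma. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC optimize("tree-vectorize")
#define H274_TREE_VECTORIZE 1 /* hypothetical macro, for illustration */
#else
#define H274_TREE_VECTORIZE 0
#endif

/* Hypothetical helper reporting which preprocessor branch was taken. */
static int tree_vectorize_enabled(void)
{
    return H274_TREE_VECTORIZE;
}
```

This does not address the reviewer's second point about vectorizer bugs in specific GCC versions. A version check could be added to the `#if`, but the cutoff would have to come from actual bug reports.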
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel