18 Aug 2021, 17:41 by jamr...@gmail.com:

> On 8/17/2021 4:25 PM, Niklas Haas wrote:
>
>> From: Niklas Haas <g...@haasn.dev>
>>
>> This could arguably also be a vf, but I decided to put it here since
>> decoders are technically required to apply film grain during the output
>> step, and I would rather want to avoid requiring users insert the
>> correct film grain synthesis filter on their own.
>>
>> The code, while in C, is written in a way that unrolls/vectorizes fairly
>> well under -O3, and is reasonably cache friendly. On my CPU, a single
>> thread pushes about 400 FPS at 1080p.
>>
>> Apart from hand-written assembly, one possible avenue of improvement
>> would be to change the access order to compute the grain row-by-row
>> rather than in 8x8 blocks. This requires some redundant PRNG calls, but
>> would make the algorithm more cache-oblivious.
>>
>> The implementation has been written to the wording of SMPTE RDD 5-2006
>> as faithfully as I can manage. However, apart from passing a visual
>> inspection, no guarantee of correctness can be made due to the lack of
>> any publicly available reference implementation against which to
>> compare it.
>>
>> Signed-off-by: Niklas Haas <g...@haasn.dev>
>> ---
>> libavcodec/Makefile | 1 +
>> libavcodec/h274.c | 811 ++++++++++++++++++++++++++++++++++++++++++++
>> libavcodec/h274.h | 52 +++
>> 3 files changed, 864 insertions(+)
>> create mode 100644 libavcodec/h274.c
>> create mode 100644 libavcodec/h274.h
>>
>> diff --git a/libavcodec/Makefile b/libavcodec/Makefile
>> index 9a6adb9903..21739b4064 100644
>> --- a/libavcodec/Makefile
>> +++ b/libavcodec/Makefile
>> @@ -42,6 +42,7 @@ OBJS = ac3_parser.o \
>> dirac.o \
>> dv_profile.o \
>> encode.o \
>> + h274.o \
>> imgconvert.o \
>> jni.o \
>> mathtables.o \
>> diff --git a/libavcodec/h274.c b/libavcodec/h274.c
>> new file mode 100644
>> index 0000000000..0efc00ca1d
>> --- /dev/null
>> +++ b/libavcodec/h274.c
>> @@ -0,0 +1,811 @@
>> +/*
>> + * H.274 film grain synthesis
>> + * Copyright (c) 2021 Niklas Haas <ffm...@haasn.xyz>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +/**
>> + * @file
>> + * H.274 film grain synthesis.
>> + * @author Niklas Haas <ffm...@haasn.xyz>
>> + */
>> +
>> +#include "libavutil/avassert.h"
>> +#include "libavutil/imgutils.h"
>> +
>> +#include "h274.h"
>> +
>> +// The code in this file has a lot of loops that vectorize very well, this is
>> +// about a 40% speedup for no obvious downside.
>> +#pragma GCC optimize("tree-vectorize")
>>
>
> Will this not break compilation with MSVC and such?
>
> Also, tree vectorization is known to cause issues in old GCC versions, and
> even recent ones. I don't know if this is worth the potential problems it
> could introduce, but I guess it can be done until someone writes SIMD.
>
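For reference, if such a hint were kept at all, the portability concern above could in principle be handled by hiding the pragma behind a compiler check. The following is only a rough sketch under stated assumptions: the macro name H274_VECTORIZE_HINT and the function add_noise are hypothetical and appear nowhere in the patch, and Clang is excluded on the assumption that only GCC proper honors this pragma (Clang also defines __GNUC__).

```c
#include <assert.h>

/* Hypothetical guard, not part of the patch: emit the GCC-specific pragma
 * only when building with GCC itself, so other compilers never see it. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#pragma GCC optimize("tree-vectorize")
#define H274_VECTORIZE_HINT 1
#else
#define H274_VECTORIZE_HINT 0
#endif

/* Illustrative stand-in for a grain-blending loop (not the code from
 * h274.c): adds signed noise to 8-bit samples and clamps to 0..255. */
static void add_noise(unsigned char *dst, const signed char *noise, int n)
{
    assert(dst && noise && n >= 0);
    for (int i = 0; i < n; i++) {
        int v = dst[i] + noise[i];
        dst[i] = (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}
```

This only addresses whether other compilers see the pragma; it does nothing about the GCC vectorizer issues mentioned above.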
I really, really would rather not have any compiler hints at all. It's not
like the function is incredibly slow without SIMD, and by comparison, a 40%
speedup would be a failing grade for a handwritten SIMD function in my book,
so I think we should leave the pragma out.