On Thu, Jun 20, 2019 at 09:43:46PM +0800, Jun Zhao wrote: > From: Jun Zhao <barryjz...@tencent.com> > > Remove unnecessary max value found, it's will improve the performance > about 10%. Used the test command like: > ffmpeg -i 1080P.mp4 -an -vf normalize -f null /dev/null, the FPS change > from 96fps to 107fps. > > Signed-off-by: Jun Zhao <barryjz...@tencent.com> > --- > libavfilter/vf_normalize.c | 7 +++---- > 1 files changed, 3 insertions(+), 4 deletions(-) > > diff --git a/libavfilter/vf_normalize.c b/libavfilter/vf_normalize.c > index 48eea59..40dc031 100644 > --- a/libavfilter/vf_normalize.c > +++ b/libavfilter/vf_normalize.c > @@ -143,14 +143,13 @@ static void normalize(NormalizeContext *s, AVFrame *in, > AVFrame *out) > min[c].in = max[c].in = in->data[0][s->co[c]]; > for (y = 0; y < in->height; y++) { > uint8_t *inp = in->data[0] + y * in->linesize[0]; > - uint8_t *outp = out->data[0] + y * out->linesize[0]; > for (x = 0; x < in->width; x++) { > for (c = 0; c < 3; c++) { > - min[c].in = FFMIN(min[c].in, inp[s->co[c]]); > - max[c].in = FFMAX(max[c].in, inp[s->co[c]]); > + uint8_t val = inp[s->co[c]]; > + if (val < min[c].in) min[c].in = val; > + else if (val > max[c].in) max[c].in = val; > } > inp += s->step; > - outp += s->step;
outp should just be removed, a patch for that certainly would be accepted. But I am very skeptical towards the rest of the change. In a way I would expect the compiler should be able to optimize this. However this code is clearly completely unoptimized, and thus I don't think making it more complex for a minor optimization that gives just 10% is worth it. A proper optimization would get rid of the s->co indirection in the inner loop - it really is not necessary for the limited format support the code has. This would then also allow for SIMD optimization, which should give a vastly larger speedup - and possibly the compiler would even be able to auto-vectorize. This applies even more to the processing step - the transform calculation done in SIMD is almost certain to be cheaper than a LUT lookup - at least if switch to fixed-point arithmetic instead of float. Side note: I haven't really figured out what the point of all this s->co complexity is supposed to be. The only thing the code seems to actually care is whether there is an alpha channel, and if so where it is. But the quickest path to a significant speedup would be to write a SIMD version of the normalize function specific to that format and the SIMD instruction set available and call it instead of normalize if applicable. I would expect that you should see at least a 3x speedup if using e.g. SSE2 (which is why I don't consider a 10% speedup worth changing the code for really). Best regards, Reimar Döffinger _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".