On 9/8/2017 6:29 PM, Michael Niedermayer wrote:
> Speeds code up from 50sec to 15sec
>
> Fixes Timeout
> Fixes: 3242/clusterfuzz-testcase-5811951672229888
>
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer <mich...@niedermayer.cc>
> ---
>  libavcodec/scpr.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c
> index 37fbe7a106..2ef63a7bf8 100644
> --- a/libavcodec/scpr.c
> +++ b/libavcodec/scpr.c
> @@ -827,7 +827,16 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
>          return ret;
>
>      for (y = 0; y < avctx->height; y++) {
> -        for (x = 0; x < avctx->width * 4; x++) {
> +        if (!(((uintptr_t)dst) & 7)) {
> +            uint64_t *dst64 = (uint64_t *)dst;
> +            int w = avctx->width>>1;
> +            for (x = 0; x < w; x++) {
> +                dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;

Shouldn't this be used only if HAVE_FAST_64BIT is true, and a version
shifting four bytes at a time used otherwise? That's how we do it almost
everywhere else.
The chances of anyone bothering to write SIMD for this decoder are almost
none, so adding optimized C loops is ok in this case.

> +            }
> +            x *= 8;
> +        } else
> +            x = 0;

How does this fix the timeout if the new code is only run when the pointer
is eight-byte aligned? (Or four-byte aligned once you add that.)

> +        for (; x < avctx->width * 4; x++) {
>              dst[x] = dst[x] << 3;
>          }
>          dst += frame->linesize[0];
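
To make the two review points concrete, here is a rough standalone sketch of what I have in mind. It is not the patch and not existing FFmpeg code: shift_row_left3() is a hypothetical helper name, and the HAVE_FAST_64BIT fallback define below is only a stand-in so the snippet compiles outside the tree. It goes through memcpy() so the wide path does not depend on the pointer's alignment, which is a substitution on my part rather than what the patch does. It also masks with 0xF8 per byte; the 0xFC mask in the patch matches the byte loop only when bit 7 of every input byte is clear.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for FFmpeg's configure-time HAVE_FAST_64BIT macro,
 * defined here only so the sketch builds on its own. */
#ifndef HAVE_FAST_64BIT
#define HAVE_FAST_64BIT (UINTPTR_MAX > 0xFFFFFFFFu)
#endif

/* Hypothetical helper: shift each of the n bytes at dst left by 3 bits. */
static void shift_row_left3(uint8_t *dst, size_t n)
{
    size_t x = 0;

#if HAVE_FAST_64BIT
    /* Eight bytes per iteration. memcpy() keeps the access valid for any
     * alignment; the mask clears the 3 bits that leak in from the
     * neighbouring byte after the shift. */
    for (; x + 8 <= n; x += 8) {
        uint64_t v;
        memcpy(&v, dst + x, sizeof(v));
        v = (v << 3) & UINT64_C(0xF8F8F8F8F8F8F8F8);
        memcpy(dst + x, &v, sizeof(v));
    }
#else
    /* Four bytes per iteration when 64-bit operations are not fast. */
    for (; x + 4 <= n; x += 4) {
        uint32_t v;
        memcpy(&v, dst + x, sizeof(v));
        v = (v << 3) & UINT32_C(0xF8F8F8F8);
        memcpy(dst + x, &v, sizeof(v));
    }
#endif

    /* Scalar tail for whatever is left over. */
    for (; x < n; x++)
        dst[x] = (uint8_t)(dst[x] << 3);
}
```

In decode_frame() this would presumably be called once per row as shift_row_left3(dst, avctx->width * 4), with dst advanced by frame->linesize[0] as before. Whether the memcpy() form is as fast as the aligned cast on the fuzzer's build is something I have not measured.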