lcldec: Optimize YUV422 case

Michael Niedermayer Sun, 28 Jul 2019 09:47:01 -0700

On Sun, Jul 28, 2019 at 11:06:16AM -0300, James Almer wrote:
> On 7/28/2019 8:56 AM, Michael Niedermayer wrote:
> > On Sun, Jul 28, 2019 at 12:45:36AM +0200, Reimar Döffinger wrote:
> >>
> >>
> >> On 28.07.2019, at 00:31, Michael Niedermayer <mich...@niedermayer.cc> wrote:
> >>
> >>> This merges several byte operations and avoids some shifts inside the loop
> >>>
> >>> Improves: Timeout (330sec -> 134sec)
> >>> Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472
> >>>
> >>> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> >>> Signed-off-by: Michael Niedermayer <mich...@niedermayer.cc>
> >>> ---
> >>> libavcodec/lcldec.c | 10 +++++-----
> >>> 1 file changed, 5 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
> >>> index 104defa5f5..c3787b3cbe 100644
> >>> --- a/libavcodec/lcldec.c
> >>> +++ b/libavcodec/lcldec.c
> >>> @@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
> >>>         break;
> >>>     case IMGTYPE_YUV422:
> >>>         for (row = 0; row < height; row++) {
> >>> -            for (col = 0; col < width - 3; col += 4) {
> >>> +            for (col = 0; col < (width - 2)>>1; col += 2) {
> >>>                 memcpy(y_out + col, encoded, 4);
> >>>                 encoded += 4;
> >>> -                u_out[ col >> 1     ] = *encoded++ + 128;
> >>> -                u_out[(col >> 1) + 1] = *encoded++ + 128;
> >>> -                v_out[ col >> 1     ] = *encoded++ + 128;
> >>> -                v_out[(col >> 1) + 1] = *encoded++ + 128;
> >>> +                AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
> >>> +                encoded += 2;
> >>> +                AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
> >>> +                encoded += 2;
> >>
> >> Huh? Surely the pixel stride used for y_out still needs to be double of 
> >> the u/v one?
> > 
> >> I suspect doing only the AV_RN16/xor optimization might be best, the one 
> >> shift saved seems not worth the risk/complexity...
> > 
> > If you want, I can remove the shift change.
> > With the fixed shift change it's 155sec; if I remove the shift optimization
> > it's 170sec.
> > 
> > patch for the 155 case below:
> > 
> > commit 56998b7d57a2cd0ed7f53981c50e76fd419cd86f (HEAD)
> > Author: Michael Niedermayer <mich...@niedermayer.cc>
> > Date:   Sat Jul 27 22:46:34 2019 +0200
> > 
> >     avcodec/lcldec: Optimize YUV422 case
> >     
> >     This merges several byte operations and avoids some shifts inside the loop
> >     
> >     Improves: Timeout (330sec -> 155sec)
> >     Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472
> >     
> >     Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> >     Signed-off-by: Michael Niedermayer <mich...@niedermayer.cc>
> > 
> > diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
> > index 104defa5f5..9e018ff5a9 100644
> > --- a/libavcodec/lcldec.c
> > +++ b/libavcodec/lcldec.c
> > @@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
> >          break;
> >      case IMGTYPE_YUV422:
> >          for (row = 0; row < height; row++) {
> > -            for (col = 0; col < width - 3; col += 4) {
> > -                memcpy(y_out + col, encoded, 4);
> > +            for (col = 0; col < (width - 2)>>1; col += 2) {
> > +                memcpy(y_out + 2 * col, encoded, 4);
> >                  encoded += 4;
> > -                u_out[ col >> 1     ] = *encoded++ + 128;
> > -                u_out[(col >> 1) + 1] = *encoded++ + 128;
> > -                v_out[ col >> 1     ] = *encoded++ + 128;
> > -                v_out[(col >> 1) + 1] = *encoded++ + 128;
> > +                AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
> > +                encoded += 2;
> > +                AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
> > +                encoded += 2;
> >              }
> >              y_out -= frame->linesize[0];
> >              u_out -= frame->linesize[1];
> > [...]
> 
> As others pointed out before, this kind of optimization is usually meant for
> the SIMD implementations and not the C boilerplate/reference. So
> prioritize readability over speed if possible when choosing which
> version to apply.


I think it's not a big difference: a shift of width vs. a shift of col.
So I'll go with what was faster in this testcase, but I am happy to
do something else if people prefer.

Thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf

