Re: [FFmpeg-devel] [aarch64] yuv2planeX - unroll outer loop by 4 to increase performance by 6.3%

Sebastian Pop Thu, 03 Sep 2020 09:59:24 -0700

On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer <mich...@niedermayer.cc>
wrote:


> Faster is better, obviously, so if it's tested with odd sizes and ARM
> developers have had a chance to comment, it should be OK.


Hi, I'm looking for feedback from ARM maintainers on the attached patch.
Ok to commit the patch?

Thanks,
Sebastian

On Wed, Aug 19, 2020 at 1:37 PM Sebastian Pop <seb...@gmail.com> wrote:

> Thanks Michael for your feedback.
>
> On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer <mich...@niedermayer.cc>
> wrote:
>
>> Faster is better, obviously, so if it's tested with odd sizes and ARM
>> developers have had a chance to comment, it should be OK.
>>
>>
> The current patch was tested with `make check` on an Arm64 Graviton2.
> I have also tested randomly selected rescale factors, for example:
> ./ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1023x42,bench=stop -f null -
>
>
>> One potential improvement is to use the unrolled code for odd widths
>> too, and use the non-unrolled code for the end.
>>
>
> Done.  Please see the amended patch.
>
> Thanks,
> Sebastian
>
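
For readers without the attachment, the suggestion quoted above (run the
unrolled loop over as much of the row as possible, then finish with the
non-unrolled loop) can be sketched in C. This is only an illustration under
stated assumptions, not the aarch64 assembly from the patch: the names
clip_u8, filter_plain and filter_unrolled4 are hypothetical, the dither term
of the real yuv2planeX is left out, and the >> 19 shift assumes 12-bit filter
coefficients applied to 15-bit intermediate samples.

```c
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 0..255 range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference version: one output pixel per outer-loop iteration. */
static void filter_plain(const int16_t *filter, int filter_size,
                         const int16_t **src, uint8_t *dest, int width)
{
    for (int i = 0; i < width; i++) {
        int val = 0;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19);
    }
}

/* Outer loop unrolled by 4, with a non-unrolled tail for the last
 * width % 4 pixels, so any width (odd ones included) takes the fast path
 * for most of the row. */
static void filter_unrolled4(const int16_t *filter, int filter_size,
                             const int16_t **src, uint8_t *dest, int width)
{
    int i = 0;

    /* Main loop: four output pixels share each filter coefficient load. */
    for (; i + 4 <= width; i += 4) {
        int v0 = 0, v1 = 0, v2 = 0, v3 = 0;
        for (int j = 0; j < filter_size; j++) {
            const int16_t *s = src[j] + i;
            int f = filter[j];
            v0 += s[0] * f;
            v1 += s[1] * f;
            v2 += s[2] * f;
            v3 += s[3] * f;
        }
        dest[i + 0] = clip_u8(v0 >> 19);
        dest[i + 1] = clip_u8(v1 >> 19);
        dest[i + 2] = clip_u8(v2 >> 19);
        dest[i + 3] = clip_u8(v3 >> 19);
    }

    /* Tail loop: the remaining 0 to 3 pixels, one at a time. */
    for (; i < width; i++) {
        int val = 0;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19);
    }
}
```

Both functions should produce identical output for every width. Comparing
them on widths that are not a multiple of 4 is a cheap way to exercise the
tail path that the odd-size testing discussed above is meant to cover.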

0001-aarch64-yuv2planeX-unroll-outer-loop-by-4-increases-.patch
Description: Binary data

