On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron <matthieu.bou...@gmail.com> wrote:
> The following patchset aims to make bitexact the yuv->rgba armv7 neon code
> path with the aarch64 one. It also aims to make the two code bases as close
> as possible.
>
> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
>
> The current 32bit code path which is unused is removed.
>
> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
>
> The code processes only one line at a time for the yuv420p, nv12 and nv21
> formats with no regression in performance observed on a rpi2 (I've even
> observed a slight increase of performance for the nv12 and nv21 formats).
>
> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
>
> The last patch of the series makes the code bitexact with the aarch64
> version. The increase of precision (which introduces a performance loss) is
> compensated by a refactor/optimisation that saves quite a few mov, vdup and
> vqdmulh.
>
> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>
> without patchset:
> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>
> with patchset:
> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884

I've managed to run the code on a BeagleBone Black board; here are the
results:

nv12->bgra

without patchset:
[bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513

with patches 01-06/10 applied:
[bench @ 0x8052d0] t:0.013438 avg:0.013659 max:0.034427 min:0.013411

with patches 01-10/10 applied:
[bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523

yuv420p->bgra

without patchset:
[bench @ 0x6d42d0] t:0.012954 avg:0.013159 max:0.033866 min:0.012945

with patches 01-06/10 applied:
[bench @ 0x20172d0] t:0.015154 avg:0.015358 max:0.036186 min:0.015134

with patches 01-10/10 applied:
[bench @ 0x1d162d0] t:0.014623 avg:0.014784 max:0.035487 min:0.014568

So it looks like processing one line at a time has a negative effect on
performance on this board (as opposed to the rpi2). I'll try to keep the
two-line processing code and post some results, so we can decide which
version to choose.

Matthieu
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
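
To make the comparison above easier to read, here is a rough sketch that turns the avg values from the bench output into relative changes against the "without patchset" run. The avg figures are copied from this thread; the dictionary layout, the labels and the function names are made up for illustration, and the board for the quoted run is assumed to be the rpi2 (the quote does not say so explicitly).

```python
# Average per-frame times in seconds, copied from the "avg:" fields of the
# bench output in this thread. Keys and labels are illustrative only.
AVG_TIMES = {
    # Quoted run; the board is assumed to be the rpi2.
    "quoted run nv12->bgra": {
        "without patchset": 0.020813,
        "patches 01-10": 0.019075,
    },
    "BeagleBone Black nv12->bgra": {
        "without patchset": 0.011743,
        "patches 01-06": 0.013659,
        "patches 01-10": 0.012751,
    },
    "BeagleBone Black yuv420p->bgra": {
        "without patchset": 0.013159,
        "patches 01-06": 0.015358,
        "patches 01-10": 0.014784,
    },
}


def relative_change(baseline, patched):
    """Return the change of `patched` relative to `baseline`, in percent.

    Positive values mean the patched run took longer (slower).
    """
    return (patched - baseline) / baseline * 100.0


def summarize(times):
    """Map each patched label to its percent change vs. 'without patchset'."""
    baseline = times["without patchset"]
    return {
        label: relative_change(baseline, value)
        for label, value in times.items()
        if label != "without patchset"
    }


if __name__ == "__main__":
    for case, times in AVG_TIMES.items():
        for label, pct in summarize(times).items():
            print(f"{case}: {label}: {pct:+.1f}% vs. without patchset")
```

A positive percentage means more time per frame, so the BeagleBone Black rows come out positive (slower) while the quoted run comes out negative (faster). Only the avg field is used; min or max could be compared the same way.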