The following patchset aims to make the armv7 NEON yuv->rgba code path bitexact with the aarch64 one. It also aims to bring the two code bases as close to each other as possible.
[PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path

The current 32bit code path, which is unused, is removed.

[PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time

The code now processes only one line at a time for the yuv420p, nv12 and nv21 formats, with no performance regression observed on a rpi2 (I've even observed a slight performance increase for the nv12 and nv21 formats).

[PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its

The last patch of the series makes the code bitexact with the aarch64 version. The increase in precision (which introduces a performance loss) is compensated by a refactor/optimisation that saves quite a few mov, vdup and vqdmulh instructions.

./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -

without patchset:
[bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605

with patchset:
[bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.018846

Matthieu
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel