yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats

Benoit Fouet Wed, 30 Mar 2016 14:37:03 -0700

Hi,

On 26/03/2016 13:05, Matthieu Bouron wrote:

On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer <mich...@niedermayer.cc> wrote:
>On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:

> >From: Matthieu Bouron<matthieu.bou...@stupeflix.com>
> >
> >---
> >  libswscale/arm/yuv2rgb_neon.S | 89 ++++++++++++-------------------------------
> >  1 file changed, 24 insertions(+), 65 deletions(-)

>
>breaks build
>
>  make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/
>--cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon
>-mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux
>--enable-cross-compile && make -j12
>
>CC      libavutil/arm/float_dsp_init_arm.o
>src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
>instruction should be in IT block -- `subeq r6,r6,r0'
>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
>instruction should be in IT block -- `addne r6,r7'
>

[...]


Patch updated with the relevant IT instructions added. It still builds on
my rpi2 setup, but I have not tested it on a setup like yours.
Can you confirm it builds and works on your setup?

If it works, I will send an updated version of the next patch (07/10) to
resolve the conflicts.
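
For readers following the build error, here is a minimal C sketch of what a conditional pair like `subeq`/`addne` plausibly does when a 4:2:0 image is converted one line at a time: two luma lines share one chroma row, so the chroma pointer is rewound after one line and moved past the padding after the other. The helper name, its parameters, and the choice of even lines for the rewind are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libswscale function): offset of the chroma
 * pointer after converting `lines` luma lines of a 4:2:0 image one line
 * at a time.
 * chroma_bytes: chroma bytes consumed while converting one line
 * padding:      chroma linesize minus chroma_bytes */
static ptrdiff_t chroma_offset_after(int lines, ptrdiff_t chroma_bytes,
                                     ptrdiff_t padding)
{
    assert(lines >= 0 && chroma_bytes >= 0 && padding >= 0);

    ptrdiff_t off = 0;
    for (int y = 0; y < lines; y++) {
        off += chroma_bytes;      /* the per-line loop walks the chroma row */
        if ((y & 1) == 0)
            off -= chroma_bytes;  /* even line: rewind, next line reuses the row */
        else
            off += padding;       /* odd line: skip padding, go to the next row */
    }
    return off;
}
```

With 6 chroma bytes per line and 2 bytes of padding, the offset stays at 0 after the first line and reaches 8 (one full chroma row) after the second.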

Matthieu

0006-swscale-arm-yuv2rgb-only-process-one-line-at-a-time-.patch


 From 7b3affff405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron<matthieu.bou...@stupeflix.com>
Date: Wed, 23 Mar 2016 11:26:13 +0000
Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
  for the yuv420p and nv{12,21} formats

---
  libswscale/arm/yuv2rgb_neon.S | 92 +++++++++++++------------------------------
  1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index ef7b0a6..6aeccae 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -105,16 +105,6 @@
      compute_16px        r2, d14, d15, \ofmt
  .endm

-.macro process_2l_16px ofmt
-    compute_premult     d28, d29, d30, d31
-
-    vld1.8              {q7}, [r4]!                                    @ first line of luma
-    compute_16px        r2, d14, d15, \ofmt
-
-    vld1.8              {q7}, [r12]!                                   @ second line of luma
-    compute_16px        r11, d14, d15, \ofmt
-.endm
-
  .macro load_args_nvx
      push                {r4-r12, lr}
      vpush               {q4-q7}
@@ -127,13 +117,9 @@
     ldr                 r10,[sp, #128]                                 @ r10 = y_coeff
     vdup.16             d0, r10                                        @ d0  = y_coeff
     vld1.16             {d1}, [r8]                                     @ d1  = *table
-    add                 r11, r2, r3                                    @ r11 = dst + linesize (dst2)
-    add                 r12, r4, r5                                    @ r12 = srcY + linesizeY (srcY2)

Nit: this leaves r11 and r12 unused by the NV conversions. It should be
possible not to push/pop them.
If not (which I would certainly understand), what would you think about
moving the register saves out of the 'load_args_*' macros?
It seems odd to have all the push/vpush not factored, while the pop/vpop
is done in only one place, at the end of each function.


[snip]

Looks good to me anyway (as well as the remainder of the series).

--
Ben

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
