Hi,

On 26/03/2016 13:05, Matthieu Bouron wrote:
On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer <mich...@niedermayer.cc> wrote:
>On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
> >From: Matthieu Bouron<matthieu.bou...@stupeflix.com>
> >
> >---
> >  libswscale/arm/yuv2rgb_neon.S | 89 ++++++++++++-------------------------------
> >  1 file changed, 24 insertions(+), 65 deletions(-)
>
>breaks build
>
>  make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/ --cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon -mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux --enable-cross-compile && make -j12
>
>CC      libavutil/arm/float_dsp_init_arm.o
>src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional instruction should be in IT block -- `subeq r6,r6,r0'
>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional instruction should be in IT block -- `addne r6,r7'
>
[...]

Patch updated with the relevant IT instructions added. It still builds
on my rpi2 setup, but it has not been tested on the same setup as yours.
Can you confirm it builds/works on your setup?

If it works, I will send an updated version of the next patch (07/10) to
resolve the conflicts.

Matthieu

0006-swscale-arm-yuv2rgb-only-process-one-line-at-a-time-.patch


 From 7b3affff405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron<matthieu.bou...@stupeflix.com>
Date: Wed, 23 Mar 2016 11:26:13 +0000
Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
  for the yuv420p and nv{12,21} formats

---
  libswscale/arm/yuv2rgb_neon.S | 92 +++++++++++++------------------------------
  1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index ef7b0a6..6aeccae 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -105,16 +105,6 @@
      compute_16px        r2, d14, d15, \ofmt
  .endm
-.macro process_2l_16px ofmt
-    compute_premult     d28, d29, d30, d31
-
-    vld1.8              {q7}, [r4]!                                    @ first line of luma
-    compute_16px        r2, d14, d15, \ofmt
-
-    vld1.8              {q7}, [r12]!                                   @ second line of luma
-    compute_16px        r11, d14, d15, \ofmt
-.endm
-
  .macro load_args_nvx
      push                {r4-r12, lr}
      vpush               {q4-q7}
@@ -127,13 +117,9 @@
      ldr                 r10,[sp, #128]                                 @ r10 = y_coeff
      vdup.16             d0, r10                                        @ d0  = y_coeff
      vld1.16             {d1}, [r8]                                     @ d1  = *table
-    add                 r11, r2, r3                                    @ r11 = dst + linesize (dst2)
-    add                 r12, r4, r5                                    @ r12 = srcY + linesizeY (srcY2)

Nit: this leaves r11 and r12 unused by the NV conversions, so it should be possible not to push/pop them. If not (which I would certainly understand), what would you think about moving the register saves out of the 'load_args_*' macros? It seems odd to have the push/vpush repeated, unfactored, in each macro while the pop/vpop is done in only one place, at the end of each function.
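
For illustration, the second option could look roughly like the sketch below. The macro names save_regs and restore_regs are made up for this example and do not exist in the file, and the call order shown in the comments is an assumption about how the functions would use them.

```
@ Sketch only: save_regs/restore_regs are hypothetical names, not part of the patch.
.macro save_regs
    push                {r4-r12, lr}                                   @ 10 core registers = 40 bytes
    vpush               {q4-q7}                                        @ 4 quad registers  = 64 bytes
.endm

.macro restore_regs
    vpop                {q4-q7}                                        @ undo vpush first (reverse order)
    pop                 {r4-r12, lr}
.endm

.macro load_args_nvx
    @ no push/vpush here any more, only the argument loads;
    @ the sp offsets stay valid as long as save_regs runs first
    ldr                 r10,[sp, #128]                                 @ r10 = y_coeff
    @ remaining loads unchanged
.endm

@ Assumed order inside each function:
@     save_regs
@     load_args_nvx
@     (conversion loop)
@     restore_regs
```

Keeping the same register list means the saved frame is still 104 bytes, so none of the sp-relative argument offsets change. If r11 and r12 were dropped from the push list instead, the frame would shrink by 8 bytes and every such offset would have to be reduced by 8.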

[snip]

Looks good to me anyway (as well as the remainder of the series).

--
Ben

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
