Hi,
Le 26/03/2016 13:05, Matthieu Bouron a écrit :
On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer <mich...@niedermayer.cc
>wrote:
>On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
> >From: Matthieu Bouron<matthieu.bou...@stupeflix.com>
> >
> >---
> > libswscale/arm/yuv2rgb_neon.S | 89
>++++++++++++-------------------------------
> > 1 file changed, 24 insertions(+), 65 deletions(-)
>
>breaks build
>
> make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/
>--cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon
>-mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux
>--enable-cross-compile && make -j12
>
>CC libavutil/arm/float_dsp_init_arm.o
>src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
>instruction should be in IT block -- `subeq r6,r6,r0'
>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
>instruction should be in IT block -- `addne r6,r7'
>
[...]
Patch updated with the relevant it instructions added. It still does build
on my rpi2 setup but is not tested on the same setup as yours.
Can you confirm it builds/works on your setup ?
If it works, i will send an updated version of the next patch (07/10) to
resolve the conflicts.
Matthieu
0006-swscale-arm-yuv2rgb-only-process-one-line-at-a-time-.patch
From 7b3affff405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron<matthieu.bou...@stupeflix.com>
Date: Wed, 23 Mar 2016 11:26:13 +0000
Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
for the yuv420p and nv{12,21} formats
---
libswscale/arm/yuv2rgb_neon.S | 92 +++++++++++++------------------------------
1 file changed, 27 insertions(+), 65 deletions(-)
diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index ef7b0a6..6aeccae 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -105,16 +105,6 @@
compute_16px r2, d14, d15, \ofmt
.endm
-.macro process_2l_16px ofmt
- compute_premult d28, d29, d30, d31
-
- vld1.8 {q7}, [r4]! @ first
line of luma
- compute_16px r2, d14, d15, \ofmt
-
- vld1.8 {q7}, [r12]! @
second line of luma
- compute_16px r11, d14, d15, \ofmt
-.endm
-
.macro load_args_nvx
push {r4-r12, lr}
vpush {q4-q7}
@@ -127,13 +117,9 @@
ldr r10,[sp, #128] @ r10
= y_coeff
vdup.16 d0, r10 @ d0
= y_coeff
vld1.16 {d1}, [r8] @ d1
= *table
- add r11, r2, r3 @ r11 =
dst + linesize (dst2)
- add r12, r4, r5 @ r12 =
srcY + linesizeY (srcY2)
Nit: this lets r11 and r12 unused by the NV conversions. It should be
possible not to push/pop them
If not (which I would certainly understand), what would you think about
moving the registers save out of the 'load_args_*' macro?
It seems weird to have all the push/vpush that are not factored, and the
pop/vpop that is done in only one place, at the end of each function.
[snip]
Looks good to me anyway (as well as the remainder of the series).
--
Ben
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel