On Wed, Mar 30, 2016 at 11:36:34PM +0200, Benoit Fouet wrote:
> Hi,

Hi Benoit,
>
> On 26/03/2016 13:05, Matthieu Bouron wrote:
> >On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer <mich...@niedermayer.cc
> >>>wrote:
> >>>On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
> >>>> >From: Matthieu Bouron<matthieu.bou...@stupeflix.com>
> >>>> >
> >>>> >---
> >>>> > libswscale/arm/yuv2rgb_neon.S | 89 ++++++++++++-------------------------------
> >>>> > 1 file changed, 24 insertions(+), 65 deletions(-)
> >>>
> >>>breaks build
> >>>
> >>> make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/ --cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon -mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux --enable-cross-compile && make -j12
> >>>
> >>>CC libavutil/arm/float_dsp_init_arm.o
> >>>src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
> >>>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional instruction should be in IT block -- `subeq r6,r6,r0'
> >>>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional instruction should be in IT block -- `addne r6,r7'
> >>>
> >[...]
> >
> >Patch updated with the relevant it instructions added. It still does build
> >on my rpi2 setup but is not tested on the same setup as yours.
> >Can you confirm it builds/works on your setup ?
> >
> >If it works, i will send an updated version of the next patch (07/10) to
> >resolve the conflicts.
> >
> >Matthieu
> >
> >0006-swscale-arm-yuv2rgb-only-process-one-line-at-a-time-.patch
> >
> >
> > From 7b3affff405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
> >From: Matthieu Bouron<matthieu.bou...@stupeflix.com>
> >Date: Wed, 23 Mar 2016 11:26:13 +0000
> >Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
> > for the yuv420p and nv{12,21} formats
> >
> >---
> > libswscale/arm/yuv2rgb_neon.S | 92 +++++++++++++------------------------------
> > 1 file changed, 27 insertions(+), 65 deletions(-)
> >
> >diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
> >index ef7b0a6..6aeccae 100644
> >--- a/libswscale/arm/yuv2rgb_neon.S
> >+++ b/libswscale/arm/yuv2rgb_neon.S
> >@@ -105,16 +105,6 @@
> >     compute_16px r2, d14, d15, \ofmt
> > .endm
> >-.macro process_2l_16px ofmt
> >-    compute_premult d28, d29, d30, d31
> >-
> >-    vld1.8 {q7}, [r4]!          @ first line of luma
> >-    compute_16px r2, d14, d15, \ofmt
> >-
> >-    vld1.8 {q7}, [r12]!         @ second line of luma
> >-    compute_16px r11, d14, d15, \ofmt
> >-.endm
> >-
> > .macro load_args_nvx
> >     push {r4-r12, lr}
> >     vpush {q4-q7}
> >@@ -127,13 +117,9 @@
> >     ldr r10,[sp, #128]          @ r10 = y_coeff
> >     vdup.16 d0, r10             @ d0 = y_coeff
> >     vld1.16 {d1}, [r8]          @ d1 = *table
> >-    add r11, r2, r3             @ r11 = dst + linesize (dst2)
> >-    add r12, r4, r5             @ r12 = srcY + linesizeY (srcY2)
>
> Nit: this lets r11 and r12 unused by the NV conversions. It should be
> possible not to push/pop them
> If not (which I would certainly understand), what would you think about
> moving the registers save out of the 'load_args_*' macro?
> It seems weird to have all the push/vpush that are not factored, and the
> pop/vpop that is done in only one place, at the end of each function.

Thanks for the review. Unfortunately, I dropped this part of the patch set:
processing only one line at a time proved to be slower on devices other
than the rpi2.
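The following is a scalar C model of the trade-off; it is not the NEON code. For yuv420p, chroma is subsampled vertically by 2, so two consecutive luma rows share one chroma row. The removed process_2l_16px macro computes the chroma terms once (compute_premult) and applies them to both luma rows: r4/r12 are the sources and r2/r11 the destinations. Processing one line at a time redoes that chroma work for every luma row.

All names here are hypothetical and purely illustrative: the functions, the ChromaTerms struct, the premult_calls counter and the full-range BT.601 coefficients. None of them are libswscale's API or tables. The sketch also assumes even width and height.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chroma-sample terms, standing in for what the NEON
 * compute_premult macro produces; not libswscale's actual data layout. */
typedef struct {
    int r, g, b;
} ChromaTerms;

/* Counts chroma-term computations so the two strategies can be compared. */
static unsigned long premult_calls;

/* Illustrative full-range BT.601 approximation, coefficients scaled by 64.
 * libswscale uses its own y_coeff and coefficient table instead. */
static ChromaTerms compute_premult(uint8_t u, uint8_t v)
{
    int du = u - 128, dv = v - 128;
    ChromaTerms t = { 90 * dv, -22 * du - 46 * dv, 113 * du };
    premult_calls++;
    return t;
}

static uint8_t clamp_u8(int x)
{
    return x < 0 ? 0 : x > 255 ? 255 : (uint8_t)x;
}

/* Writes one RGBA pixel from a luma sample and precomputed chroma terms. */
static void store_px(uint8_t *dst, int luma, ChromaTerms t)
{
    int ys = luma * 64;
    dst[0] = clamp_u8((ys + t.r) / 64);
    dst[1] = clamp_u8((ys + t.g) / 64);
    dst[2] = clamp_u8((ys + t.b) / 64);
    dst[3] = 255;
}

/* One luma row per iteration: each chroma row is read for two luma rows,
 * so its terms are recomputed once per luma row. */
static void yuv420p_to_rgba_1l(const uint8_t *src_y, ptrdiff_t y_stride,
                               const uint8_t *src_u, const uint8_t *src_v,
                               ptrdiff_t c_stride, uint8_t *dst,
                               ptrdiff_t dst_stride, int w, int h)
{
    assert(w % 2 == 0 && h % 2 == 0); /* sketch assumes even dimensions */
    for (int row = 0; row < h; row++) {
        const uint8_t *yl = src_y + row * y_stride;
        const uint8_t *ul = src_u + (row / 2) * c_stride;
        const uint8_t *vl = src_v + (row / 2) * c_stride;
        uint8_t *d = dst + row * dst_stride;
        for (int x = 0; x < w; x += 2) {
            ChromaTerms t = compute_premult(ul[x / 2], vl[x / 2]);
            store_px(d + 4 * x, yl[x], t);
            store_px(d + 4 * (x + 1), yl[x + 1], t);
        }
    }
}

/* Two luma rows per iteration (srcY/srcY2 -> dst/dst2), mirroring the
 * removed process_2l_16px macro: chroma terms computed once per 2x2 block. */
static void yuv420p_to_rgba_2l(const uint8_t *src_y, ptrdiff_t y_stride,
                               const uint8_t *src_u, const uint8_t *src_v,
                               ptrdiff_t c_stride, uint8_t *dst,
                               ptrdiff_t dst_stride, int w, int h)
{
    assert(w % 2 == 0 && h % 2 == 0); /* sketch assumes even dimensions */
    for (int row = 0; row < h; row += 2) {
        const uint8_t *y1 = src_y + row * y_stride, *y2 = y1 + y_stride;
        const uint8_t *ul = src_u + (row / 2) * c_stride;
        const uint8_t *vl = src_v + (row / 2) * c_stride;
        uint8_t *d1 = dst + row * dst_stride, *d2 = d1 + dst_stride;
        for (int x = 0; x < w; x += 2) {
            ChromaTerms t = compute_premult(ul[x / 2], vl[x / 2]);
            store_px(d1 + 4 * x, y1[x], t);
            store_px(d1 + 4 * (x + 1), y1[x + 1], t);
            store_px(d2 + 4 * x, y2[x], t);
            store_px(d2 + 4 * (x + 1), y2[x + 1], t);
        }
    }
}
```

Consider a 4x4 frame. The one-line variant computes the chroma terms 8 times, while the two-line variant computes them 4 times, and both produce identical pixels. The two-line form pays for this with extra live pointers (r11/r12 in the assembly), which is the register cost Benoit's nit refers to. Which side wins is device-dependent. The repeated chroma work is one plausible reason the one-line version measured slower outside the rpi2, but this model does not demonstrate that.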
(I will keep your remark in mind if I ever switch back to processing only
one line at a time for all formats.)

The v2 patch set is in reply to the following thread:
https://ffmpeg.org/pipermail/ffmpeg-devel/2016-March/192272.html

Would you mind taking a look at it?

Matthieu

[...]
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel