On Wed, Feb 25, 2015 at 11:12:23AM +0000, Tomperi Seppo wrote:
> 17/02/15 12:44, "Michael Niedermayer" <mich...@niedermayer.cc>:
>
> >On Tue, Feb 17, 2015 at 07:33:04AM +0000, Tomperi Seppo wrote:
> >>
> >> > On 16 Feb 2015, at 19:54, Michael Niedermayer <mich...@niedermayer.cc> wrote:
> >> >
> >> > On Mon, Feb 16, 2015 at 12:47:36PM +0000, Tomperi Seppo wrote:
> >> >> More NEON optimizations for testing. fate-hevc passes on Tegra K1,
> >> >> but these haven't been tested for NEON clobbering.
> >> >>
> >> >> -Seppo
> >> >>
> >> >> ________________________________________
> >> >> From: Tomperi Seppo
> >> >> Sent: Monday, February 16, 2015 1:30 PM
> >> >> To: Michael Niedermayer
> >> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
> >> >> Subject: RE: [FFmpeg-devel] DSP function ARM NEON patches for hevc
> >> >>
> >> >> Hi Michael,
> >> >>
> >> >> Here is a totally shot in a dark fix attempt for NEON register
> >> >> clobbering for deblocking. Could you test it with qemu and check if it
> >> >> works.
> >> >>
> >> >>
> >> >> -Seppo
> >> >>
> >> >> ________________________________________
> >> >> From: Michael Niedermayer [mich...@niedermayer.cc]
> >> >> Sent: Monday, February 16, 2015 3:28 AM
> >> >> To: Tomperi Seppo
> >> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
> >> >> Subject: Re: [FFmpeg-devel] DSP function ARM NEON patches for hevc
> >> >>
> >> >> Hi
> >> >>
> >> >> On Sun, Feb 15, 2015 at 08:31:32PM +0000, Tomperi Seppo wrote:
> >> >>> Hi!
> >> >>>
> >> >>> The reason is chroma deblocking which is using q4 without pushing
> >> >>> it to stack. :/
> >> >>> Unfortunately I am in Geneve this week and don't have ARM linux
> >> >>> board with me so it is not easy to test.
> >> >>>
> >> >>> Mickael Raulet: maybe guys at INSA could run tests this week if I
> >> >>> make a fix? Could you ask?
> >> >>
> >> >> If they cant, then i probably can test it too if its a patch which
> >> >> applies cleanly to ffmpeg and testing fate-hevc with
> >> >> --enable-neon-clobber-test under qemu is what is needed
> >> >> i could test on a arm board too if needed
> >> >>
> >> >>
> >> >>>
> >> >>> I also have SAO, qpel and epel NEON patches for latest FFmpeg. They
> >> >>> pass fate-hevc on Jetson TK1, but should be iOS and clobber checked.
> >> >>>
> >> >>> -Seppo
> >> >>>
> >> >>>
> >> >>> ________________________________________
> >> >>> From: Michael Niedermayer [michae...@gmx.at]
> >> >>> Sent: Friday, February 13, 2015 5:38 PM
> >> >>> To: FFmpeg development discussions and patches
> >> >>> Cc: Tomperi Seppo; Mickaël Raulet
> >> >>> Subject: Re: [FFmpeg-devel] DSP function ARM NEON patches for hevc
> >> >>>
> >> >>> On Thu, Feb 05, 2015 at 02:22:28PM +0100, Mickaël Raulet wrote:
> >> >>>> Michael,
> >> >>>>
> >> >>>> Please find some commits that can be cherry picked from
> >> >>>> https://github.com/OpenHEVC/FFmpeg/commits/ffmpeg_patch
> >> >>>>
> >> >>>
> >> >>>> Optimized deblocking filter (8bits only)
> >> >>>> 1b9ee47d2f43b0a029a9468233626102eb1473b8
> >> >>>
> >> >>> this breaks the neon clobber test see:
> >> >>> fate.ffmpeg.org/report.cgi?time=20150211030204&slot=armv7l-panda-gcc4.6-cortexa8-clobber
> >> >>>
> >> >>> [...]
> >> >>> --
> >> >>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >> >>>
> >> >>> The worst form of inequality is to try to make unequal things equal.
> >> >>> -- Aristotle
> >> >>>
> >> >>
> >> >> --
> >> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >> >>
> >> >> Opposition brings concord. Out of discord comes the fairest harmony.
> >> >> -- Heraclitus
> >> >
> >> >> Makefile | 3
> >> >> hevcdsp_init_neon.c | 159 ++++++++
> >> >> hevcdsp_qpel_neon.S | 999 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
> >> >> 9fb0b3c33edf085845b7a0fba3ca77d1ba55dd6c 0001-hevcdsp-ARM-NEON-optimized-qpel-functions.patch
> >> >> From ce06cb2bea4b051995608b11651b185e7a825a4c Mon Sep 17 00:00:00 2001
> >> >> From: Seppo Tomperi <seppo.tomp...@vtt.fi>
> >> >> Date: Wed, 11 Feb 2015 10:20:26 +0000
> >> >> Subject: [PATCH] hevcdsp: ARM NEON optimized qpel functions
> >> >>
> >> >> ---
> >> >> libavcodec/arm/Makefile | 3 +-
> >> >> libavcodec/arm/hevcdsp_init_neon.c | 159 ++++++
> >> >> libavcodec/arm/hevcdsp_qpel_neon.S | 999 +++++++++++++++++++++++++++++++++++++
> >> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
> >> >> create mode 100644 libavcodec/arm/hevcdsp_qpel_neon.S
> >> >
> >> >
> >> > seems to fail building:
> >> >
> >> > libavformat/utils.o
> >> > CC libavcodec/arm/hevcdsp_init_neon.o
> >> > AS libavcodec/arm/hevcdsp_qpel_neon.o
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S: Assembler messages:
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vld1.32 {d0[0]d0[1]d1[0]d1[1]},[r2],r3'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vst1.32 {d0[0]d0[1]d1[0]d1[1]},[r0],r1'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vld1.32 {d1[0]d2},[r2]'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2]'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vst1.32 {d1[0]d2},[r0]'
> >> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0]'
> >> > make: *** [libavcodec/arm/hevcdsp_qpel_neon.o] Error 1
> >> > make: *** Waiting for unfinished jobs....
> >> >
> >> >
> >>
> >> These macros compiled for me with Jetson TK1 toolchain and with latest
> >> GAS preprocessor, so I thought they are finally ok.
> >> But it looks like passing register lists to macros is not handled well
> >> by all preprocessors.
> >
> >plain "arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3"
> >here, with no preprocessor
> >
> >
> >>
> >> These are quite simple functions copying varying width blocks of pixels
> >> using NEON. I could either write out the macros (lots of almost
> >> identical functions) or leave the optimisation out totally for now. Or
> >> do you have any other ideas?
> >
> >the following seems to fix it, but i sure do not know why these 2
> >lines failed while the others do not seem to fail
> >adding , to all works as well
> >
> >diff --git a/libavcodec/arm/hevcdsp_qpel_neon.S b/libavcodec/arm/hevcdsp_qpel_neon.S
> >index 14116a6..7b0df2e 100644
> >--- a/libavcodec/arm/hevcdsp_qpel_neon.S
> >+++ b/libavcodec/arm/hevcdsp_qpel_neon.S
> >@@ -989,9 +989,9 @@ function ff_hevc_put_qpel_uw_pixels_w\width\()_neon_8, export=1
> > endfunc
> > .endm
> >
> >-put_qpel_uw_pixels 4 d0[0] d0[1] d1[0] d1[1]
> >+put_qpel_uw_pixels 4 d0[0], d0[1], d1[0], d1[1]
> > put_qpel_uw_pixels 8 d0 d1 d2 d3
> >-put_qpel_uw_pixels_m 12 d0 d1[0] d2 d3[0]
> >+put_qpel_uw_pixels_m 12 d0, d1[0], d2, d3[0]
> > put_qpel_uw_pixels 16 q0 q1 q2 q3
> > put_qpel_uw_pixels 24 d0-d2 d3-d5 d16-d18 d19-d21
> > put_qpel_uw_pixels 32 q0-q1 q2-q3 q8-q9 q10-q11
> >
> >[...]
>
> Same patch, but with comma separators for these macros.
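
For anyone reading this thread later, here is a minimal sketch of the
comma-separated invocation style used in the fix. It is not the macro from
the patch: copy_2rows_w4, its parameter names and the label
copy_2rows_w4_example are hypothetical, and the register roles (r0 = dst,
r1 = dst stride, r2 = src, r3 = src stride) are only inferred from the
assembler messages quoted above.

```
@ Hypothetical sketch, not the macro from the patch: copy two 4-byte rows.
@ Assumed register roles: r0 = dst, r1 = dst stride, r2 = src, r3 = src stride.
        .syntax unified
        .arch   armv7-a
        .fpu    neon
        .text

.macro  copy_2rows_w4 lane0, lane1
        vld1.32 {\lane0}, [r2], r3      @ load one 32-bit lane, advance src by stride
        vld1.32 {\lane1}, [r2], r3
        vst1.32 {\lane0}, [r0], r1      @ store that lane, advance dst by stride
        vst1.32 {\lane1}, [r0], r1
.endm

        .global copy_2rows_w4_example
copy_2rows_w4_example:
        copy_2rows_w4 d0[0], d0[1]      @ commas keep the two arguments separate
        bx      lr
```

Written as "copy_2rows_w4 d0[0] d0[1]", the two operands may end up in a
single macro argument on some assemblers, which is what the `{d0[0]d0[1]d1[0]d1[1]}`
and `{}` operands in the error output suggest happened here. Why only the
lane-indexed operands were affected is not established in this thread.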
applied thanks

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The worst form of inequality is to try to make unequal things equal.
-- Aristotle
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel