On 4/12/2017 9:39 PM, Thomas Mundt wrote:
>>>> Michael Niedermayer <mich...@niedermayer.cc> wrote on Wed, 12.4.2017:
> On Thu, Mar 30, 2017 at 12:21:58AM +0000, Thomas Mundt wrote:
>>>>> Lou Logan <lou at lrcd.com> wrote on Thu, 30.3.2017:
>>>> On Mon, 13 Mar 2017 16:23:46 +0000 (UTC)
>>>> Thomas Mundt <loudmax-at-yahoo.de at ffmpeg.org> wrote:
>>>>
>>>> [...]
>>>>> index 09ca4d3..0b5b858 100644
>>>>> --- a/libavfilter/vf_tinterlace.c
>>>>> +++ b/libavfilter/vf_tinterlace.c
>>>> [...]
>>>>> +static void lowpass_line_complex_c(uint8_t *dstp, ptrdiff_t width, const
>>>>> uint8_t *srcp,
>>>>> + ptrdiff_t mref, ptrdiff_t pref)
>>>>>
>>>>
>>>> Trailing whitespace should be avoided. It prevents the patch from being
>>>> applied.
>>>
>>> Oh, didn't notice. Thanks.
>>> New patch set attached.
>>
>> [...]
>>> --- a/libavfilter/x86/vf_interlace.asm
>>> +++ b/libavfilter/x86/vf_interlace.asm
>>> @@ -28,33 +28,28 @@ SECTION_RODATA
>>> SECTION .text
>>>
>>> %macro LOWPASS_LINE 0
>>> -cglobal lowpass_line, 5, 5, 7
>>> -    add r0, r1
>>> -    add r2, r1
> > [...]
>>> -    add r1, 2*mmsize
>>> -    jl .loop
>>> +    add dstq, 2*mmsize
>>> +    add srcq, 2*mmsize
>>> +    sub hd, 2*mmsize
>>> +    jg .loop
>>
>> this increases the number of instructions in the inner loop by 2
>
> James Almer suggested to change the function prototype. Which was easy in c,
> but for simd this is the best I can do.
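
For anyone following the thread, here is a rough C sketch (not taken from either patch) of the two loop shapes being compared above. The names lowpass_px, lowpass_line_ptrs, lowpass_line_index and loops_agree are made up for illustration, and the per-byte formula is my scalar reading of the linear filter, (above + 2*cur + below + 1) >> 2. The first shape advances dst and src and counts the width down, like the patched asm. The second moves every pointer to the end of its line once and then runs a single negative index up to zero, like the original loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for the SIMD body (assumed formula, see lead-in). */
static uint8_t lowpass_px(uint8_t above, uint8_t cur, uint8_t below)
{
    return (uint8_t)((above + 2 * cur + below + 1) >> 2);
}

/* Shape 1: dst and src advance separately and a counter runs down,
 * so loop control costs three updates per iteration. */
static void lowpass_line_ptrs(uint8_t *dst, ptrdiff_t width, const uint8_t *src,
                              ptrdiff_t mref, ptrdiff_t pref)
{
    assert(width >= 0);
    while (width > 0) {
        *dst = lowpass_px(src[mref], *src, src[pref]);
        dst++;
        src++;
        width--;
    }
}

/* Shape 2: fold the offsets into end-of-line pointers once, then let one
 * negative index count up to zero; one update per iteration. */
static void lowpass_line_index(uint8_t *dst, ptrdiff_t width, const uint8_t *src,
                               ptrdiff_t mref, ptrdiff_t pref)
{
    assert(width >= 0);
    const uint8_t *src_end   = src + width;
    const uint8_t *above_end = src_end + mref;  /* line above, biased by width */
    const uint8_t *below_end = src_end + pref;  /* line below, biased by width */
    uint8_t       *dst_end   = dst + width;

    for (ptrdiff_t i = -width; i < 0; i++)
        dst_end[i] = lowpass_px(above_end[i], src_end[i], below_end[i]);
}

/* Hypothetical self-check: filter the middle line of a 3-line test frame
 * with both shapes and report whether the outputs are identical. */
int loops_agree(void)
{
    enum { STRIDE = 8, LINES = 3 };
    uint8_t frame[STRIDE * LINES];
    uint8_t out_a[STRIDE], out_b[STRIDE];

    for (int i = 0; i < STRIDE * LINES; i++)
        frame[i] = (uint8_t)(i * 11);

    lowpass_line_ptrs(out_a, STRIDE, frame + STRIDE, -STRIDE, STRIDE);
    lowpass_line_index(out_b, STRIDE, frame + STRIDE, -STRIDE, STRIDE);
    return memcmp(out_a, out_b, STRIDE) == 0;
}
```

Both functions take the same (dst, width, src, mref, pref) arguments, so the choice between them only changes how much bookkeeping happens inside the loop, not what the caller passes in.
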
I didn't check, but I think the reason I told you to change the prototype here
was to share the function pointer with lowpass_line_complex, so you can do
something like

    if (tinterlace->flags & TINTERLACE_FLAG_VLPF)
        tinterlace->lowpass_line = lowpass_line_c;
    else if (tinterlace->flags & TINTERLACE_FLAG_CVLPF)
        tinterlace->lowpass_line = lowpass_line_complex_c;

instead of adding a new one to InterlaceContext and TInterlaceContext.
Otherwise you wouldn't really gain much changing the prototype for linear here.

> I asked for help a month ago but get no reply. Can you tell me how to avoid
> this?

Yes, sorry, I kinda lost track of this since for some reason your emails
start a new thread each instead of showing up as a reply.

You just need to turn mref and pref into the equivalent of the old srcp_above
and srcp_below pointers, like so:

diff --git a/libavfilter/x86/vf_interlace.asm b/libavfilter/x86/vf_interlace.asm
index f70c700965..8a0dd3bdea 100644
--- a/libavfilter/x86/vf_interlace.asm
+++ b/libavfilter/x86/vf_interlace.asm
@@ -28,32 +28,32 @@ SECTION_RODATA
 SECTION .text

 %macro LOWPASS_LINE 0
-cglobal lowpass_line, 5, 5, 7
-    add r0, r1
-    add r2, r1
-    add r3, r1
-    add r4, r1
-    neg r1
+cglobal lowpass_line, 5, 5, 7, dst, h, src, mref, pref
+    add dstq, hq
+    add srcq, hq
+    add mrefq, srcq
+    add prefq, srcq
+    neg hq

     pcmpeqb m6, m6

 .loop:
-    mova m0, [r3+r1]
-    mova m1, [r3+r1+mmsize]
-    pavgb m0, [r4+r1]
-    pavgb m1, [r4+r1+mmsize]
+    mova m0, [mrefq+hq]
+    mova m1, [mrefq+hq+mmsize]
+    pavgb m0, [prefq+hq]
+    pavgb m1, [prefq+hq+mmsize]
     pxor m0, m6
     pxor m1, m6
-    pxor m2, m6, [r2+r1]
-    pxor m3, m6, [r2+r1+mmsize]
+    pxor m2, m6, [srcq+hq]
+    pxor m3, m6, [srcq+hq+mmsize]
     pavgb m0, m2
     pavgb m1, m3
     pxor m0, m6
     pxor m1, m6
-    mova [r0+r1], m0
-    mova [r0+r1+mmsize], m1
+    mova [dstq+hq], m0
+    mova [dstq+hq+mmsize], m1

-    add r1, 2*mmsize
+    add hq, 2*mmsize
     jl .loop
     REP_RET
 %endmacro

>
>> also can you add a fate test for the -1 2 6 2 -1 filter?
>
> Sure.
> I never wrote a fate test and I'm off for a couple of days, so this
> could take some time. Can you give me a hint or an example?
>
> Regards,
> Thomas
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
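
Coming back to the function-pointer point earlier in this message, a minimal C sketch of what sharing one pointer between the two filters could look like. This is an illustration under assumptions, not the patch: SketchContext, the SKETCH_FLAG_* values and sketch_select_lowpass are hypothetical stand-ins for the real context and the TINTERLACE_FLAG_VLPF / TINTERLACE_FLAG_CVLPF flags, and the filter bodies are my reading of the kernels. The linear one is taken as (above + 2*cur + below + 1) >> 2, and the "-1 2 6 2 -1" one is assumed to use taps at 2*mref, mref, 0, pref and 2*pref, normalised by 8 with rounding and clipped to 8 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, standing in for the real VLPF/CVLPF flags. */
enum {
    SKETCH_FLAG_VLPF  = 1 << 0,
    SKETCH_FLAG_CVLPF = 1 << 1,
};

/* The common prototype: offsets to the neighbouring lines, not pointers. */
typedef void (*lowpass_fn)(uint8_t *dstp, ptrdiff_t width, const uint8_t *srcp,
                           ptrdiff_t mref, ptrdiff_t pref);

/* Hypothetical context with a single slot shared by both filters. */
typedef struct SketchContext {
    int        flags;
    lowpass_fn lowpass_line;
} SketchContext;

/* Linear filter, assumed to be the (1, 2, 1) / 4 kernel with rounding. */
static void lowpass_line_c(uint8_t *dstp, ptrdiff_t width, const uint8_t *srcp,
                           ptrdiff_t mref, ptrdiff_t pref)
{
    for (ptrdiff_t i = 0; i < width; i++)
        dstp[i] = (uint8_t)((1 + 2 * srcp[i] + srcp[i + mref] + srcp[i + pref]) >> 2);
}

/* Complex filter, assumed to be (-1, 2, 6, 2, -1) / 8 with rounding. */
static void lowpass_line_complex_c(uint8_t *dstp, ptrdiff_t width, const uint8_t *srcp,
                                   ptrdiff_t mref, ptrdiff_t pref)
{
    for (ptrdiff_t i = 0; i < width; i++) {
        int v = 4 + 6 * srcp[i]
              + 2 * (srcp[i + mref] + srcp[i + pref])
              - srcp[i + 2 * mref] - srcp[i + 2 * pref];
        v = v < 0 ? 0 : v >> 3;              /* clamp before shifting */
        dstp[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}

/* Pick the implementation once; callers only ever see s->lowpass_line. */
void sketch_select_lowpass(SketchContext *s)
{
    assert(s != NULL);
    if (s->flags & SKETCH_FLAG_VLPF)
        s->lowpass_line = lowpass_line_c;
    else if (s->flags & SKETCH_FLAG_CVLPF)
        s->lowpass_line = lowpass_line_complex_c;
    else
        s->lowpass_line = NULL;              /* no low-pass requested */
}
```

With one prototype the per-line call site stays the same for either filter, e.g. s->lowpass_line(dst, width, src, -linesize, linesize), and an asm version only has to be hooked into that one slot.
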