On Thu, Jan 7, 2016 at 3:13 AM, Hendrik Leppkes <h.lepp...@gmail.com> wrote: > On Mon, Oct 12, 2015 at 1:21 AM, James Almer <jamr...@gmail.com> wrote: >> On 10/11/2015 3:11 PM, Ronald S. Bultje wrote: >>> Hi, >>> >>> On Sun, Oct 11, 2015 at 1:17 PM, James Almer <jamr...@gmail.com> wrote: >>> >>>> On 10/11/2015 4:31 AM, Paul B Mahol wrote: >>>>> On 10/11/15, James Almer <jamr...@gmail.com> wrote: >>>>>> Signed-off-by: James Almer <jamr...@gmail.com> >>>>>> --- >>>>>> libavfilter/x86/vf_w3fdif.asm | 16 +++++++--------- >>>>>> 1 file changed, 7 insertions(+), 9 deletions(-) >>>>>> >>>>>> diff --git a/libavfilter/x86/vf_w3fdif.asm >>>> b/libavfilter/x86/vf_w3fdif.asm >>>>>> index f02319b..f2001a4 100644 >>>>>> --- a/libavfilter/x86/vf_w3fdif.asm >>>>>> +++ b/libavfilter/x86/vf_w3fdif.asm >>>>>> @@ -103,13 +103,11 @@ REP_RET >>>>>> >>>>>> %if ARCH_X86_64 >>>>>> >>>>>> -cglobal w3fdif_simple_high, 5, 9, 9, 0, work_line, in_lines_cur0, >>>>>> in_lines_adj0, coef, linesize >>>>>> +cglobal w3fdif_simple_high, 5, 9, 8, 0, work_line, in_lines_cur0, >>>>>> in_lines_adj0, coef, linesize >>>>>> movq m2, [coefq] >>>>>> DEFINE_ARGS work_line, in_lines_cur0, in_lines_adj0, >>>> in_lines_cur1, >>>>>> linesize, offset, in_lines_cur2, in_lines_adj1, in_lines_adj2 >>>>>> - SPLATW m0, m2, 0 >>>>>> - SPLATW m1, m2, 1 >>>>>> + pshufd m0, m2, q0000 >>>>>> SPLATW m2, m2, 2 >>>>>> - SBUTTERFLY wd, 0, 1, 7 >>>>>> pxor m7, m7 >>>>>> mov offsetq, 0 >>>>>> mov in_lines_cur2q, [in_lines_cur0q+gprsize*2] >>>>>> @@ -124,23 +122,23 @@ cglobal w3fdif_simple_high, 5, 9, 9, 0, work_line, >>>>>> in_lines_cur0, in_lines_adj0, >>>>>> movh m4, [in_lines_cur1q+offsetq] >>>>>> punpcklbw m3, m7 >>>>>> punpcklbw m4, m7 >>>>>> - SBUTTERFLY wd, 3, 4, 8 >>>>>> + SBUTTERFLY wd, 3, 4, 1 >>>>>> pmaddwd m3, m0 >>>>>> - pmaddwd m4, m1 >>>>>> + pmaddwd m4, m0 >>>>>> movh m5, [in_lines_adj0q+offsetq] >>>>>> movh m6, [in_lines_adj1q+offsetq] >>>>>> punpcklbw m5, m7 >>>>>> punpcklbw m6, m7 >>>>>> - SBUTTERFLY wd, 5, 6, 8 >>>>>> + SBUTTERFLY wd, 5, 6, 1 >>>>>> pmaddwd m5, m0 >>>>>> - pmaddwd m6, m1 >>>>>> + pmaddwd m6, m0 >>>>>> paddd m3, m5 >>>>>> paddd m4, m6 >>>>>> movh m5, [in_lines_cur2q+offsetq] >>>>>> movh m6, [in_lines_adj2q+offsetq] >>>>>> punpcklbw m5, m7 >>>>>> punpcklbw m6, m7 >>>>>> - SBUTTERFLY wd, 5, 6, 8 >>>>>> + SBUTTERFLY wd, 5, 6, 1 >>>>>> pmaddwd m5, m2 >>>>>> pmaddwd m6, m2 >>>>>> paddd m3, m5 >>>>>> -- >>>>>> 2.6.0 >>>>>> >>>>>> _______________________________________________ >>>>>> ffmpeg-devel mailing list >>>>>> ffmpeg-devel@ffmpeg.org >>>>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel >>>>>> >>>>> >>>>> Cant this now be used on x32? >>>> >>> >>> Add to the data pointers directly (in_lines_cur0q and work_lineq). Then sub >>> all other curXq/adjXq from cur0q (on 32bit only) before the loop and you >>> have to adds (on 32bit) instead of one (on 64bit), but one reg less >>> (offset), making it 7, which means it works. >>> >>> Ronald >> >> Ah, like it's being done in PACK_6CH from swr's audio_convert.asm >> For complex_high some stack ab/use will be needed (see PACK_8CH), but it >> should >> be doable. >> This way w3fdif will be able to fully dethrone yadif :P > > Are you still working on w3fdif_simple_high for 32bit? > I would be interested in that. Otherwise I might try to do it myself, > although it sounds like a lot of #if'ery
I was bored and it was easy, so patch coming up anyway! _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel