On Wed, Sep 17, 2014 at 01:14:33PM -0300, James Almer wrote:
> On 17/09/14 9:07 AM, Michael Niedermayer wrote:
> > On Wed, Sep 17, 2014 at 01:18:12PM +0200, Clément Bœsch wrote:
> >> On Wed, Sep 17, 2014 at 11:41:32AM +0200, James Almer wrote:
> >>> ffmpeg | branch: master | James Almer <jamr...@gmail.com> | Tue Sep 16
> >>> 21:41:47 2014 -0300| [0456d169c469a79e305813d14c873fe698c8c572] |
> >>> committer: Michael Niedermayer
> >>>
> >>> x86/me_cmp: port mmxext and sse2 sad functions to yasm
> >>>
> >>> Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of
> >>> sad16_x2, sad16_y2 and sad16_xy2 (%15 to %20 faster than mmxext).
> >>> Since the _xy2 versions are not bitexact, they are accordingly marked as
> >>> approximate.
> >>>
> >>> Signed-off-by: James Almer <jamr...@gmail.com>
> >>> Signed-off-by: Michael Niedermayer <michae...@gmx.at>
> >>>
> >>>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=0456d169c469a79e305813d14c873fe698c8c572
> >>> ---
> >>>
> >>>  libavcodec/x86/me_cmp.asm    |  330
> >>> ++++++++++++++++++++++++++++++++++++++++++
> >>>  libavcodec/x86/me_cmp_init.c |  203 +++++++-------------------
> >>>  2 files changed, 379 insertions(+), 154 deletions(-)
> >>>
> >>> diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
> >>> index b0741f3..27176f4 100644
> >>> --- a/libavcodec/x86/me_cmp.asm
> >>> +++ b/libavcodec/x86/me_cmp.asm
> >>> @@ -23,6 +23,10 @@
> >>>
> >>>  %include "libavutil/x86/x86util.asm"
> >>>
> >>> +SECTION_RODATA
> >>> +
> >>> +cextern pb_1
> >>> +
> >>>  SECTION .text
> >>>
> >>>  %macro DIFF_PIXELS_1 4
> >>> @@ -465,3 +469,329 @@ cglobal hf_noise%1, 3,3,0, pix1, lsize, h
> >>>  INIT_MMX mmx
> >>>  HF_NOISE 8
> >>>  HF_NOISE 16
> >>> +
> >>> +;---------------------------------------------------------------------------------------
> >>> +;int ff_sad_<opt>(MpegEncContext *v, uint8_t *pix1, uint8_t *pix2, int
> >>> stride, int h);
> >>> +;---------------------------------------------------------------------------------------
> >>> +INIT_MMX mmxext
> >>> +cglobal sad8, 4, 4, 0, v, pix1, pix2, stride
> >>> +    movu m2, [pix2q]
> >>> +    movu m1, [pix2q+strideq]
> >>> +    psadbw m2, [pix1q]
> >>> +    psadbw m1, [pix1q+strideq]
> >>> +    paddw m2, m1
> >>> +
> >>> +%rep 3
> >>> +    lea pix1q, [pix1q+strideq*2]
> >>> +    lea pix2q, [pix2q+strideq*2]
> >>> +    movu m0, [pix2q]
> >>> +    movu m1, [pix2q+strideq]
> >>> +    psadbw m0, [pix1q]
> >>> +    psadbw m1, [pix1q+strideq]
> >>> +    paddw m2, m0
> >>> +    paddw m2, m1
> >>> +%endrep
> >>> +    movd eax, m2
> >>> +    RET
> >>> +
> >>
> >> Sorry to notice that now but... what happened to the h parameter?
> >
> > i had missed that when reviewing
> >
> > fixed
>
> It's not needed. I purposely removed it and made it a fixed %rep since it's
> supposedly
> guaranteed to be 8.
> Check the inline version it replaced.

hmm, we need an 8x4 for interlaced chroma motion estimation,
but maybe we just don't support interlaced chroma ME, i don't remember.
still, i think it's better if our code can handle that case, so support for
interlaced chroma ME can be added without needing to update the asm

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin
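
To make the point about h concrete, here is a minimal C sketch of the two behaviours discussed in this thread. It is not the FFmpeg C reference or the committed asm: the names sad8_ref and sad8_fixed8 are hypothetical, and the MpegEncContext argument from the real signature is left out for brevity. The first function honours h, so an 8x4 block (the interlaced chroma case) works; the second mirrors the fixed %rep structure and always reads 8 rows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference: sum of absolute differences over a block that is
 * 8 pixels wide and h rows tall. Honouring h lets the same code serve both
 * 8x8 and 8x4 blocks. */
static int sad8_ref(const uint8_t *pix1, const uint8_t *pix2, int stride, int h)
{
    int sum = 0;

    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - pix2[x]);
        /* advance both pointers to the next row */
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Hypothetical stand-in for the fixed-height variant: like the asm with
 * 2 rows + %rep 3 of 2 rows, it always processes 8 rows and ignores h.
 * Called on an 8x4 block it would read 4 rows past the block. */
static int sad8_fixed8(const uint8_t *pix1, const uint8_t *pix2, int stride)
{
    return sad8_ref(pix1, pix2, stride, 8);
}
```

With two 8x8 buffers that differ by 3 in every pixel, sad8_ref with h=8 and sad8_fixed8 agree, while sad8_ref with h=4 returns half of that. Only the h-aware version gives the right answer for an 8x4 block.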
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel