Hi, cursory review, I appreciate that this is work in progress so don't read too much into it.
General points: - I really like how you're doing a literal translation without trying to change it, this makes review easier and also reduces the likelihood that there's performance implications. - do you think a checkasm test makes sense? That would also make performance measuring easier. Code review: On Fri, May 12, 2017 at 10:27 AM, James Darnley <jdarn...@obe.tv> wrote: > +; Simple IDCT MMX > +; > +; Copyright (c) 2001, 2002 Michael Niedermayer <michae...@gmx.at> > Please add a line that you converted it from inline asm to x264asm syntax, we've done that in other places also. > +%macro DC_COND_IDCT 7 > +movq mm0, [blockq + %1] ; R4 R0 r4 r0 > +movq mm1, [blockq + %2] ; R6 R2 r6 r2 > +movq mm2, [blockq + %3] ; R3 R1 r3 r1 > +movq mm3, [blockq + %4] ; R7 R5 r7 r5 > Please use 4-space indentation. > + %%9: > +; :: "r" (block), "r" (temp), "r" (coeffs) > +; NAMED_CONSTRAINTS_ADD(wm1010,d40000) > +; : "%eax" > +%endmacro > The inline asm bits (middle 3 lines) can be removed (yay!). Rest is fine. I am assuming that the binary size will grow slightly because the idct is now inlined in the put/add (and duplicated between mmx/sse2), but no performance implication (or possibly slight improvement because of the inline). If that's correct, I'm OK with this. Are you planning to write an actual SSE2 version of the IDCT? (No pressure, just wondering.) Ronald _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel