Hi,

cursory review, I appreciate that this is work in progress so don't read
too much into it.

General points:
- I really like how you're doing a literal translation without trying to
change it, this makes review easier and also reduces the likelihood that
there's performance implications.
- do you think a checkasm test makes sense? That would also make
performance measuring easier.

Code review:

On Fri, May 12, 2017 at 10:27 AM, James Darnley <jdarn...@obe.tv> wrote:

> +; Simple IDCT MMX
> +;
> +; Copyright (c) 2001, 2002 Michael Niedermayer <michae...@gmx.at>
>

Please add a line that you converted it from inline asm to x264asm syntax,
we've done that in other places also.


> +%macro DC_COND_IDCT 7
> +movq           mm0, [blockq + %1]             ; R4     R0      r4      r0
> +movq           mm1, [blockq + %2]             ; R6     R2      r6      r2
> +movq           mm2, [blockq + %3]             ; R3     R1      r3      r1
> +movq           mm3, [blockq + %4]             ; R7     R5      r7      r5
>

Please use 4-space indentation.


> +    %%9:
> +; :: "r" (block), "r" (temp), "r" (coeffs)
> +;    NAMED_CONSTRAINTS_ADD(wm1010,d40000)
> +; : "%eax"
> +%endmacro
>

The inline asm bits (middle 3 lines) can be removed (yay!).

Rest is fine. I am assuming that the binary size will grow slightly because
the idct is now inlined in the put/add (and duplicated between mmx/sse2),
but no performance implication (or possibly slight improvement because of
the inline). If that's correct, I'm OK with this.

Are you planning to write an actual SSE2 version of the IDCT? (No pressure,
just wondering.)

Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Reply via email to