On Wed, Jul 13, 2016 at 6:37 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> +cglobal vp9_idct_idct_32x32_add, 4, 9, 16, 2048, dst, stride, block, eob
[...]
> +    movd               xm0, [blockq]
> +    mova                m1, [pw_11585x2]
> +    pmulhrsw            m0, m1
> +    pmulhrsw            m0, m1
> +    vpbroadcastw        m0, xm0
> +    pmulhrsw            m0, [pw_512]

The vpbroadcastw could be done from memory at the beginning, which
would get rid of the movd.

Is it mathematically possible to merge consecutive pmulhrsw
instructions into a single one using a different constant? I'm
guessing no, but I'm not sure.

[...]

> +    ; at the end of the loop, m7 should still be zero
> +    ; use that to zero out block coefficients
> +    ZERO_BLOCK      blockq, 64, 16, m1

The comment says m7, but the code uses m1.

[...]

> +    ; at the end of the loop, m7 should still be zero
> +    ; use that to zero out block coefficients
> +    ZERO_BLOCK      blockq, 64, 32, m1

Ditto.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
