On Mon, Jun 19, 2017 at 5:11 PM, James Darnley <jdarn...@obe.tv> wrote:
> +    por     m1, m8, m13
> +    por     m1, m12
> +    por     m1, [blockq+ 16]       ; { row[1] }[0-7]
> +    por     m1, [blockq+ 48]       ; { row[3] }[0-7]
> +    por     m1, [blockq+ 80]       ; { row[5] }[0-7]
> +    por     m1, [blockq+112]       ; { row[7] }[0-7]

Every por here writes m1, so each one has to wait for the result of
the previous one, and the sequence can execute at most one instruction
per cycle. Splitting the accumulation across two destination registers
and merging them with a final por would roughly double the (local) IPC.

Out-of-order execution might alleviate it, but there is no reason to
rely on it unnecessarily.
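
As a rough, untested sketch of what I mean: the version below interleaves
two independent OR chains and merges them at the end. m2 is only a
placeholder for whatever scratch register happens to be free at this
point in the function (I have not checked the register allocation), and
the 3-operand por with a memory source relies on the same x86inc
emulation the patch already uses for "por m1, m8, m13".

```asm
    por     m1, m8, m13            ; chain A: m1 = m8 | m13
    por     m2, m12, [blockq+ 16]  ; chain B: m2 = m12 | { row[1] }[0-7] (m2 assumed free)
    por     m1, [blockq+ 48]       ; chain A: |= { row[3] }[0-7]
    por     m2, [blockq+ 80]       ; chain B: |= { row[5] }[0-7]
    por     m1, [blockq+112]       ; chain A: |= { row[7] }[0-7]
    por     m1, m2                 ; merge both chains into m1
```

OR is associative and commutative, so m1 ends up with the same value as
before. The number of por instructions is unchanged, but the longest
dependency chain drops from six instructions to four.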
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
