Apr 12, 2021, 10:07 by mar...@martin.st:

> Move the loop counter decrement further from the branch instruction;
> this hides the latency of the decrement.
>
> In loops that first load, then store (the horizontal prediction cases),
> do the decrement after the load (where the next instruction would
> stall a bit anyway, waiting for the result of the load).
>
> In loops that store twice using the same destination register,
> also do the decrement between the two stores (as the second store
> would need to wait for the updated destination register from the
> first store).
>
> In loops that store twice to two different destination registers,
> do the decrement before both stores, to place it as far ahead of the
> branch as possible.
>
> This gives minor (1-2 cycle) speedups in most cases (modulo measurement
> noise), but the horizontal prediction functions get a rather notable
> speedup on the Cortex-A53.
>

LGTM
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".