Re: [FFmpeg-devel] [PATCH] aarch64: h264pred: Optimize the inner loop of existing 8 bit functions

Lynne Mon, 12 Apr 2021 06:58:34 -0700

Apr 12, 2021, 10:07 by mar...@martin.st:

> Move the loop counter decrement further from the branch instruction,
> this hides the latency of the decrement.
>
> In loops that first load, then store (the horizontal prediction cases),
> do the decrement after the load (where the next instruction would
> stall a bit anyway, waiting for the result of the load).
>
> In loops that store twice using the same destination register,
> also do the decrement between the two stores (as the second store
> would need to wait for the updated destination register from the
> first instruction).
>
> In loops that store twice to two different destination registers,
> do the decrement before both stores, so that it sits as far ahead of
> the branch as possible.
>
> This gives minor (1-2 cycle) speedups in most cases (modulo measurement
> noise), but the horizontal prediction functions get a rather notable
> speedup on the Cortex-A53.
>


LGTM
