Apr 12, 2021, 10:07 by mar...@martin.st:
> Move the loop counter decrement further from the branch instruction,
> this hides the latency of the decrement.
>
> In loops that first load, then store (the horizontal prediction cases),
> do the decrement after the load (where the next instruction would
> stall a bit anyway, waiting for the result of the load).
>
> In loops that store twice using the same destination register,
> also do the decrement between the two stores (as the second store
> would need to wait for the updated destination register from the
> first instruction).
>
> In loops that store twice to two different destination registers,
> do the decrement before both stores, to do it as soon before the
> branch as possible.
>
> This gives minor (1-2 cycle) speedups in most cases (modulo measurement
> noise), but the horizontal prediction functions get a rather notable
> speedup on the Cortex A53.

LGTM