vc1: Arm 64-bit NEON inverse transform fast paths

Ben Avison Thu, 31 Mar 2022 08:37:26 -0700

On 30/03/2022 14:49, Martin Storsjö wrote:

Looks generally reasonable. Is it possible to factorize out the individual transforms (so that you'd e.g. invoke the same macro twice in the 8x8 and 4x4 functions) without too much loss?

There is a close analogy here with the vertical/horizontal deblocking filters, because while there are similarities between the two matrix multiplications within a transform, one of them follows a series of loads and the other follows a matrix transposition.
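To make that structure concrete, here is a minimal scalar sketch in C of a separable 2-D transform done as two 1-D matrix multiplications with a transpose between them. It is an illustration only: the matrix M is an arbitrary example and is not claimed to match the VC-1 coefficients, the rounding and shifts of the real transform are left out, and the names row_pass, transpose, transform_2d and transform_2d_direct are hypothetical and do not appear in the patch.

```c
#define N 4

/* Arbitrary illustrative basis matrix (an assumption, NOT the VC-1 one). */
static const int M[N][N] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 },
};

/* One 1-D pass: transform every row of 'in', i.e. out = in * M^T. */
static void row_pass(int in[N][N], int out[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += in[i][k] * M[j][k];
            out[i][j] = acc;
        }
}

static void transpose(int in[N][N], int out[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            out[j][i] = in[i][j];
}

/* Two-pass form: the first multiply consumes freshly loaded data,
 * the second consumes the output of a transpose. Result: M * in * M^T. */
void transform_2d(int in[N][N], int out[N][N])
{
    int a[N][N], b[N][N], c[N][N];

    row_pass(in, a);    /* first matrix multiplication  */
    transpose(a, b);    /* rows become columns          */
    row_pass(b, c);     /* second matrix multiplication */
    transpose(c, out);  /* restore the original layout  */
}

/* Reference: the same result written directly as a double sum. */
void transform_2d_direct(int in[N][N], int out[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                for (int l = 0; l < N; l++)
                    acc += M[i][k] * in[k][l] * M[j][l];
            out[i][j] = acc;
        }
}
```

In this scalar form both calls to row_pass are identical, which is why factoring them into one macro looks attractive. The sketch does not model how an assembly version interleaves its loads, transpose and stores with the arithmetic, and that interleaving is where the two multiplications stop being interchangeable.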

If you look for example at ff_vc1_inv_trans_8x8_neon, you'll see I was able to do a fair amount of overlap between sections of the function - particularly between the transpose and the second matrix multiplication, but to a lesser extent between the loads and the first matrix multiplication and between the second multiplication and the stores. This sort of overlapping is tricky to maintain when using macros. Also, it means that the order of operations within each matrix multiply ended up quite different.

At first sight, you might think that the multiplies from the 8x8 function (which you might also view as a kind of 8-tap filter) would be re-usable for the size-8 multiplies in the 8x4 or 4x8 function. Yes, the instructions are similar, save for using .4h elements rather than .8h elements, but that has a significant impact on scheduling. For example, the Cortex-A72, which is my primary target, can only do NEON bit-shifts in one pipeline at once, irrespective of whether the vectors are 64-bit or 128-bit long, while other instructions don't have such restrictions.

So while in theory you could factor some of this code out more, I suspect any attempt to do so would have a detrimental effect on performance.


Ben
