On Saturday, 2 March 2024, 14:06:13 EET, flow gg wrote:
> Here adjusting the order, rather than simply using .rept, will be 13%-24%
> faster.

Wouldn't it also be faster to use the maximum LMUL for the adds here?

Also, this might not be very noticeable on the C908, but avoiding sequential
dependencies on the address registers may help. By that I mean: avoid using as
an address operand a value that was calculated by the immediately preceding
instruction.
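
As a rough illustration only (this is not taken from the patch: the register
assignment, with a0 as the source pointer and a1 as the row stride, and the
assumption that vtype was already set by an earlier vsetvli, are hypothetical),
the idea could look like this:

```asm
        # Dependent form: each load uses an address written by the
        # instruction immediately before it.
        #       add     a0, a0, a1
        #       vle8.v  v8, (a0)
        #       add     a0, a0, a1
        #       vle8.v  v9, (a0)

        # Interleaved form: each address is computed at least one
        # instruction before the load that consumes it.
        add     t0, a0, a1      # t0 = address of row 1
        vle8.v  v8, (a0)        # row 0, a0 was not just written
        add     t1, t0, a1      # t1 = address of row 2
        vle8.v  v9, (t0)        # row 1, t0 was computed two instructions ago
        add     a0, t1, a1      # a0 = address of row 3
        vle8.v  v10, (t1)       # row 2, t1 was computed three instructions ago
        vle8.v  v11, (a0)       # row 3, a0 was computed two instructions ago
```

Whether this helps at all depends on the core, so it would need to be
benchmarked.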

-- 
Rémi Denis-Courmont
http://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
