On Saturday, 2 March 2024, 14:06:13 EET, flow gg wrote:
> Here adjusting the order, rather than simply using .rept, will be 13%-24%
> faster.

Isn't it also faster to max out LMUL for the adds here?

Also, this might not be very noticeable on the C908, but avoiding sequential dependencies on the address registers may help. I mean, avoid using as an address operand a value that was calculated by the immediately preceding instruction.

-- 
Rémi Denis-Courmont
http://www.remlab.net/
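
To illustrate the address-dependency point, here is a minimal sketch. It is not taken from the patch under review: the register assignments (a0 as source pointer, a1 as row stride), the 8-bit element width, and the three-row shape are all assumptions chosen for illustration.

```asm
        # Assumptions: a0 = source pointer, a1 = row stride in bytes,
        # vl/vtype already configured by an earlier vsetvli (e8 elements).

        # Variant A: back-to-back dependency. Each load after the first
        # uses an address produced by the instruction right before it.
        vle8.v  v8,  (a0)
        add     a0, a0, a1
        vle8.v  v9,  (a0)
        add     a0, a0, a1
        vle8.v  v10, (a0)

        # Variant B: same three loads, but no load consumes an address
        # written by the immediately preceding instruction.
        add     t0, a0, a1        # address of row 1
        vle8.v  v8,  (a0)         # row 0: a0 was ready on entry
        add     t1, t0, a1        # address of row 2
        vle8.v  v9,  (t0)         # row 1: t0 was computed earlier
        vle8.v  v10, (t1)         # row 2: t1 not written by previous insn
```

Variant B costs two temporary registers and leaves a0 unchanged, so a loop would still need one pointer update per iteration. Whether it is actually faster depends on the core and would need benchmarking.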