On 8 March 2024 02:45:46 GMT+02:00, flow gg <hlefthl...@gmail.com> wrote:
>> Isn't it also faster to max LMUL for the adds here?
>
>It requires the use of one more vset, making the time slightly longer:
>147.7 (m1), 148.7 (m8 + vset).

A difference of under 0.7% on a single set of kernels will end up below
measurement noise in real overall codec usage. Reducing I-cache contention can
also improve performance in other ways, and a larger LMUL should improve
performance on bigger cores with more ALUs. So it's not all black and white.
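
To make the trade-off concrete, here is a rough sketch, not code from the
patch. The element width (e16), the use of v0-v7, the scalar in t1, and the
assumption that every register is fully used (vl = VLMAX) are all made up for
illustration:

```
        /* Variant A: already at LMUL=1, one add per register. */
        vadd.vx v0, v0, t1
        vadd.vx v1, v1, t1
        vadd.vx v2, v2, t1
        vadd.vx v3, v3, t1
        vadd.vx v4, v4, t1
        vadd.vx v5, v5, t1
        vadd.vx v6, v6, t1
        vadd.vx v7, v7, t1

        /* Variant B: one extra vset, then a single add over the
           whole v0-v7 register group. */
        vsetvli t0, zero, e16, m8, ta, ma
        vadd.vx v0, v0, t1
```

Variant B is shorter, but any following code that expects LMUL=1 needs another
vset to switch back, and which variant runs faster depends on the core.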

My personal preference is to keep the code small if it makes almost no
difference, but I'm not the BDFL.

>Also this might not be very noticeable on C908, but avoiding sequential
>dependencies on the address registers may help. I mean, avoid using as
>address operand a value that was calculated by the immediately preceding
>instruction.
>
>> Okay, but the test results haven't changed.
>It would add more than ten lines of code, perhaps shorter code would be better?

I don't know. There are definitely in-order vector cores coming, and data 
dependencies will hurt them. But I don't know if anyone will care about FFmpeg 
on those.
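
As a rough illustration of the dependency in question, and not code from the
patch: the register roles (a0 = source pointer, a1 = stride in bytes, a2 =
even row count), the 16-byte row width and the loop shape are all hypothetical.

```
        vsetivli zero, 16, e8, m1, ta, ma

        /* Dependent form: the second load address is produced by the
           instruction right before the load. */
1:
        vle8.v  v8, (a0)
        add     a0, a0, a1
        vle8.v  v9, (a0)            /* waits for the add just above */
        add     a0, a0, a1
        addi    a2, a2, -2
        /* ... use v8 and v9 ... */
        bnez    a2, 1b

        /* Decoupled form: addresses are computed ahead of their use. */
        slli    t1, a1, 1           /* t1 = 2 * stride, outside the loop */
2:
        add     t0, a0, a1          /* address of the second row */
        vle8.v  v8, (a0)            /* a0 was written well before this */
        add     a0, a0, t1          /* advance by two rows */
        vle8.v  v9, (t0)            /* t0 was computed two instructions ago */
        addi    a2, a2, -2
        /* ... use v8 and v9 ... */
        bnez    a2, 2b
```

Both loops load the same rows. The second one costs a temporary register and
one extra instruction before the loop, which an out-of-order core probably
will not reward.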
