On Tuesday, 20 May 2025 at 10:58:06 Eastern European Summer Time,
daichengr...@iscas.ac.cn wrote:
> From: daichengrong <daichengr...@iscas.ac.cn>
> 
> Since there are no comments for v2 and v3, we have continued to optimize
> according to the comments of v1. We spilled the slide to memory to help
> improve performance, and optimized the extraction of elements from vector
> registers.

You still seem to be flip-flopping values in X registers. You may need to go 
easier on macros to get a better view of the actual generated code.

Also it seems that this uses half-vectors a lot. I am not sure if this can be 
avoided, but typically that leads to very poor performance.

Also you're resetting `vl` with its current value, which can hurt performance 
depending on the implementation. If you don't need to change `vl`, then use 
`zero` as the AVL source operand (i.e. `vsetvli zero, zero, ...`).
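
As a rough sketch of the difference (the registers and vtype settings here are 
placeholders, not taken from the patch; `a2` is assumed to hold the element 
count):

```
# Hypothetical fragment, for illustration only.
vsetvli     zero, a2, e16, m1, ta, ma     # sets vl from a2, and sets vtype
# ... 16-bit work ...
vsetvli     zero, a2, e32, m2, ta, ma     # writes vl again with the value it already has

# Alternative for the second instruction: keep vl, change only vtype.
vsetvli     zero, zero, e32, m2, ta, ma
```

Note that the `zero, zero` form is only valid if the SEW/LMUL ratio stays the 
same (here 16/1 and 32/2), so that VLMAX does not change.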

Lastly, you seem to be changing vtype when it's not actually needed, e.g.:

vsetvli zero, 4, e16, mf2...
...
vsetvli zero, 4, e32, m1...
vse32.v ...
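
For illustration, a minimal sketch of why the second `vsetvli` may be 
redundant there. The operands below are hypothetical, and the widening add 
only stands in for whatever produces the 32-bit values in the patch:

```
# Hypothetical fragment; v8, v10, v12 and a0 are placeholder operands.
vsetivli    zero, 4, e16, mf2, ta, ma
vwadd.vv    v8, v10, v12      # widening add: 32-bit results in v8 (EMUL = 1)
vse32.v     v8, (a0)          # EEW = 32 comes from the opcode,
                              # EMUL = (32/16) * (1/2) = 1
```

Unit-stride stores encode their element width in the instruction itself, so 
`vse32.v` should store the four 32-bit elements without switching to e32 
first.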




-- 
德尼-库尔蒙‧雷米
Hagalund ny stad, f.d. Finska republik Nylands



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
