On Tuesday, 20 May 2025, 10:58:06 Eastern European Summer Time, daichengr...@iscas.ac.cn wrote:
> From: daichengrong <daichengr...@iscas.ac.cn>
>
> Since there are no comments for v2 and v3, we have continued to optimize
> according to the comments of v1. We spilled the slide to memory to help
> improve performance, and optimized the extraction of elements from vector
> registers.

You still seem to be flip-flopping values in X registers. You may need to go
easier on macros to get a better view of the actual generated code.

Also it seems that this uses half-vectors a lot. I am not sure if this can be
avoided, but typically that leads to very poor performance.

Also you're resetting `vl` with its current value, which can hurt performance
depending on the implementation. If you don't need to change `vl`, then use
`zero` as the AVL operand so that `vl` is left as is.

Lastly, you seem to be changing vtype when it's not actually needed, e.g.:

    vsetvli zero, 4, e16, mf2...
    ...
    vsetvli zero, 4, e32, mf1...
    vse32.v ...

-- 
德尼-库尔蒙‧雷米
Hagalund ny stad, f.d. Finska republik Nylands