Thank you for the detailed explanation. One more question: I understand that the assembly code needs to be broken down further, but what is the problem with adding this code to the init section of the C code here? As far as I can tell, this C code just mimics the init section of the x86 C code.

Rémi Denis-Courmont <r...@remlab.net> wrote on Wed, 31 Jul 2024 at 23:06:

> On Tuesday, 30 July 2024, 20:57:28 EEST, flow gg wrote:
> > From my understanding, moving from supporting only 128b to adding 256b
> > versions can simultaneously improve LMUL and solve some issues related to
> > insufficient vector registers (vvc, vp9).
>
> To the contrary, if vectors are too short to process a macroblock in a
> single round, then there should be a loop with maximum LMUL, and the code
> should be the same for all vector lengths. That is just normal textbook RVV
> coding style. There should *not* be vector length specialisation since the
> code can be shared.
>
> > If we continue to support 512, 1024, ..., it almost exclusively improves
> > LMUL.
>
> I don't think so. Even more so than 256-bit hardware, 512-bit and 1024-bit
> hardware really _needs_ to short-circuit vector processing based on VL and
> not simply follow LMUL.
>
> > Therefore, 256b is the most worthwhile addition, and we can skip
> > adding 512b, 1024b, etc.
> >
> > Additionally, even though longer hardware will continually be developed,
> > the most used will probably still be 128b and 256b.
>
> I wouldn't be so sure. Realistically, lower-end SoCs decode video with DSPs.
> So video decoder vector optimisations are mainly for the server side, and
> that's exactly where larger vector sizes are most likely (e.g. AVX-512).
>
> > If someone complains that FFmpeg's RVV doesn't support 1024b well, it can
> > be said that it's not just RISC-V that lacks good support.
> > However, if the 256b performance is not good, then it seems like an issue
> > with RISC-V. :)
> >
> > I think maybe we can give some preference to the two smallest lengths?
>
> As I wrote, I am not necessarily against specialising for 256-bit as such.
> I am against:
> 1) specialising functions that do not really need to be specialised,
> 2) adding tons of boilerplate (notably in the C code) for it.
>
> --
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
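
For readers following the thread, the "loop with maximum LMUL" style described in the quoted message can be modelled in plain C roughly as below. This is only a sketch under stated assumptions, not code from the patch or from FFmpeg: `sketch_setvl`, `sketch_vlmax_e8` and `sketch_avg_row` are hypothetical names, the inner `for` loop merely stands in for one group of vector instructions, and the rounded byte average is an arbitrary example operation. The point it illustrates is that one loop body serves every vector length; only the number of rounds changes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the RVV vsetvli instruction: the number of
 * elements handled in this round is min(remaining, VLMAX). */
static size_t sketch_setvl(size_t remaining, size_t vlmax)
{
    return remaining < vlmax ? remaining : vlmax;
}

/* VLMAX for 8-bit elements: VLEN / SEW * LMUL, with SEW = 8. */
static size_t sketch_vlmax_e8(size_t vlen_bits, size_t lmul)
{
    return vlen_bits / 8 * lmul;
}

/* Rounded average of two byte rows, written as a strip-mined loop.
 * The same code runs for any vlen_bits (128, 256, 512, ...).
 * Returns the number of loop rounds that were needed. */
static unsigned sketch_avg_row(uint8_t *dst, const uint8_t *a,
                               const uint8_t *b, size_t n, size_t vlen_bits)
{
    /* Assumption for illustration: always request the maximum LMUL of 8. */
    const size_t vlmax = sketch_vlmax_e8(vlen_bits, 8);
    unsigned rounds = 0;

    while (n > 0) {
        size_t vl = sketch_setvl(n, vlmax);

        /* Models one round of vector load / average / store on vl bytes. */
        for (size_t i = 0; i < vl; i++)
            dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);

        dst += vl;
        a   += vl;
        b   += vl;
        n   -= vl;
        rounds++;
    }
    return rounds;
}
```

Calling `sketch_avg_row` on a 300-byte row with `vlen_bits` set to 128, 256 or 512 produces identical output; in this model the wider settings simply finish in fewer rounds, without any length-specific variant of the function.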