On 9 March 2025 12:57:47 GMT-07:00, Niklas Haas <ffm...@haasn.xyz> wrote:
>On Sun, 09 Mar 2025 11:18:04 -0700 Rémi Denis-Courmont <r...@remlab.net> wrote:
>> Hi,
>>
>> On 8 March 2025 14:53:42 GMT-08:00, Niklas Haas <ffm...@haasn.xyz> wrote:
>> >https://github.com/haasn/FFmpeg/blob/swscale3/doc/swscale-v2.txt
>> >
>> >I have spent the past week or so ironing
>> >I wanted to post it here to gather some feedback on the approach. Where does
>> >it fall on the "madness" scale? Is the new operations and optimizer design
>> >comprehensible? Am I trying too hard to reinvent compilers? Are there any
>> >platforms where the high number of function calls per frame would be
>> >prohibitively expensive? What are the thoughts on the float-first approach?
>> >
>> >See also the list of limitations and improvement ideas at the bottom of my
>> >design document.
>>
>> Using floats internally may be fine if there's (almost) never any spillage,
>> but that necessarily implies custom calling conventions. And won't work with
>> as many as 32 pixels. On RVV 128-bit, you'd have only 4 vectors. On Arm
>> NEON, it would be even worse as scalars/constants need to be stored in
>> vectors as well.
>
>I think that a custom calling convention is not as unreasonable as it may
>sound, and will actually be easier to implement than the standard calling
>convention since functions will not have to deal with pixel load/store, nor
>will there be any need for "fused" versions of operations (whose only purpose
>is to avoid the roundtrip through L1).
>
>The pixel chunk size is easily changed; it is a compile time constant and there
>are no strict requirements on it. If RISC-V (or any other platform) struggles
>with storing 32 floats in vector registers, we could go down to 16 (or even 8);
>the number 32 was merely chosen by benchmarking and not through any careful
>design consideration.

It can't be a compile-time constant on RVV, nor on SVE (if support is ever
introduced), because both are scalable. I also doubt that a single
compile-time constant would work well across all variants of x86, though I
can't say for sure.

>Do you have access to anything with decent RVV F32 support that we could use
>for testing? It's my understanding that existing RVV implementations have been
>rather primitive.

Float is quite okay on RVV. It is already faster than integers in some lavc
audio loops. That said, I only have access to TH-C908 (128-bit) and ST-X60
(256-bit), as before, and I haven't been contacted about getting access to
anything better. The X60 is used on FATE.
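
To illustrate the register-pressure arithmetic in this exchange, here is a
minimal C sketch (not code from swscale). It assumes 32-bit floats and a
32-entry vector register file, as on RVV. The helper names
regs_per_component, vectors_available and pick_chunk are hypothetical, and
the halving policy in pick_chunk is only one possible way to choose a chunk
size at run time once the vector length is known.

```c
#include <assert.h>

/* Assumption: 32 architectural vector registers, as on RVV. */
#define NUM_VREGS  32
/* Assumption: pixels are processed as 32-bit floats. */
#define FLOAT_BITS 32

/* Registers needed to hold one component of a chunk of `pixels` floats
 * when each register is `vlen_bits` wide (rounded up). */
static unsigned regs_per_component(unsigned pixels, unsigned vlen_bits)
{
    assert(pixels > 0 && vlen_bits > 0);
    return (pixels * FLOAT_BITS + vlen_bits - 1) / vlen_bits;
}

/* Number of chunk-sized float vectors that fit in the register file
 * at the same time, with nothing spilled. */
static unsigned vectors_available(unsigned pixels, unsigned vlen_bits)
{
    return NUM_VREGS / regs_per_component(pixels, vlen_bits);
}

/* Hypothetical run-time policy: start from `max_pixels` and halve the
 * chunk until at least `want` chunk-sized vectors fit in registers.
 * On RVV, vlen_bits would come from the hardware at run time (for
 * example 8 * the vlenb CSR) rather than from a compile-time constant. */
static unsigned pick_chunk(unsigned vlen_bits, unsigned want,
                           unsigned max_pixels)
{
    unsigned pixels = max_pixels;

    while (pixels > 1 && vectors_available(pixels, vlen_bits) < want)
        pixels /= 2;
    return pixels;
}
```

Under these assumptions, vectors_available(32, 128) evaluates to 4, which
matches the "only 4 vectors" figure quoted above for 128-bit RVV, and
pick_chunk(128, 8, 32) falls back to a chunk of 16 pixels.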