On 9 March 2025 12:57:47 GMT-07:00, Niklas Haas <ffm...@haasn.xyz> wrote:
>On Sun, 09 Mar 2025 11:18:04 -0700 Rémi Denis-Courmont <r...@remlab.net> wrote:
>> Hi,
>> 
>> On 8 March 2025 14:53:42 GMT-08:00, Niklas Haas <ffm...@haasn.xyz> wrote:
>> >https://github.com/haasn/FFmpeg/blob/swscale3/doc/swscale-v2.txt
>> 
>> >I have spent the past week or so ironing 
>> >I wanted to post it here to gather some feedback on the approach. Where does
>> >it fall on the "madness" scale? Are the new operations and optimizer design
>> >comprehensible? Am I trying too hard to reinvent compilers? Are there any
>> >platforms where the high number of function calls per frame would be
>> >prohibitively expensive? What are the thoughts on the float-first approach?
>> >See also the list of limitations and improvement ideas at the bottom of my
>> >design document.
>> 
>> Using floats internally may be fine if there's (almost) never any spilling,
>> but that necessarily implies custom calling conventions. And it won't work
>> with as many as 32 pixels. On RVV 128-bit, you'd have only 4 vectors. On Arm
>> NEON, it would be even worse, as scalars/constants need to be stored in
>> vectors as well.
>
>I think that a custom calling convention is not as unreasonable as it may
>sound, and will actually be easier to implement than the standard calling
>convention, since functions will not have to deal with pixel load/store, nor
>will there be any need for "fused" versions of operations (whose only purpose
>is to avoid the round trip through L1).
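The load/transform/store split described above can be sketched in plain C, assuming a single 8-bit plane and a toy two-op list. `Block`, `Op`, `op_scale`, `op_offset` and `run_chain` are hypothetical names, not swscale API. Portable C cannot express a custom register-passing convention, so the block is passed by pointer here; the point is only the shape: load/convert once per chunk, one call per op, convert/store once per chunk, so no op needs a "fused" load/store variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time chunk size (hypothetical default of 32 pixels). */
#ifndef CHUNK
#define CHUNK 32
#endif

/* In-flight state: one float per pixel of one plane. Under a custom
 * calling convention this would stay in vector registers between ops. */
typedef struct { float px[CHUNK]; } Block;

/* Each op only transforms the block; it never loads or stores pixels. */
typedef void (*OpFn)(Block *b, const float *params);

static void op_scale(Block *b, const float *p)
{
    for (int i = 0; i < CHUNK; i++)
        b->px[i] *= p[0];
}

static void op_offset(Block *b, const float *p)
{
    for (int i = 0; i < CHUNK; i++)
        b->px[i] += p[0];
}

typedef struct { OpFn fn; float params[1]; } Op;

/* Runs the op chain over n pixels (n must be a multiple of CHUNK here).
 * Memory is touched only at the two ends of the chain. */
static void run_chain(const uint8_t *src, uint8_t *dst, size_t n,
                      const Op *ops, int nb_ops)
{
    assert(n % CHUNK == 0);
    for (size_t off = 0; off < n; off += CHUNK) {
        Block b;
        for (int i = 0; i < CHUNK; i++)          /* load + u8 -> float */
            b.px[i] = src[off + i];
        for (int k = 0; k < nb_ops; k++)         /* one call per op per chunk */
            ops[k].fn(&b, ops[k].params);
        for (int i = 0; i < CHUNK; i++) {        /* clamp, round, store */
            float v = b.px[i];
            v = v < 0.0f ? 0.0f : (v > 255.0f ? 255.0f : v);
            dst[off + i] = (uint8_t)(v + 0.5f);
        }
    }
}
```

The per-frame call count is (pixels / CHUNK) × number of ops, which is the cost the original question about function-call overhead is asking about.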
>
>The pixel chunk size is easily changed; it is a compile time constant and there
>are no strict requirements on it. If RISC-V (or any other platform) struggles
>with storing 32 floats in vector registers, we could go down to 16 (or even 8);
>the number 32 was merely chosen by benchmarking and not through any careful
>design consideration.
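One way to read the "only 4 vectors" figure, and what dropping to 16 or 8 pixels buys, is a back-of-the-envelope register budget. This assumes 32 architectural vector registers (as on RVV and AArch64), 32-bit floats, 4 components per pixel, and nothing else resident; `regs_per_component` and `spare_regs` are illustrative helpers only. With 32 pixels at VLEN=128, each component fills an LMUL=8 group of 8 registers, so only 4 such groups exist and all of them hold pixel data.

```c
#include <assert.h>

/* Assumption: 32 architectural vector registers, as on RVV and AArch64. */
#define NUM_VREGS 32

/* Vector registers needed to hold one component (e.g. R) of `chunk`
 * pixels stored as 32-bit floats, for a vector length of `vlen_bits`. */
static int regs_per_component(int chunk, int vlen_bits)
{
    assert(chunk > 0 && vlen_bits > 0);
    int bits = chunk * 32;
    return (bits + vlen_bits - 1) / vlen_bits; /* round up */
}

/* Registers left for constants and temporaries once `comps` components
 * of the chunk are resident; a negative result means the chunk spills. */
static int spare_regs(int chunk, int comps, int vlen_bits)
{
    return NUM_VREGS - comps * regs_per_component(chunk, vlen_bits);
}
```

Under these assumptions `spare_regs(32, 4, 128)` is 0, while 16 pixels at VLEN=128, or 32 pixels at VLEN=256, leave 16 registers free.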

It can't be a compile-time constant on RVV, nor on SVE (if it's ever
introduced), because they are scalable. I doubt that a compile-time constant
will work well across all x86 variants either, but I wouldn't know.
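With a scalable ISA the chunk size would become a value computed at init time rather than a #define. A minimal sketch, assuming LMUL=8 register groups of f32 lanes; `chunk_for_vlen` and `scale_plane` are hypothetical helpers, and the loop mimics RVV-style strip-mining (the role `vsetvli` plays), where each pass handles min(chunk, remaining) elements so the plane need not be padded to a multiple of the chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Chunk size derived from the run-time vector length: f32 lanes per
 * register group of `lmul` registers, each `vlen_bits` wide. */
static size_t chunk_for_vlen(unsigned vlen_bits, unsigned lmul)
{
    assert(vlen_bits >= 128 && lmul >= 1);
    return (size_t)vlen_bits / 32 * lmul;
}

/* Strip-mined loop: each pass processes vl = min(chunk, remaining)
 * pixels. Returns the number of passes, for illustration. */
static size_t scale_plane(float *px, size_t n, float k, size_t chunk)
{
    size_t off = 0, passes = 0;
    while (off < n) {
        size_t vl = n - off < chunk ? n - off : chunk;
        for (size_t i = 0; i < vl; i++)
            px[off + i] *= k;
        off += vl;
        passes++;
    }
    return passes;
}
```

The same source then runs with 32-pixel chunks on a 128-bit implementation and 64-pixel chunks on a 256-bit one, without recompiling.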

>Do you have access to anything with decent RVV F32 support that we could use
>for testing? It's my understanding that existing RVV implementations have been
>rather primitive.

Float is quite okay on RVV. It is faster than integers on some lavc audio loops 
already.

That said, I only have access to TH-C908 (128-bit) and ST-X60 (256-bit), as
before, and I haven't been contacted about getting access to anything better.
The X60 is used on FATE.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
