On octidi 8 Floréal, year CCXXV, Michael Niedermayer wrote:
> I agree
> in fact i added such a flag in 2011 (4d34b6c1a1254850e39a36f08f4d2730092a54db)
> within the API of that time to avfilter

It was not a bad idea, but it should not be limited to filters. A few
comments.

* First, the framequeue framework does not produce unaligned code. As far
  as the C standard is concerned, the data it handles stay aligned. The
  alignment problems come from non-standard requirements imposed by special
  processor features that some filters and codecs use, but not all.

* That means a lot of the most useful codecs and filters will suffer from
  it, but not all. For many tasks, the alignment is just fine, and the
  extra copy would be wasteful.

* The alignment requirements keep increasing. Before AVX, it was up to 16,
  now it can be 32, and I have no doubt future processors will at some
  point require 64 or 128. But realigning buffers used with SSE to 32 would
  be wasteful too. Thus, what we need is not a flag but a full integer.

* The code that does the actual work of realigning a buffer should be
  available as a stand-alone API, to be used by applications that work at
  a low level. I suppose something like this would be in order:

      int av_frame_realign(AVFrame *frame, unsigned align);

  Or maybe:

      int av_frame_realign(AVFrame *frame, unsigned align,
                           AVBufferAllocator *alloc);

  where AVBufferAllocator is there to allocate possibly hardware or
  mmapped buffers.

* It is another argument for my leitmotif that filters and codecs are
  actually the same and should be merged API-wise.

* It would be better to have the API just work for everything rather than
  documenting the alignment needs.

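To make the "full integer rather than a flag" point concrete, here is a
minimal sketch of what such a realign helper could do. It is only an
illustration under stated assumptions: ToyFrame is a hypothetical
single-plane stand-in, not the real AVFrame, the toy_* names are invented
for this example, and C11 aligned_alloc() replaces whatever allocator (or
AVBufferAllocator) an actual implementation would use. Refcounting and
multiple planes are ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a single-plane frame; NOT the real AVFrame. */
typedef struct ToyFrame {
    uint8_t *base;     /* start of the allocation, the pointer given to free() */
    uint8_t *data;     /* first byte of the first line, may be offset from base */
    size_t   linesize; /* bytes between the starts of two consecutive lines */
    size_t   width;    /* useful bytes per line */
    size_t   height;   /* number of lines */
} ToyFrame;

/* True if align is a non-zero power of two. */
static int toy_align_valid(size_t align)
{
    return align != 0 && (align & (align - 1)) == 0;
}

/* True if the data pointer and the stride are both multiples of align,
 * which means every line starts at an aligned address. */
static int toy_frame_is_aligned(const ToyFrame *f, size_t align)
{
    return ((uintptr_t)f->data & (align - 1)) == 0 &&
           (f->linesize & (align - 1)) == 0;
}

/* Make the frame satisfy the requested alignment, copying only if needed.
 * Returns 0 on success or a negative errno value on failure.
 * Size overflow checks are omitted to keep the sketch short. */
static int toy_frame_realign(ToyFrame *f, size_t align)
{
    if (!toy_align_valid(align))
        return -EINVAL;
    if (toy_frame_is_aligned(f, align))
        return 0; /* already good enough: no malloc(), no memcpy() */

    /* Round the stride up to the next multiple of align. */
    size_t linesize = (f->width + align - 1) & ~(align - 1);
    size_t size = linesize * f->height;
    if (size == 0)
        size = align; /* avoid a zero-size request to aligned_alloc() */

    /* size is a multiple of align, as C11 aligned_alloc() expects. */
    uint8_t *buf = aligned_alloc(align, size);
    if (!buf)
        return -ENOMEM;
    memset(buf, 0, size); /* keep the padding bytes defined */

    /* Copy line by line, because the old and new strides differ. */
    for (size_t y = 0; y < f->height; y++)
        memcpy(buf + y * linesize, f->data + y * f->linesize, f->width);

    free(f->base);
    f->base = f->data = buf;
    f->linesize = linesize;
    assert(toy_frame_is_aligned(f, align));
    return 0;
}
```

With this shape, a caller that needs 16 and receives a frame already
aligned to 32 pays nothing, while a caller that needs 32 and receives a
16-aligned frame pays exactly one copy.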
As for the actual implementation, I see a lot of different approaches:

- have the framework realign the frame before submitting it to the filters
  and codecs: costly in malloc() and memcpy() but simple;

- have each filter or codec call av_frame_realign() as needed; it may seem
  less elegant than the previous proposal, but it may prove a better choice
  in the light of what follows;

- have each filter or codec copy the unaligned data into a buffer allocated
  once and for all or on the stack, possibly by small chunks: less costly
  in malloc() and refcounting overhead, and possibly better cache locality,
  but more complex code;

- run the plain C version of the code on unaligned data rather than the
  vectorized version, or the less-vectorized version (SSE vs AVX) on
  insufficiently aligned data.

Since all this boils down to a matter of performance and is related to the
core task of FFmpeg, I think the choice between the various options should
be made on a case-by-case basis using real benchmarks.

Regards,

--
  Nicolas George
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel