On Thu, Apr 27, 2017 at 3:51 PM, Nicolas George <geo...@nsup.org> wrote:
> On Octidi, 8 Floréal, year CCXXV, Michael Niedermayer wrote:
>> I agree
>> in fact I added such a flag in 2011
>> (4d34b6c1a1254850e39a36f08f4d2730092a54db)
>> within the API of that time to avfilter
>
> It was not a bad idea, but it should not be limited to filters. A few
> comments.
>
> * First, the framequeue framework does not produce unaligned code.
>   According to the C standard, the data it handles stay aligned. The
>   alignment problems come from non-standard requirements by special
>   processor features used by some filters and codecs, but not all.
>
> * That means a lot of the most useful codecs and filters will suffer
>   from it, but not all. For many tasks, the alignment is just fine, and
>   the extra copy would be wasteful.
>
> * The alignment requirements increase. Before AVX, it was up to 16, now
>   it can be 32, and I have no doubt future processors will at some point
>   require 64 or 128. But realigning buffers used with SSE to 32 would be
>   wasteful too. Thus, we do not require a flag but a full integer.
>
> * The code that does the actual work of realigning a buffer should be
>   available as a stand-alone API, to be used by applications that work at
>   low-level. I suppose something like that would be in order:
>
>     int av_frame_realign(AVFrame *frame, unsigned align);
>
>   Or maybe:
>
>     int av_frame_realign(AVFrame *frame, unsigned align,
>                          AVBufferAllocator *alloc);
>
>   where AVBufferAllocator is there to allocate possibly hardware or mmaped
>   buffers.
>
> * It is another argument for my leitmotiv that filters and codecs are
>   actually the same and should be merged API-wise.
>
> * It would be better to have the API just work for everything rather
>   than documenting the alignment needs.
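
To make the "full integer" point concrete, here is a rough sketch of the
check-then-copy logic such a function might perform. It works on a raw byte
buffer, not on an AVFrame, and the names buf_is_aligned() and buf_realign()
are hypothetical: nothing below is existing FFmpeg API. It also assumes C11
aligned_alloc() is available and ignores refcounting and per-plane linesize
entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an FFmpeg function): nonzero if ptr is a
 * multiple of align. align must be a power of two. */
static int buf_is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Hypothetical stand-in for the proposed av_frame_realign(), operating on
 * a raw buffer instead of an AVFrame.
 * Returns  0 if src is already aligned (no copy, *owned is NULL),
 *          1 if the data was copied (release *owned with free()),
 *         -1 if the allocation failed. */
static int buf_realign(const uint8_t *src, size_t size, size_t align,
                       const uint8_t **dst, void **owned)
{
    size_t padded;
    void *copy;

    *owned = NULL;
    if (buf_is_aligned(src, align)) {
        *dst = src;             /* cheap path: nothing to do */
        return 0;
    }

    /* aligned_alloc() wants a size that is a multiple of the alignment */
    padded = (size + align - 1) / align * align;
    copy = aligned_alloc(align, padded ? padded : align);
    if (!copy) {
        *dst = NULL;
        return -1;
    }

    memcpy(copy, src, size);
    *dst   = copy;
    *owned = copy;
    return 1;
}
```

Passing the alignment as an integer means a caller that only needs 16 never
pays for a copy that a request for 32 would trigger on the same buffer.
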
>
> As for the actual implementation, I see a lot of different approaches:
>
> - have the framework realign the frame before submitting it to the
>   filters and codecs: costly in malloc() and memcpy() but simple;
>
> - have each filter or codec call av_frame_realign() as needed; it may
>   seem less elegant than the previous proposal, but it may prove a
>   better choice in the light of what follows;
>
> - have each filter or codec copy the unaligned data into a buffer
>   allocated once and for all or on the stack, possibly by small chunks:
>   less costly in malloc() and refcounting overhead, and possibly better
>   cache-locality, but more complex code;
>
> - run the plain C version of the code on unaligned data rather than the
>   vectorized version, or the less-vectorized version (SSE vs AVX) on
>   insufficiently aligned data.
>
> Since all this boils down to a matter of performance and is related to
> the core task of FFmpeg, I think the choice between the various options
> should be done on a case-by-case basis using real benchmarks.
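
For the last option, the selection itself could be as small as the sketch
below. It only picks a code path from the alignment of the data pointer and
the linesize, using the 16 (SSE) and 32 (AVX) figures mentioned above. The
enum and both function names are made up for illustration, CPU feature
detection is left out, and whether this beats a copy would still have to be
benchmarked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code paths, named after the alignment each one needs. */
enum kernel { KERNEL_C = 1, KERNEL_SSE = 16, KERNEL_AVX = 32 };

/* Largest power of two, capped at max, that divides value.
 * max must itself be a power of two. */
static size_t pow2_divisor(uintptr_t value, size_t max)
{
    size_t align = 1;

    assert(max != 0 && (max & (max - 1)) == 0);
    while (align < max && (value & align) == 0)
        align <<= 1;
    return align;
}

/* Every row starts at data + y * linesize, so both the base pointer and
 * the linesize have to satisfy the alignment a kernel requires. */
static enum kernel pick_kernel(const uint8_t *data, size_t linesize)
{
    size_t a = pow2_divisor((uintptr_t)data, 32);
    size_t b = pow2_divisor((uintptr_t)linesize, 32);
    size_t align = a < b ? a : b;

    if (align >= 32)
        return KERNEL_AVX;
    if (align >= 16)
        return KERNEL_SSE;
    return KERNEL_C;    /* plain C fallback, no alignment requirement */
}
```
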

Are you working on these? Because currently I'm not.

Thanks