On 22.10.24 08:50, Diederick C. Niehorster wrote:
I want to pick up a discussion i started last week
(https://ffmpeg.org/pipermail/ffmpeg-devel/2024-October/334585.html)
in a new thread, with the relevant information nicely organized. This
is about adding pixel formats common in machine vision to ffmpeg
(though i understand some formats may also be used by cinema cameras),
and supporting them as input formats in swscale so that it becomes
easy to use ffmpeg for machine vision purposes (I already have such
software, it will be open-sourced in good time, but right now there is
a proprietary conversion layer from Basler i need to replace (e.g. by
this proposal)).

Most of your points do not look specific to machine learning or computer vision, but rather like typical/traditional video technology peculiarities. More ML-related obstacles come into play if you have to support optimized calculations with uncommon small bit sizes, etc. But if I'm not mistaken, most of the issues you describe should be easily solvable with features ffmpeg already has.

Example formats are 10 and 12 bit Bayer formats, where the 10 bit one
cannot currently be represented in AVPixFmtDescriptors, as the effective
bit depth for the red and blue channels is 2.5 bits, but component
depths should be integers.

As bits will always be distinct entities, you don't need more than simple natural numbers to describe their placement and amount precisely.
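To illustrate the point with a minimal C sketch (assuming an RGGB 2x2 tile of 10-bit samples; the names are hypothetical and not ffmpeg API): every count involved is a natural number, and the "2.5 bits" only appears when the bits of a channel are averaged over the four pixels of the tile.

```c
#include <assert.h>

/* One 2x2 RGGB Bayer tile (assumed pattern): 1 red, 2 green, 1 blue sample. */
enum { TILE_PIXELS = 4, R_SAMPLES = 1, G_SAMPLES = 2, B_SAMPLES = 1 };

/* Bits a channel occupies in one tile: always a natural number.
 * Dividing by TILE_PIXELS yields the fractional "effective" depth,
 * e.g. 10 * 1 / 4 = 2.5 for red and blue. */
static unsigned tile_bits(unsigned samples_in_tile, unsigned sample_depth)
{
    assert(samples_in_tile <= TILE_PIXELS);
    return samples_in_tile * sample_depth;
}
```

So a description based on per-sample depth (10) plus the sample arrangement stays integral; only the per-pixel average is fractional.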

ffmpeg already supports the AV_PIX_FMT_FLAG_BITSTREAM flag, which switches some description fields from byte to bit values. That's enough to describe the layout of most pixel formats, even packed ones that are not aligned to byte or 32-bit boundaries. You just have to use bit-size values for the step and offset struct members.
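A rough sketch of what "step and offset in bits" means for a format like 4 x 10-bit gray in 5 bytes. The struct below is hypothetical; it only mirrors the step/offset/depth fields of AVComponentDescriptor and is not an ffmpeg type. The sketch also assumes LSB-first bit order within each byte, whereas ffmpeg's existing bitstream formats such as monow order pixels MSB-first, so a real descriptor would need to fix the bit order as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout record; all values are in bits. */
struct bit_layout {
    unsigned step;   /* bits between consecutive pixels */
    unsigned offset; /* bits before the first pixel's component */
    unsigned depth;  /* bits per component value (<= 16 here) */
};

/* Read `depth` bits starting at absolute bit position `pos`,
 * assuming LSB-first packing within each byte. */
static unsigned read_bits_lsb(const uint8_t *buf, size_t pos, unsigned depth)
{
    unsigned v = 0;
    assert(depth <= 16);
    for (unsigned i = 0; i < depth; i++, pos++)
        v |= ((buf[pos >> 3] >> (pos & 7)) & 1u) << i;
    return v;
}

/* Fetch the component of pixel `x` within one line. */
static unsigned get_component(const uint8_t *line, const struct bit_layout *l, size_t x)
{
    return read_bits_lsb(line, l->offset + x * (size_t)l->step, l->depth);
}
```

With `{ .step = 10, .offset = 0, .depth = 10 }`, the bytes `FF 03 50 95 AA` decode to the samples 0x3FF, 0x000, 0x155 and 0x2AA, with no byte or 32-bit alignment needed.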

But there is another common case which indeed cannot be described with ffmpeg's current struct: color components can be composed of separate MSb and LSb parts at different places in the component sequence, similar to the formats BayerRG12g40 and BayerRG12g24 in your linked examples. These examples are admittedly a bit more complex, because they may describe arrangements that differ between even and odd lines. The bit packing for 10 and 12 bit data in DNxUncompressed involves a similar issue, as it packs all LSb information as one block at the end of every scan line.
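To make the "LSb block at the end of the line" idea concrete, here is a minimal C sketch of a hypothetical 10-bit line layout. It is only similar in spirit and is NOT the exact DNxUncompressed specification: the first `width` bytes hold the 8 MSbs of each sample, followed by `width / 4` bytes that each carry the 2 LSbs of four consecutive samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: bytes [0, width) = 8 MSbs per sample,
 * then width/4 bytes of 2-bit LSbs, sample k of a group in bits 2k..2k+1. */
static void unpack_line_split10(const uint8_t *line, size_t width, uint16_t *out)
{
    const uint8_t *msb = line;
    const uint8_t *lsb = line + width;  /* LSb block follows the MSb block */
    assert(width % 4 == 0);
    for (size_t i = 0; i < width; i++) {
        unsigned lo = (lsb[i / 4] >> (2 * (i % 4))) & 3u;
        out[i] = (uint16_t)((msb[i] << 2) | lo);
    }
}
```

The point is that a single sample is spread over two non-adjacent places in the line, which a per-component step/offset/depth triple cannot express.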

For the simple case of MSb and LSb parts merely being located separately within an otherwise simply repeating group of pixel bits, it could be solved by extending the description similarly to the RGBALayout description sequence of MXF; see G.2.40/p174 of https://pub.smpte.org/latest/st377-1/st377-1-2019.pdf
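A possible shape for such an extension, sketched in C under loud assumptions: the `segment` struct and the decoder are hypothetical, only loosely modelled on the idea of an RGBALayout-like sequence of entries. Each entry names the component it belongs to, how many bits it carries in the stream, and where those bits go inside the component value. The example group is a MIPI CSI-2 RAW12-like arrangement (two 12-bit samples in 3 bytes, both MSb bytes first, then both LSb nibbles), read LSB-first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment entry of a repeating pixel group. */
struct segment {
    uint8_t comp;   /* index of the component/sample within the group */
    uint8_t bits;   /* number of bits stored in the stream */
    uint8_t shift;  /* position of these bits inside the component value */
};

/* Two 12-bit samples in 3 bytes: MSbs first, then LSb nibbles (assumed). */
static const struct segment raw12_like[] = {
    {0, 8, 4}, {1, 8, 4}, {0, 4, 0}, {1, 4, 0},
};

/* Decode one group starting at `bitpos` (LSB-first); returns bits consumed. */
static size_t decode_group(const uint8_t *buf, size_t bitpos,
                           const struct segment *seg, size_t nseg,
                           uint16_t *comps, size_t ncomps)
{
    size_t start = bitpos;
    for (size_t c = 0; c < ncomps; c++)
        comps[c] = 0;
    for (size_t s = 0; s < nseg; s++) {
        unsigned v = 0;
        assert(seg[s].comp < ncomps);
        for (unsigned i = 0; i < seg[s].bits; i++, bitpos++)
            v |= ((buf[bitpos >> 3] >> (bitpos & 7)) & 1u) << i;
        comps[seg[s].comp] |= (uint16_t)(v << seg[s].shift);
    }
    return bitpos - start;
}
```

For example, the bytes `AB 12 3C` decode to the samples 0xABC and 0x123 and consume 24 bits. Arrangements that change between even and odd lines would still need more than such a flat sequence.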

IMHO, more complex arrangements should simply be converted to more common formats by application-specific handling, rather than leading to an overly complex ffmpeg pixel description.

Other example formats are 10bit gray
formats where multiple values are packed without padding over multiple
bytes (e.g. 4 10-bit pixels packed into 5 bytes, so not aligned to 16
or 32 bits).

That's no problem, as already explained.

The unpacking of this kind of data into sparser 16-bit aligned structures can be handled very efficiently by using the PDEP intrinsics of modern CPUs, as long as the order of components fits. Component order swapping is unfortunately a somewhat less efficient operation for packed image data, while for planar data arrangements it can be solved much more easily by swapping pointers.

Here is a proposal for how these new formats could be encoded into
AVPixFmtDescriptor, so that these can then be used in ffmpeg/swscale.


I think swscale and the internal processing of ffmpeg should not support an endless number of arbitrary pixel formats, but should focus on a really useful minimal set of required base formats.

I would look at Vulkan's pixel format list as a modern example of a more systematic list of elementary pixel data storage variants.
(https://docs.vulkan.org/spec/latest/chapters/formats.html)

- AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED which indicates formats that are
bit-wise packed in a way that is not aligned on 1, 2 or 4 bytes (e.g.
4 10-bit values in 5 bytes). This flag is needed because
AV_PIX_FMT_FLAG_BITSTREAM
formats are aligned to 8 or 32 bits, ...

Is this really the case?

But in general, you should rather describe byte/32-bit aligned bit-packed formats by using explicit "fill" (X, etc.) pseudo-components; then you can simply distinguish aligned and unaligned groups by the actual sum of defined bits, i.e. by the remainder of dividing it by the alignment size in bits.
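A small C sketch of that rule, with hypothetical helper names: a group is described as the list of its component widths, fill bits included, and alignment follows from the sum. As a side effect, the same numbers tell how many groups are needed to reach the next boundary, e.g. lcm(10, 8) / 10 = 4 samples of 10-bit gray, i.e. 5 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of component widths in one repeating group, fill ("X") entries included. */
static unsigned group_bits(const unsigned *widths, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += widths[i];
    return sum;
}

/* A group is aligned if its size is a multiple of the alignment (e.g. 8 or 32). */
static int is_aligned(unsigned bits, unsigned align)
{
    assert(align > 0);
    return bits % align == 0;
}

static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b) { unsigned t = a % b; a = b; b = t; }
    return a;
}

/* Smallest number of groups ending on an alignment boundary:
 * lcm(bits, align) / bits == align / gcd(bits, align). */
static unsigned groups_until_aligned(unsigned bits, unsigned align)
{
    assert(bits > 0 && align > 0);
    return align / gcd_u(bits, align);
}
```

With this, a 10-bit gray group `{10}` is unaligned, `{10, X6}` is byte-aligned, and `{10, 10, 10, X2}` (as in ffmpeg's X2RGB10-style layouts) is 32-bit aligned, without needing a dedicated "unaligned" flag.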

I hope, that's at least inspiring food for thought... ;)

Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel