Re: [FFmpeg-devel] RFC: new packed pixel formats (machine vision)

martin schitter Tue, 22 Oct 2024 02:41:40 -0700



On 22.10.24 08:50, Diederick C. Niehorster wrote:

I want to pick up a discussion i started last week
(https://ffmpeg.org/pipermail/ffmpeg-devel/2024-October/334585.html)
in a new thread, with the relevant information nicely organized. This
is about adding pixel formats common in machine vision to ffmpeg
(though i understand some formats may also be used by cinema cameras),
and supporting them as input formats in swscale so that it becomes
easy to use ffmpeg for machine vision purposes (I already have such
software, it will be open-sourced in good time, but right now there is
a proprietary conversion layer from Basler i need to replace (e.g. by
this proposal)).

Most of your points do not look so much machine learning or computer vision specific, but more like typical/traditional video tech peculiarities. More ML-related obstacles come into play if you have to support optimized calculations with uncommon small bit sizes, etc. But most of the issues you describe should be easily solvable with already available features of ffmpeg, if I'm not wrong.

Example formats are 10 and 12 bit Bayer formats, where the 10 bit
cannot be represented in AVPixFmtDescriptors as currently as effective
bit depth for the red and blue channels is 2.5 bits, but component
depths should be integers.

As bits will always be distinct entities, you don't need more than simple natural numbers to describe their placement and amount precisely.

ffmpeg already supports the AV_PIX_FMT_FLAG_BITSTREAM flag to switch some description fields from byte to bit values. That's enough to describe the layout of most pixel formats -- even those packed ones which are not aligned to byte or 32-bit borders. You just have to use bit size values for the step and offset struct members.
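To make the bit-valued step/offset idea a bit more concrete, here is a minimal sketch. BitComp and get_sample() are hypothetical names, not ffmpeg API; the struct only mirrors the three fields that matter here, and the MSB-first bit order within the stream is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a component description in which
 * step and offset are counted in bits instead of bytes. */
typedef struct BitComp {
    unsigned step;   /* bits between two consecutive pixels      */
    unsigned offset; /* bits before the first value of this comp */
    unsigned depth;  /* bits per value                           */
} BitComp;

/* Fetch the value of component c for pixel x from one scan line.
 * Assumption: bits are stored MSB first within the byte stream. */
static unsigned get_sample(const uint8_t *line, const BitComp *c, size_t x)
{
    size_t pos = c->offset + x * c->step;
    unsigned v = 0;

    assert(c->depth >= 1 && c->depth <= 16);
    for (unsigned i = 0; i < c->depth; i++, pos++)
        v = (v << 1) | ((line[pos >> 3] >> (7 - (pos & 7))) & 1u);
    return v;
}
```

With step = 10, offset = 0 and depth = 10 this addresses 10-bit gray values packed four to five bytes, without any byte or 32-bit alignment being involved.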

But there is another common case which indeed cannot be described with ffmpeg's current struct: color components can be composed of separate MSb and LSb parts at different places in the component sequence -- similar to the BayerRG12g40 and BayerRG12g24 formats in your linked examples. These examples are admittedly a little more complex, because they may describe arrangements which differ between even and odd lines. The bit packing for 10 and 12 bit data in DNxUncompressed entails a similar issue, by packing all LSb information as one block at the end of every scan line.

For the simple case of just separate MSb and LSb locations within an otherwise simply repeating group of pixel bits, it could be solved by extending the description in a similar way to the RGBALayout description sequence of MXF -- see G.2.40/p174 of https://pub.smpte.org/latest/st377-1/st377-1-2019.pdf
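A rough sketch of what such an extended description could look like, assuming a list of (code, depth) pairs in stream order. Everything here is hypothetical: LayoutEntry, example_layout and decode_pixel() are made-up names, the convention "uppercase = MSb part, lowercase = LSb part, F = fill" is only loosely modelled on the RGBALayout idea and should be checked against the standard, and the code assumes MSB-first bit order with the MSb part of a component stored before its LSb part.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

typedef struct LayoutEntry {
    char    code;  /* 'R','G','B' = MSb part, 'r','g','b' = LSb part, 'F' = fill */
    uint8_t depth; /* number of bits of this part */
} LayoutEntry;

/* Hypothetical 32-bit pixel: three 8-bit MSb parts, three 2-bit LSb parts, 2 fill bits. */
static const LayoutEntry example_layout[] = {
    { 'R', 8 }, { 'G', 8 }, { 'B', 8 },
    { 'r', 2 }, { 'g', 2 }, { 'b', 2 }, { 'F', 2 },
};

/* Read nbits bits, MSB first, starting at bit position pos. */
static unsigned read_bits(const uint8_t *buf, size_t pos, unsigned nbits)
{
    unsigned v = 0;
    for (unsigned i = 0; i < nbits; i++, pos++)
        v = (v << 1) | ((buf[pos >> 3] >> (7 - (pos & 7))) & 1u);
    return v;
}

/* Reassemble one pixel; returns the number of bits consumed. */
static size_t decode_pixel(const uint8_t *buf, const LayoutEntry *layout,
                           size_t n, uint16_t rgb[3])
{
    size_t pos = 0;

    rgb[0] = rgb[1] = rgb[2] = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned depth = layout[i].depth;
        unsigned bits;
        int idx;

        assert(depth <= 16);
        bits = read_bits(buf, pos, depth);
        switch (toupper((unsigned char)layout[i].code)) {
        case 'R': idx = 0;  break;
        case 'G': idx = 1;  break;
        case 'B': idx = 2;  break;
        default:  idx = -1; break; /* fill bits are skipped */
        }
        if (idx >= 0) /* append this part below the bits collected so far */
            rgb[idx] = (uint16_t)((rgb[idx] << depth) | bits);
        pos += depth;
    }
    return pos;
}
```

The point is only that a flat list of small integers is enough to describe split components; layouts that change between lines, or that collect all LSb data at the end of a scan line, would not fit into this scheme.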

More complex arrangements should IMHO simply be converted to more common formats by application-specific handling, instead of getting an overly complex ffmpeg pixel description.

Other example formats are 10bit gray
formats where multiple values are packed without padding over multiple
bytes (e.g. 4 10-bit pixels packed into 5 bytes, so not aligned to 16
or 32 bits).


That's no problem, as already explained.

The unpacking of this kind of data into sparser 16-bit aligned structures can be handled very efficiently by using the PDEP intrinsics of modern CPUs, as long as the order of components fits. Component order swapping is unfortunately a slightly less efficient operation in the case of packed image data, while it can be solved much more easily for planar data arrangements by pointer swaps.
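A minimal sketch of the PDEP approach for four 10-bit values in five bytes. The names pdep64() and unpack4x10() are made up for this example. It assumes LSB-first packing (pixel 0 sits in the lowest 10 bits of the little-endian 40-bit group), which is the "order fits" case; _pdep_u64() is the BMI2 intrinsic on x86-64, and a plain C loop stands in for it elsewhere.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#endif

/* Deposit the low bits of src at the positions of the set bits in mask. */
static uint64_t pdep64(uint64_t src, uint64_t mask)
{
#if defined(__BMI2__) && defined(__x86_64__)
    return _pdep_u64(src, mask);
#else
    /* Portable fallback with the same result, one mask bit per iteration. */
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & bit)
            out |= lowest;
        mask &= mask - 1;                     /* clear that bit */
    }
    return out;
#endif
}

/* Unpack 4 x 10 bit from 5 bytes into 4 x 16 bit (values in the low 10 bits). */
static void unpack4x10(const uint8_t in[5], uint16_t out[4])
{
    uint64_t packed = 0;

    assert(in && out);
    for (int i = 0; i < 5; i++)               /* little-endian 40-bit load */
        packed |= (uint64_t)in[i] << (8 * i);

    /* One PDEP spreads each 10-bit value into its own 16-bit lane. */
    uint64_t spread = pdep64(packed, 0x03FF03FF03FF03FFULL);

    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(spread >> (16 * i));
}
```

If the source packs its values MSB first, or in a different component order, an additional shuffle or byte swap is needed before the deposit, which is where the extra cost for packed data comes from.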

Here a proposal for how these new formats could be encoded into
AVPixFmtDescriptor, so that these can then be used in ffmpeg/swscale.

I think swscale and the internal processing of ffmpeg should not support an endless number of arbitrary pixel formats, but should be focused on a really useful minimal set of required base formats.

I would look at Vulkan's pixel format list as a modern example of a more systematic list of elementary pixel data storage variants.

(https://docs.vulkan.org/spec/latest/chapters/formats.html)

- AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED which indicates formats that are
bit-wise packed in a way that is not aligned on 1, 2 or 4 bytes (e.g.
4 10-bit values in 5 bytes). This flag is needed because
AV_PIX_FMT_FLAG_BITSTREAM
formats are aligned to 8 or 32 bits, ...


Is this really the case?

But in general you should rather describe byte/32-bit aligned bit-packed formats by using explicit "fill" (X, etc.) pseudo components. Then you can simply tell aligned and unaligned groups apart by the actual sum of defined bits, or rather by the remainder of dividing that sum by the alignment size in bits.
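As a small sketch of that check, assuming a pixel group is described as a plain list of component depths with fill bits listed explicitly: CompBits, group_bits() and align_remainder() are hypothetical names, and the two example groups are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct CompBits {
    char     name;  /* 'R', 'G', 'B', 'Y', ... or 'X' for fill bits */
    unsigned depth; /* number of bits */
} CompBits;

/* 3 x 10 bit plus 2 explicit fill bits: one group per 32-bit word. */
static const CompBits rgb10_x2[] = {
    { 'R', 10 }, { 'G', 10 }, { 'B', 10 }, { 'X', 2 },
};

/* 4 x 10 bit gray without any fill: one group per 5 bytes. */
static const CompBits gray10_x4[] = {
    { 'Y', 10 }, { 'Y', 10 }, { 'Y', 10 }, { 'Y', 10 },
};

/* Total number of bits in one repeating group, fill bits included. */
static unsigned group_bits(const CompBits *c, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += c[i].depth;
    return sum;
}

/* 0 means the group ends exactly on an align_bits boundary. */
static unsigned align_remainder(const CompBits *c, size_t n, unsigned align_bits)
{
    assert(align_bits > 0);
    return group_bits(c, n) % align_bits;
}
```

Here rgb10_x2 sums to 32 bits and leaves no remainder for a 32-bit alignment, while gray10_x4 sums to 40 bits, which leaves a remainder of 8 for 32-bit alignment but none for byte alignment. No separate "unaligned" flag is needed to tell the two apart.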


I hope that's at least inspiring food for thought... ;)

Martin
