Hi all,

I want to pick up a discussion I started last week (https://ffmpeg.org/pipermail/ffmpeg-devel/2024-October/334585.html) in a new thread, with the relevant information organized in one place.

This is about adding pixel formats common in machine vision to ffmpeg (though I understand some of these formats may also be used by cinema cameras), and supporting them as input formats in swscale, so that it becomes easy to use ffmpeg for machine vision purposes. I already have such software and it will be open-sourced in good time, but right now it contains a proprietary conversion layer from Basler that I need to replace (e.g. with what is proposed here).
Example formats are the 10- and 12-bit Bayer formats. The 10-bit variants cannot currently be represented in an AVPixFmtDescriptor, because the effective bit depth of the red and blue channels is 2.5 bits, whereas component depths must be integers. Other examples are 10-bit gray formats where multiple values are packed without padding across multiple bytes (e.g. four 10-bit pixels packed into 5 bytes, so not aligned to 16 or 32 bits). See https://www.1stvision.com/cameras/IDS/IDS-manuals/en/basics-monochrome-pixel-formats.html for a diagram of Mono10p, and https://www.1stvision.com/cameras/IDS/IDS-manuals/en/basics-raw-bayer-pixel-formats.html for diagrams of the packed and unpacked Bayer formats.

Here is a proposal for how these new formats could be encoded in AVPixFmtDescriptor, so that they can then be used in ffmpeg/swscale. I have taken care that none of the existing pixel formats, or any code dealing with them, would be affected. New code would be needed to handle the new formats, though: av_read_image_line2, av_write_image_line2 and the functions printing info about AVPixFmtDescriptors, plus swscale of course. I commit to doing a full audit to ensure nothing else is missed.

First, two new flags are needed (usages are shown below in the example new pixel formats). I propose:

- AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL, which indicates that each value in the component depths (ints) represents a 16-bit numerator and a 16-bit denominator packed into the int. That should be able to store any depth that could realistically occur and, importantly, allows for the fractional bit depths needed for the Bayer formats.
- AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED, which indicates formats that are bit-wise packed in a way that is not aligned on 1, 2 or 4 bytes (e.g. four 10-bit values in 5 bytes). This flag is needed because AV_PIX_FMT_FLAG_BITSTREAM formats are aligned to 8 or 32 bits, and this kind of unaligned packing needs special handling (see below).
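To make the first flag concrete, here is a minimal sketch of how a rational depth could be packed and read back, and how the depths of all components could be summed into a whole number of bits. All helper names (depth_rational_pack, depth_rational_num, depth_rational_den, depth_rational_sum_bits) are hypothetical and do not exist in libavutil. Placing the numerator in the high 16 bits and the denominator in the low 16 bits is the layout I assume throughout this proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, NOT part of libavutil. They only illustrate the
 * layout proposed for AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL:
 * numerator in the high 16 bits, denominator in the low 16 bits. */
static int depth_rational_pack(int num, int den)
{
    assert(num > 0 && num <= 0x7FFF); /* keep the sign bit of the int clear */
    assert(den > 0 && den <= 0xFFFF);
    return (num << 16) | den;
}

static int depth_rational_num(int packed) { return (packed >> 16) & 0xFFFF; }
static int depth_rational_den(int packed) { return packed & 0xFFFF; }

static int64_t gcd_i64(int64_t a, int64_t b)
{
    while (b) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Add up n packed depths as exact fractions. Returns the total in whole
 * bits, or -1 if a denominator is zero or the total is not an integer. */
static int depth_rational_sum_bits(const int *packed, int n)
{
    int64_t num = 0, den = 1;

    for (int i = 0; i < n; i++) {
        int64_t pn = depth_rational_num(packed[i]);
        int64_t pd = depth_rational_den(packed[i]);
        int64_t g;

        if (pd == 0)
            return -1;
        num = num * pd + pn * den; /* num/den + pn/pd over a common denominator */
        den *= pd;
        g    = gcd_i64(num, den);  /* reduce each step to keep values small */
        num /= g;
        den /= g;
    }
    return den == 1 ? (int)num : -1;
}
```

With this layout, depth_rational_pack(10, 4) yields 655364 and depth_rational_pack(10, 2) yields 655362, which are the values used in the example descriptors. Summing the three depths 10/4, 10/2 and 10/4 gives 10 bits.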
Using these flags, here are some example new pixel formats:

    [AV_PIX_FMT_BAYER_RGGB10] = {
        .name = "bayer_rggb10",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4, i.e. (10<<16) + 4 */
            { 0, 2, 0, 0, 655362 },  /* 5:   10/2, i.e. (10<<16) + 2 */
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 */
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
                 AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL,
    },
    [AV_PIX_FMT_BAYER_RGGB12] = {
        .name = "bayer_rggb12",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 3 },
            { 0, 2, 0, 0, 6 },
            { 0, 2, 0, 0, 3 },
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER,
    },
    [AV_PIX_FMT_BAYER_GRAY10P] = {
        .name = "gray10p",
        .nb_components = 1,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 10 },      /* Y */
        },
        .flags = AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED,
    },
    [AV_PIX_FMT_BAYER_RGGB10P] = {
        .name = "bayer_rggb10p",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4, i.e. (10<<16) + 4 */
            { 0, 2, 0, 0, 655362 },  /* 5:   10/2, i.e. (10<<16) + 2 */
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 */
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
                 AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL |
                 AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED,
    },
    [AV_PIX_FMT_BAYER_RGGB12P] = {
        .name = "bayer_rggb12p",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 3 },
            { 0, 2, 0, 0, 6 },
            { 0, 2, 0, 0, 3 },
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
                 AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED,
    },

When AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED is encountered, one needs to find out how many bytes are used to store how many samples (by "sample" I mean one color channel value or one gray scale value). This information can be derived from the AVPixFmtDescriptor as follows:

gray10p: sum(component_bit_depths)=10: the least common multiple of 10 and 8 is 40, so there are 40/10=4 samples packed into 40/8=5 bytes.
bayer_rggb10p: sum(component_bit_depths)=10: the least common multiple of 10 and 8 is 40, so there are 40/10=4 samples packed into 40/8=5 bytes.
bayer_rggb12p: sum(component_bit_depths)=12: the least common multiple of 12 and 8 is 24, so there are 24/12=2 samples packed into 24/8=3 bytes.

The presence of the AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED flag indicates that such a computation is needed, and leaves it flexible how many samples are packed into how many bytes.

I have not yet thought about whether this would also allow turning v210 (v210enc/dec, AV_CODEC_ID_V210) into a pixel format and deprecating the encoder/decoder (presumably it would be a good thing to remove this special handling), or whether the scheme then runs into a limitation. bitpacked_enc (AV_CODEC_ID_BITPACKED) should also be examined. I leave examining this for a later stage, after comments on the above proposal.

Looking forward to hearing what you/the list think!

All the best,
Dee
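P.S. To make the group-size computation above concrete, here is a minimal sketch of it in C. The PackedGroup struct and the packed_group_size helper are hypothetical names used only for illustration; nothing like them exists in libavutil today. The input is assumed to be the summed component bit depth of the format, already reduced to a whole number of bits.

```c
#include <assert.h>

/* Hypothetical type and helpers, NOT part of libavutil. */
typedef struct PackedGroup {
    int samples; /* number of samples in one packed group */
    int bytes;   /* number of bytes occupied by that group */
} PackedGroup;

static int gcd_int(int a, int b)
{
    while (b) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Smallest group of samples that ends on a byte boundary:
 * lcm(bits_per_sample, 8) bits, expressed in samples and in bytes. */
static PackedGroup packed_group_size(int bits_per_sample)
{
    PackedGroup g;
    int lcm_bits;

    assert(bits_per_sample > 0);
    lcm_bits  = bits_per_sample / gcd_int(bits_per_sample, 8) * 8;
    g.samples = lcm_bits / bits_per_sample;
    g.bytes   = lcm_bits / 8;
    return g;
}
```

For a 10-bit sum this gives 4 samples in 5 bytes, and for a 12-bit sum 2 samples in 3 bytes, matching the three examples above.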