Re: [FFmpeg-devel] [RFC] Type descriptors

Jim DeLaHunt Thu, 31 Dec 2020 12:16:04 -0800

On Thu Dec 31 15:35:38 EET 2020, Nicolas George <geo...@nsup.org> wrote:

> …For each simple type, including enumerations like AVColorRange and flat
> structures like AVReplayGain, have a set of standardized functions for
> common operations, including probably:
>
> - printing;
> - serializing to string;
> - parsing from string;
> …
> These functions will have a standardized name and prototype. They will be
> grouped in structures that describe a type entirely.
>
> Note: this project requires a good unified string API.

This relates to one of FFmpeg's imperfections: it writes human-readable text to stdout and stderr in an unpredictable and inconsistent encoding. It should be 100% consistently encoded. I suggest it should be Unicode in UTF-8 code form.

One of the places where FFmpeg's inconsistent encoding caused me a problem was when I was operating on a Quicktime video. FFmpeg (or perhaps FFprobe) printed a 4-byte Quicktime tag literally to stdout. The tag's byte sequence was not valid UTF-8. It messed up the output. That tag, being arbitrary binary data, should have been escaped or printed in hex or otherwise represented in valid UTF-8.
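
To make the escaping idea concrete, here is a minimal sketch in C. The function name tag_to_text and the \xHH escape convention are my own assumptions for illustration, not an existing FFmpeg API. Printable ASCII bytes are copied through, and every other byte is written as a hex escape, so the result is always plain ASCII and therefore valid UTF-8.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not an FFmpeg API: write a 4-byte tag into dst as
 * ASCII text. Printable ASCII bytes other than backslash are copied as-is;
 * every other byte becomes \xHH. Returns the length of the resulting
 * string, or -1 if dst is too small. */
static int tag_to_text(const uint8_t tag[4], char *dst, size_t dst_size)
{
    size_t pos = 0;

    for (int i = 0; i < 4; i++) {
        uint8_t b = tag[i];
        int printable = b >= 0x20 && b <= 0x7E && b != '\\';
        size_t need = printable ? 1 : 4;

        /* Keep one byte in reserve for the terminating NUL. */
        if (pos + need + 1 > dst_size)
            return -1;
        if (printable)
            dst[pos++] = (char)b;
        else
            pos += (size_t)snprintf(dst + pos, dst_size - pos,
                                    "\\x%02X", (unsigned)b);
    }
    dst[pos] = '\0';
    return (int)pos;
}
```

With this convention, a tag whose first byte is 0xA9 followed by "nam" is rendered as the seven ASCII characters \xA9nam instead of an invalid UTF-8 sequence. The backslash itself is escaped too, so the output stays unambiguous.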

I suggest that the type descriptor[1] and Unified string / stream API[2] proposals offer a good opportunity to define two separate data types: string of text, and stream of bytes. Define encode functions to transform text into bytes, and decode functions to transform bytes into text. The Python language str, bytes, and codecs architecture[3] is a pretty good model.
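
As a rough illustration of the text/bytes split, here is a sketch in C. The type names Bytes and Text and the function decode_latin1 are hypothetical, chosen only for this example; they are not part of either proposal. ISO-8859-1 stands in as the source encoding because its decoder is short: each byte maps directly to the Unicode code point of the same value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types, not FFmpeg API. */
typedef struct Bytes { const uint8_t *data; size_t len; } Bytes; /* arbitrary binary data */
typedef struct Text  { char *utf8; size_t len; } Text;          /* UTF-8, NUL-terminated */

/* Decode ISO-8859-1 bytes into UTF-8 text stored in buf.
 * Returns 0 on success, -1 if buf is too small. */
static int decode_latin1(Bytes in, char *buf, size_t buf_size, Text *out)
{
    size_t pos = 0;

    for (size_t i = 0; i < in.len; i++) {
        uint8_t b = in.data[i];
        size_t need = b < 0x80 ? 1 : 2;

        /* Keep one byte in reserve for the terminating NUL. */
        if (pos + need + 1 > buf_size)
            return -1;
        if (b < 0x80) {
            buf[pos++] = (char)b;
        } else {
            /* U+0080..U+00FF encode as two UTF-8 bytes. */
            buf[pos++] = (char)(0xC0 | (b >> 6));
            buf[pos++] = (char)(0x80 | (b & 0x3F));
        }
    }
    if (pos + 1 > buf_size)
        return -1;
    buf[pos] = '\0';
    out->utf8 = buf;
    out->len = pos;
    return 0;
}
```

The point of the two struct types is that the compiler rejects passing a Bytes where a Text is expected, so the only way to turn binary data into text is through an explicit decode step.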

I suggest that FFmpeg define that strings of text always be stored as UTF-8 code units. An argument could be made for defining strings of text as being in any encoding, as long as every single string instance be clearly labelled with its text encoding. (Specifying that all text is in UTF-8 achieves clear labelling with no code.) I suggest requiring that only validly-encoded data shall be permitted in text strings.
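
The "only validly-encoded data" requirement could be enforced with a check along these lines. This is a sketch under my own assumptions: is_valid_utf8 is a hypothetical name, not an existing FFmpeg function. It rejects stray continuation bytes, truncated sequences, overlong forms, surrogates, and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if buf[0..len) is well-formed UTF-8,
 * otherwise 0. */
static int is_valid_utf8(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint8_t b = buf[i];
        size_t n;         /* number of continuation bytes expected */
        uint32_t cp, min; /* decoded code point, smallest legal value */

        if (b < 0x80) {
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            n = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            n = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            n = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0; /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= n)
            return 0; /* sequence is truncated */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += n + 1;
    }
    return 1;
}
```

A text-string constructor could call such a check once at the boundary where bytes become text, so that everything downstream can assume valid UTF-8 without re-checking.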

FFmpeg code often operates on byte-granularity binary data. These should be defined as data types which are different than "string", because they are not text.

FFmpeg generates human-readable output to stdout, to stderr, and to logs. I suggest that all this output be required to be text strings, preferably always in UTF-8. Any arbitrary binary data written to human-readable output must be encoded or escaped somehow, so that it is represented as valid text.


[1] https://ffmpeg.org/pipermail/ffmpeg-devel/2020-December/274170.html
[2] https://ffmpeg.org/pipermail/ffmpeg-devel/2020-December/274169.html
[3] https://docs.python.org/3/howto/unicode.html

This is an ambitious project. Good luck with it!
       --Jim DeLaHunt, Vancouver, Canada

