On duodi 22 Frimaire, year CCXXIII, Rodger Combs wrote:
> This also moves general charenc conversion from avcodec to avformat;
> the version in avcodec is left, but renamed; I'm not sure if that's
> the optimal solution.
>
> The documentation could probably use some improvements, and a few more
> options could be added to ENCA.
>
> This very simply prefers libguess over ENCA, and ENCA over uchardet, but
> will fall back on a less-preferred guess if something decodes wrong, and will
> drop illegal sequences in iconv if all else fails.
>
> It'd be possible to have ffmpeg.c present a UI if multiple guesses are
> returned, and other library consumers could do the same.

So, now that I have a decent connection and time, here are some comments:

First, your patch seems to happen after the text demuxers have parsed
the text files. Therefore, this can not work for non-ASCII-compatible
encodings, such as UTF-16. You might say that UTF-16 already works, but
its implementation is bogus and leads to user-visible problems (see trac
ticket #4059). But even if it was not, we would not want two competing
detection layers.

More importantly: the lavc API is ready to handle situations where the
recoding has been done by the demuxer. See the doxy for sub_charenc_mode
and the associated constants. So if you are discarding it or adding
competing fields, you are certainly missing something on the proper use
of the API. Of course, if you think the API is not actually ready, feel
free to explain and discuss your point.

Third point: detection is not something that works well, and people will
frequently find versions of FFmpeg built without their favourite
library. For both these reasons, applications using the library should
be able to provide their own detection mechanism to complement or even
replace the ones hardcoded in FFmpeg. Same goes for conversion, even if
it is not as important.

Fourth and last point: detecting text encoding is not useful only for
text subtitles formats, other features may need it: filter graph files
(think of drawtext options), ffmetadata files, etc.

Here is the API I am considering. I had started to implement it until
bickering and lack of enthusiasm discouraged me.

The work happens in lavu, and is therefore available everywhere,
replacing av_file_map() whenever it is used for text files.

It is an API for reading text files / buffers / streams, taking care of
all the gory details. Text encoding, of course, but also the LF / CRLF
mess, possibly splitting lines at the same time, maybe normalizing
spaces, etc.

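As a rough illustration of the "LF / CRLF mess" part only, here is a
minimal sketch in C of a line splitter that accepts LF, CRLF and lone CR
terminators. The names TextCursor, text_next_line and text_count_lines
are hypothetical and are not part of lavu or of any existing patch. The
sketch also assumes the buffer has already been recoded to an
ASCII-compatible encoding such as UTF-8, which is exactly why recoding
has to happen before any line-based parsing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over an in-memory text buffer (not an FFmpeg type). */
typedef struct TextCursor {
    const char *pos;  /* next unread byte */
    const char *end;  /* one past the last byte */
} TextCursor;

/*
 * Return the next line without its terminator. LF, CRLF and a lone CR
 * all end a line. Returns 1 if a line was produced, 0 at end of input.
 */
static int text_next_line(TextCursor *c, const char **line, size_t *len)
{
    const char *p;

    assert(c && line && len);
    if (c->pos >= c->end)
        return 0;

    /* Scan up to the first terminator byte or the end of the buffer. */
    p = c->pos;
    while (p < c->end && *p != '\n' && *p != '\r')
        p++;

    *line = c->pos;
    *len  = (size_t)(p - c->pos);

    if (p < c->end) {
        /* Treat CRLF as a single terminator. */
        if (*p == '\r' && p + 1 < c->end && p[1] == '\n')
            p++;
        p++;
    }
    c->pos = p;
    return 1;
}

/* Count the lines in a buffer, whatever the line-ending convention. */
static size_t text_count_lines(const char *buf, size_t size)
{
    TextCursor c = { buf, buf + size };
    const char *line;
    size_t len, n = 0;

    while (text_next_line(&c, &line, &len))
        n++;
    return n;
}
```

The caller gets pointers into the original buffer, so no copy is made; a
real implementation would probably also want to handle streams and
incremental input, which this sketch does not attempt.
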
The text-file-read API is controlled with a context parameter, holding
amongst other things a list of "detection modules", and also "recoding
modules". Detection modules are just a structure with a callback. FFmpeg
provides built-in modules, such as your proposed libguess, libenca and
libuchardet code, but applications can also create their own modules.

Then it is just a matter of changing the subtitle-specific FFTextReader
API to use the new lavu text-file-read API.

I hope this helps.

Regards,

-- 
  Nicolas George
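
For illustration, the "structure with a callback" idea for detection
modules could look roughly like the following C sketch. Everything here
is an assumption made for the example: the type names TextDetectModule
and TextReadContext, the function text_detect and the two toy modules do
not exist in FFmpeg, and the actual design may differ. The modules are
tried in order and the first one that returns an answer wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical detection module: a name plus a callback. */
typedef struct TextDetectModule {
    const char *name;
    /* Return an encoding name, or NULL if this module cannot tell. */
    const char *(*detect)(const unsigned char *buf, size_t size);
} TextDetectModule;

/* Toy module: recognise a byte order mark (UTF-32 ignored for brevity). */
static const char *detect_bom(const unsigned char *buf, size_t size)
{
    if (size >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    return NULL;
}

/* Toy module: accept the buffer only if every byte is plain ASCII. */
static const char *detect_ascii(const unsigned char *buf, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++)
        if (buf[i] == 0 || buf[i] >= 0x80)
            return NULL;
    return "US-ASCII";
}

/* Hypothetical reader context holding an ordered list of modules. */
typedef struct TextReadContext {
    const TextDetectModule *const *modules;
    size_t nb_modules;
} TextReadContext;

/* Try each module in order; the first non-NULL answer is returned. */
static const char *text_detect(const TextReadContext *ctx,
                               const unsigned char *buf, size_t size)
{
    size_t i;

    assert(ctx && buf);
    for (i = 0; i < ctx->nb_modules; i++) {
        const char *enc = ctx->modules[i]->detect(buf, size);
        if (enc)
            return enc;
    }
    return NULL;
}

static const TextDetectModule bom_module   = { "bom",   detect_bom };
static const TextDetectModule ascii_module = { "ascii", detect_ascii };

static const TextDetectModule *const default_modules[] = {
    &bom_module, &ascii_module,
};

static const TextReadContext default_ctx = { default_modules, 2 };
```

An application that wants its own detection would build its own module
array, possibly mixing its callbacks with the built-in ones, and pass a
context pointing at that array instead of default_ctx.
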
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel