On tridi 3 Vendémiaire, year CCXXIV, James Darnley wrote:
> As far as I understand the iconv API, it doesn't appear to do this for
> you. So adding this feature would require writing code to handle more
> errors returned from the iconv() function. That means a more
> complicated argument handling structure is needed.
>
> I don't mind trying to write this but it would be better to do it behind
> the API you propose.

Of course. Actually, it is already there in the API, although I am not
quite satisfied because it cannot be set as an option.

> I will help you with it as best I can because I
> seem to have involuntarily volunteered myself.

I need some feedback to know whether this kind of API is useful in FFmpeg
(other people are welcome to give advice too!), and whether the actual API
I propose is suitable for various needs. As for writing the code, I expect
it to be quite straightforward.

The question where I most need feedback is this: shall I make an API that
converts from any encoding to any encoding, or an API that converts from
any encoding to UTF-8 and from UTF-8 to any encoding?

There are pros and cons for each case. UTF-8 to/from anything is enough for
the needs of any sane program, and makes the handling of the replacement
character easier (because it can be specified in UTF-8 directly). OTOH,
any-to-any is more generic.

> I don't know what to say here. I know the encodings needed for iconv
> because I arrived at them by brute force. I wrote a short Lua script to
> iterate over a list of encodings supported by my iconv and arrived at
> this answer. The command line tool called iconv is too clever for this
> because it returns an error when it can't convert. As for ending in
> GBK, it is what the script told me.

Could you share the script and enough input to run it and reproduce the
results?

> This feature would not work if there was a misinterpretation in the
> middle. As you say that would need A->B and C->D where B != C. Perhaps
> this is why my solution isn't perfect, because there should be an
> assumption in the middle.
>
> I could rework my code to allow for assumptions in the middle. My case
> would then use "CP1252,UTF-8,UTF-8,GBK" as an argument.

I must say, I do not like your approach very much because it manipulates
text encoding in the middle of the program. All strings inside the program
should be in UTF-8.

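To make the UTF-8 pivot option concrete, here is a minimal sketch in Python. It is not the proposed FFmpeg API and does not use iconv; it relies on Python's built-in codecs, and the names `to_utf8`, `from_utf8` and `any_to_any` are hypothetical. It only shows that two pivot functions are enough to build any-to-any, and that the replacement character can be given as ordinary text.

```python
def to_utf8(data: bytes, src: str) -> bytes:
    """Convert `data` from encoding `src` to UTF-8.

    Bytes that cannot be decoded become U+FFFD (REPLACEMENT CHARACTER).
    """
    return data.decode(src, errors="replace").encode("utf-8")


def from_utf8(data: bytes, dst: str, replacement: str = "?") -> bytes:
    """Convert UTF-8 `data` to encoding `dst`.

    Characters that `dst` cannot represent are replaced by `replacement`,
    which the caller supplies as plain text. Encoding one character at a
    time is an assumption that only holds for stateless encodings (no BOM,
    no shift sequences); it keeps the sketch short.
    """
    out = []
    for ch in data.decode("utf-8"):
        try:
            out.append(ch.encode(dst))
        except UnicodeEncodeError:
            out.append(replacement.encode(dst))
    return b"".join(out)


def any_to_any(data: bytes, src: str, dst: str, replacement: str = "?") -> bytes:
    """Any-to-any conversion expressed as two steps through UTF-8."""
    return from_utf8(to_utf8(data, src), dst, replacement)
```

With this shape, a caller that really needs any-to-any pays one extra conversion, while everything held in memory between the two steps stays in UTF-8.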
I can propose this: add an option "metadata_text_encoding" to
AVFormatContext. If it is set on a demuxer, the demuxing framework uses it
to convert from that encoding to UTF-8; similarly, if it is set on a muxer,
the muxing framework uses it to convert from UTF-8 to that encoding.

Then we can have a special syntax for it to specify bogus conversions.
Possibly:

-metadata_text_encoding "[CP1252>UTF-8]GBK"

to specify that the text must first be converted from CP1252 to UTF-8, then
considered to be GBK (and converted to UTF-8).

(Well, I consider the feature evil, so I will probably not volunteer to
implement it, but I will not oppose it as long as it cannot be triggered
too easily by an unsuspecting user.)

What do you think of it?

Regards,

-- 
  Nicolas George
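
As an illustration of how the bracket syntax could be read on the demuxing side, here is a rough Python sketch. Nothing in it exists in FFmpeg: `parse_spec` and `demux_to_utf8` are hypothetical names, the grammar is guessed from the single example "[CP1252>UTF-8]GBK", and Python's built-in codecs stand in for iconv.

```python
import re

# Optional "[SRC>DST]" prefix (the bogus conversion to apply first),
# followed by the encoding the resulting bytes are then assumed to be in.
_SPEC = re.compile(r"^(?:\[([^\]>]+)>([^\]>]+)\])?(.+)$")


def parse_spec(spec: str):
    """Split a spec into (pre_conversion, final_encoding).

    pre_conversion is a (src, dst) tuple, or None when no bracket is given.
    """
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"invalid encoding spec: {spec!r}")
    pre_src, pre_dst, final = m.groups()
    pre = (pre_src, pre_dst) if pre_src else None
    return pre, final


def demux_to_utf8(raw: bytes, spec: str) -> str:
    """Turn raw metadata bytes into text according to `spec`.

    The returned Python str stands in for the UTF-8 string the program
    would keep internally. Undecodable input raises UnicodeError here;
    a real implementation would need an explicit error policy.
    """
    pre, final = parse_spec(spec)
    if pre is not None:
        # Step 1: the bracketed conversion, e.g. CP1252 -> UTF-8.
        raw = raw.decode(pre[0]).encode(pre[1])
    # Step 2: consider the bytes to be in `final` and decode them.
    return raw.decode(final)
```

A plain value such as "GBK" goes through the same code path with no bracketed step, so the bogus conversion only happens when the user writes the brackets explicitly.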
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel