Samuel Thibault wrote:
> Eugene V. Lyubimkin wrote:
>> Utility html2text, version 1.3.2a-6, with "utf8" patch was just
>> uploaded to experimental. The patch allows to process UTF-8 files
>> when '-utf8' option supplied. Input should be in UTF-8 and output will
>> be in UTF-8 too.
>>
>> Please test this functionality - I believe that UTF-8 support is a
>> good feature, especially for processing non-English documents.
>
> Mmm, the way it is done looks wrong to me: there is no reason why the
> input and output charsets should be related at all. For the input,
> html2text should recognize the meta http-equiv tag, that should work
> for a lot of pages, else an input-charset option can be provided. For
> the output, the current locale's charset should be used (as returned by
> nl_langinfo(CODESET) after calling setlocale(LC_CTYPE,"")), that should
> work in almost all cases, else an output-charset option can be provided.
>
> Yes, that means conversions. But without that you can not put a sticker
> "utf-8 support", only "limited utf-8 support".
>
> Samuel

OK, that would be good. You are welcome to file a minor/wishlist bug, and I
will ask the author to consider it. The author is not very active in
html2text development, though.
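
For the bug report, the output side of Samuel's suggestion could look roughly
like the sketch below. It is only an illustration, not code from html2text:
locale_charset() and convert() are hypothetical helper names, and the
conversion goes through plain iconv(3) as found in glibc, where the input
pointer is a char ** (other platforms may differ).

```cpp
#include <cerrno>
#include <clocale>
#include <stdexcept>
#include <string>

#include <iconv.h>
#include <langinfo.h>

// Hypothetical helper: charset of the current locale, e.g. "UTF-8".
// setlocale(LC_CTYPE, "") picks up LANG / LC_* from the environment;
// if that fails, the "C" locale stays active and its codeset is returned.
std::string locale_charset() {
    std::setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

// Hypothetical helper: convert `in` from charset `from` to charset `to`.
// Throws if the pair is unsupported or the input cannot be converted.
std::string convert(const char* from, const char* to, const std::string& in) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("unsupported charset conversion");

    std::string out;
    char buf[4096];
    char* src = const_cast<char*>(in.data());  // iconv does not modify input
    size_t left = in.size();

    while (left > 0) {
        char* dst = buf;
        size_t room = sizeof buf;
        size_t rc = iconv(cd, &src, &left, &dst, &room);
        out.append(buf, sizeof buf - room);
        // E2BIG only means the output buffer filled up; loop and continue.
        if (rc == (size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("input cannot be converted");
        }
    }

    // Flush any pending shift sequence for stateful encodings.
    char* dst = buf;
    size_t room = sizeof buf;
    iconv(cd, nullptr, nullptr, &dst, &room);
    out.append(buf, sizeof buf - room);

    iconv_close(cd);
    return out;
}
```

With the input charset taken from the meta http-equiv tag (or an option), the
output step would then be something like
convert(input_charset, locale_charset().c_str(), text), with an
output-charset option overriding the second argument.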
-- Eugene V. Lyubimkin aka JackYF, Ukrainian C++ developer.