Hi,

I've been working on a package (QA) that shows the differences between documents. One of its selling points is that it works with different encodings (see the long description below).

DocDiff compares two files and shows the difference. It can compare files word by word, char by char, or line by line. It has several output formats such as HTML/XHTML, tty, Manued, or user-defined markup. It supports several encodings and end-of-line characters, including ASCII, UTF-8, EUC-JP, Shift_JIS, CR, LF, and CRLF.

Upstream provides some examples, which are pairs of files with small changes between them. This leads to my question. One of these pairs uses a local Japanese encoding, which makes lintian scream:

W: docdiff: national-encoding usr/share/doc/docdiff/examples/01.ja.eucjp.lf
W: docdiff: national-encoding usr/share/doc/docdiff/examples/02.ja.eucjp.lf

The recommendation attached to this warning is to convert the files to UTF-8, but there are already UTF-8 examples in the directory. And upon closer inspection, some files use other encodings too (see below).

01.en.ascii.cr: ASCII text, with CR line terminators
01.en.ascii.crlf: ASCII text, with CRLF line terminators
01.en.ascii.lf: ASCII text
01.ja.eucjp.lf: ISO-8859 text
01.ja.sjis.cr: Non-ISO extended-ASCII text, with CR line terminators
01.ja.sjis.crlf: Non-ISO extended-ASCII text, with CRLF line terminators
01.ja.utf8.crlf: UTF-8 Unicode text, with CRLF line terminators
02.en.ascii.cr: ASCII text, with CR line terminators
02.en.ascii.crlf: ASCII text, with CRLF line terminators
02.en.ascii.lf: ASCII text
02.ja.eucjp.lf: ISO-8859 text
02.ja.sjis.cr: Non-ISO extended-ASCII text, with CR line terminators
02.ja.sjis.crlf: Non-ISO extended-ASCII text, with CRLF line terminators
02.ja.utf8.crlf: UTF-8 Unicode text, with CRLF line terminators
humpty_dumpty01.ascii.lf: ASCII text
humpty_dumpty02.ascii.lf: ASCII text

My question is: what is the best course of action in this situation? I see 3 possible ways of dealing with it.

1. Do not install the files flagged by lintian (but what about the others it does not flag, such as *ja.sjis.cr?).
2. Install the files in spite of the lintian warning.
3. Convert them to UTF-8, even though UTF-8 files are already provided.

I lean towards the first option, but I'm not sure, so I would really like to hear your opinion. I'd also like to know the status of these local encodings in Debian: are they still in use anywhere?

Sorry for the long email and thanks,
Charles