* Adam Borowski <kilob...@angband.pl>, 2013-08-12, 02:51:
> Detecting non-UTF files is easy:
> * false positives are impossible
> * false negatives are extremely unlikely: combinations of letters that would
>   happen to match a valid utf character don't happen naturally, and even if they
>   did, every single combination in the file tested would need to match valid
>   utf.
Not quite. While 7-bit encodings different than ASCII are all endangered
species, some of them can still be seen in the wild, and they excellently
disguise themselves as UTF-8. (We had to add special code to detect ISO-2022
encodings to Lintian not that long ago.)
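
To illustrate the disguise, here is a minimal Python sketch. It is not the
Lintian code; the helper names is_valid_utf8 and looks_like_iso2022 are made
up for this example, and the escape-sequence check is a rough heuristic that
assumes the usual ISO-2022-JP/KR/CN designators (ESC followed by "$" or "(").
ISO-2022-JP output consists only of 7-bit bytes, so a strict UTF-8 decoder
accepts it without complaint:

```python
ESC = b"\x1b"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_like_iso2022(raw: bytes) -> bool:
    """Rough heuristic (an assumption, not Lintian's check): look for
    ESC followed by '$' or '(', which starts the common ISO-2022
    character-set designations."""
    return ESC + b"$" in raw or ESC + b"(" in raw


# Three kanji (U+65E5 U+672C U+8A9E), written as escapes so that this
# source file itself stays pure ASCII.
text = "\u65e5\u672c\u8a9e"

iso2022 = text.encode("iso2022_jp")

# Every byte is below 0x80, so the UTF-8 check is a false negative here.
print(all(b < 0x80 for b in iso2022))   # 7-bit only
print(is_valid_utf8(iso2022))           # passes as "UTF-8"
print(looks_like_iso2022(iso2022))      # but the escape sequences give it away
```

An 8-bit legacy encoding such as ISO-8859-2 is usually caught by the plain
UTF-8 check; the 7-bit ones need a separate test like the one above.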
Anyway, if you want to help UTF-8-ize the world, you could start by providing
patches for these bugs:
http://lintian.debian.org/tags/debian-changelog-file-uses-obsolete-national-encoding.html
http://lintian.debian.org/tags/debian-copyright-file-uses-obsolete-national-encoding.html
http://lintian.debian.org/tags/doc-base-file-uses-obsolete-national-encoding.html
--
Jakub Wilk