Package: debian-policy
Version: 4.0.0.2
Severity: minor
Tags: patch
Justification: garbled display (mojibake) in web browsers

Dear debian-policy Maintainers,

There are numerous non-breaking space characters (U+00A0) in:

    https://www.debian.org/doc/packaging-manuals/upgrading-checklist.txt

These are encoded in UTF-8 as 0xC2 0xA0, or octal \302 \240. The file does not begin with the UTF-8 signature, though, so web browsers might not display the UTF-8 characters properly. This is the case with the version of Firefox in the current Stretch stable release.

Those non-breaking space characters occur within a very short title line, so they could be changed to plain spaces with no side effects.

Other UTF-8 characters might appear in such files someday, though, so a more general solution would be to start the file with the UTF-8 signature, also known as the Byte Order Mark (BOM). This is the UTF-8 encoding of U+FEFF: 0xEF 0xBB 0xBF, or octal \357 \273 \277. A web browser should then display the UTF-8 characters in the text file properly.

This GNU sed command prepends the signature to a file, modifying the file in place:

    sed -i '1s/^/\o357\o273\o277/' upgrading-checklist.txt

Alternatively, this awk script inserts the same three-byte sequence, but writes the result to a new file instead of editing in place:

    awk 'BEGIN {printf ("\357\273\277");} {print;}' < original.txt > utf8-version.txt

Thanks,

Paul Hardy
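
Note: running the sed command above twice would prepend the signature twice. The following is a minimal sketch of a guard against that, not part of the proposed patch. The function names has_bom and add_bom are hypothetical, and it assumes a head that supports -c (as GNU coreutils does) and the mktemp utility.

```shell
#!/bin/sh
# Hypothetical helpers: prepend the UTF-8 signature only when it is absent.

# Succeeds if the file's first three bytes are EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Prepends the signature unless the file already starts with it.
add_bom() {
    if has_bom "$1"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Build the new content in a temporary file, then copy it back over the
    # original so the original's permissions and ownership are kept.
    { printf '\357\273\277'; cat "$1"; } > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}
```

With these functions defined, `add_bom upgrading-checklist.txt` could be run repeatedly and would change the file only the first time.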