> I have observed that groff changes certain characters in manpages.
> [...]
>
> For example, in perlcheat.1, the $| is changed into '$' and a
> vertical bar if the locale is UTF-8.
>
>   real:   | (U+007C)
>   output: │ (U+2502)
>
> This also applies to a number of other characters, including the
> backward apostrophe (accent grave) ` (0x60), which is transformed
> into ‘ (U+2018).  This is very bad for copy+paste, and if your screen
> font does not have all the UTF-8 characters (especially the case on a
> bare 80x25 tty1 terminal), it does not even show any apostrophe, but
> a block to indicate that 0x2018 is not available in this font.
>
> Is this a (big) bug in groff, or intention?

In /usr/[local/]share/groff/<version>/font/devutf8/R you can see which
output codes are used for which input characters.

Looking into perlcheat.1, you can find this (converted on my platform
with Pod::Man v 1.37):

  .tr \(*W-|\(bv\*(Tr

The .tr request translates characters.  In this particular case, it
translates `|' to `\(bv'.  `bv' is equivalent to `braceex' in PS
output, and is by default mapped to U+23AA.  I have no idea why you
get U+2502 instead.  And I have no idea why Pod::Man uses `bv' at all.

Regarding the grave accent mapped to U+2018, here is the comment from
groff_char(7):

  `   the ISO Latin-1 `Grave Accent' (code 96) prints as <U+2018>,
      a left single quotation mark; the original character can be
      obtained with `\`'.

  '   the ISO Latin-1 `Apostrophe' (code 39) prints as <U+2019>,
      a right single quotation mark; the original character can be
      obtained with `\(aq'.

For typesetting this is the right choice, since those two characters
are normally used this way, similar to TeX.  Distributions can
override this.  For example, in my SuSE 9.1, I have this in
/usr/share/groff/site-tmac/tmac.andocdb:

  .if '\*[.T]'utf8' \{\
  .  char \- \N'45'
  .  char  - \N'45'
  .  char  ' \N'39'
  .\}

To summarize:

  . Mapping `|' to the `bv' entity is strange.  If you use a plain
    `|' in a troff input file, you actually get a plain `|'!  This
    looks like a bug in Pod::Man.

  . The ` and ' characters in groff input files always indicate left
    and right single quotation marks.  U+0060 and U+0027 can be
    accessed as \` and \(aq.  Ideally, this is fixed in Pod::Man too,
    if you use a `verbatim' mode, by translating those characters
    temporarily.  Otherwise, as shown above, this can be changed in
    the configuration file of the man macros.


    Werner


PS: Why the heck are `perlcheat.man' and all other non-program man
    pages of perl in man section 1?
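
A minimal sketch to check the two points of the summary on your own
system.  The page below is a hypothetical test file (the name quotes.1
and its contents are assumptions for illustration, not taken from
perlcheat.1); it uses only the plain characters and the \` and \(aq
escapes mentioned above.

```
.\" Hypothetical test page; format with:  groff -man -Tutf8 quotes.1
.TH QUOTES 1
.SH NAME
quotes \- compare typographic and ASCII quote output
.SH DESCRIPTION
.\" Plain ` and ' are treated as left and right single quotation marks.
Typographic quotes: `quoted'
.br
.\" \` and \(aq request the original grave accent and apostrophe.
ASCII characters: \`quoted\(aq
.br
.\" A plain | with no .tr translation in effect stays a plain |.
Vertical bar: $|
```

With the default devutf8 mapping, the first line should come out with
U+2018 and U+2019 and the second with U+0060 and U+0027.  If your
distribution overrides the mapping, as in the SuSE example, the
apostrophe in the first line will already be U+0027.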

_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff