> The groff_char.7 documentation also lists the backquote u0060 and
> the apostrophe u0027.
Ah, yes. Silly me.

> Adding this to unicode.tmac fixes these:
>
>   .char ` \[oq]
>   .char ' \[cq]

OK.

> IIRC, in the C++ code, the font handling for the html and utf8
> devices is the same. Therefore I tried adding this to html.tmac:
>
>   .mso unicode.tmac
>
> and it fixes it!

OK, done.

> One problem is still left: what is now the recommended way to write a
> shell command line so that it can be copied and pasted from at least
> the utf8 and html outputs?
>
> - If I write "foo --help", the utf8 output contains u2010 twice.
> - If I write "foo \-\-help", the utf8 output contains u2212 twice.

A very good question. The standard solution is described in the
PROBLEMS file:

  * The UTF-8 output of grotty has strange characters for the minus,
    the hyphen, and the right quote. Why?

    The Unicode characters used (U+2212 for the minus sign and U+2010
    for the hyphen) are the correct ones, but many programs can't
    search for them properly. The same is true for the right quote
    (U+2019). To map those characters back to ASCII, insert the
    following code snippet into the `troffrc' configuration file:

      .if '\*[.T]'utf8' \{\
      .  char \- \N'45'
      .  char - \N'45'
      .  char ' \N'39'
      .\}

However, this is an ugly hack and doesn't solve the actual problem.
With the current means, this problem is unsolvable, I believe.

> - If I write "foo \[u002D]\[u002D]help", the utf8 output contains
>   u002D twice, as desired, but the html processing gives me
>   "warning: can't find special character `u002D'".

Hmm??

> It has already taken me some effort to convince the Linux man-pages
> maintainer that \- should be used for copy-and-pastable commands in
> man pages. Do I now have to recommend that he use \[u002D] instead?

Using \[u002D] doesn't work with the latin1 device...

I see two possible solutions.
  . Define a new grotty (pseudo) font `CR' that is the same as all
    other fonts but contains an additional line

      \- 24 0 0x002D

    This is the solution Gaius has already implemented for grohtml
    (however, he always uses 0x002D for \-, not only for fixed-width
    fonts, which should probably be changed). I can imagine that most
    man pages already use \f[CR] for displaying verbatim stuff (groff
    man pages being a notable exception), so this should be rather
    straightforward.

  . Introduce a new escape, say, `\=', which maps to U+002D. We would
    thus have

      -    U+2010
      \-   U+2212
      \=   U+002D

    Alternatively, we could exchange the meanings of \- and \=, having

      -    U+2010
      \-   U+002D
      \=   U+2212

Sigh. How do other applications solve this mess?

    Werner
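One way to compare the options is to model, for each one, which input token the utf8 device turns into which code point. The Python sketch below is purely illustrative and rests on several assumptions. The tables are hand-written from the mappings stated in this mail, plus \[cq] = U+2019 for the apostrophe. `\=` is only the proposed escape and does not exist in groff. The table and function names are invented. Proposal B is derived by literally swapping the `\-` and `\=` entries of proposal A, as "exchange the meanings" suggests. `pasteable_spelling` looks for an input that yields ASCII U+002D, and `keeps_typography` checks whether the real hyphen and minus sign are still reachable.

```python
from typing import Dict, List, Optional

# Hypothetical model, not groff code: each table maps a groff input token to
# the code point the utf8 device emits for it, as described in this thread.
CURRENT = {"-": "\u2010", "\\-": "\u2212", "'": "\u2019"}

# The troffrc snippet from PROBLEMS: hyphen, minus, and quote fold to ASCII.
TROFFRC_HACK = {**CURRENT, "-": "\u002d", "\\-": "\u002d", "'": "\u0027"}

# Solution 1: the proposed `CR' pseudo font, where \- yields U+002D.
# (Only text set in \f[CR] would use this table.)
CR_FONT = {**CURRENT, "\\-": "\u002d"}

# Solution 2: a new escape \= (hypothetical, not in groff) yielding U+002D ...
PROPOSAL_A = {**CURRENT, "\\=": "\u002d"}
# ... or the same table with the meanings of \- and \= exchanged.
PROPOSAL_B = {**PROPOSAL_A, "\\-": PROPOSAL_A["\\="], "\\=": PROPOSAL_A["\\-"]}


def render(tokens: List[str], table: Dict[str, str]) -> str:
    """Concatenate the output for each token; unknown tokens pass through."""
    return "".join(table.get(tok, tok) for tok in tokens)


def pasteable_spelling(table: Dict[str, str]) -> Optional[str]:
    """Return the first input token that yields ASCII hyphen-minus, or None."""
    for token, code_point in table.items():
        if code_point == "-":
            return token
    return None


def keeps_typography(table: Dict[str, str]) -> bool:
    """True if some token still yields U+2010 and some token yields U+2212."""
    outputs = set(table.values())
    return "\u2010" in outputs and "\u2212" in outputs


if __name__ == "__main__":
    tables = [("current", CURRENT), ("troffrc hack", TROFFRC_HACK),
              ("CR font", CR_FONT), ("proposal A", PROPOSAL_A),
              ("proposal B", PROPOSAL_B)]
    for name, table in tables:
        spelling = pasteable_spelling(table)
        if spelling is None:
            pasted = "(none)"
        else:
            # Render "foo <x><x>help" with the spelling that yields U+002D.
            pasted = render(["foo ", spelling, spelling, "help"], table)
        print(f"{name:13} pasteable: {pasted!r:14} typography: {keeps_typography(table)}")
```

In this model, the trade-off discussed above is visible. The current table has no pasteable spelling. The troffrc hack gains one by giving up U+2010 and U+2212 altogether. The CR table gives up U+2212 only for text set in \f[CR]. Both \= variants keep all three characters available.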