Hi Ingo,

> > Due to some, all?, man renderers trying to keep a shell backquote as
> > a paste-able backquote, for example.
> >
> > .\" For UTF-8, map some characters conservatively for the sake
> > .\" of easy cut and paste.
> > .
> > .if '\*[.T]'utf8' \{\
> > . rchar \- - ' `
> > .
> > . char \- \N'45'
> > . char - \N'45'
> > . char ' \N'39'
> > . char ` \N'96'
> > .\}
>
> Exactly. Which reinforces my point that you have to use \(oq to get a
> left single quote in man(7).

But is that because the `.char' requests above are hiding faults in man
pages rather than leaving the pressure there for them to be fixed
upstream? The man page source is troff, so `' should be usable in
English prose. The noisier escapes should only be needed for the odd
bit of verbatim computer reproduction. With the above .char requests in
the system's an-old.tmac, I get

    $ grep -w backsl foo.1
    `ascii' \`backsl\` \(gaga\(ga \(oqoq-cq\(cq \(aqaq\(aq
    $
    $ for t in ascii latin1 utf8; do
    >     man -T$t ./foo.1
    > done |
    > grep -w backsl |
    > uniq -c
          2 `ascii' `backsl` `ga` `oq-cq' 'aq'
          1 `ascii' `backsl` `ga` ‘oq-cq’ 'aq'
    $

Copying an-old.tmac to $HOME so it gets picked up first, and deleting
the above .char requests, does not change -Tutf8's output. I don't know
why not. Removing the .rchar too does change the rendering of the ASCII
left quote, but not the right one:

    ‘ascii' `backsl` `ga` ‘oq-cq’ 'aq'

> > Whom is this change meant to benefit? I've lost track.
>
> People reading roff(7) documents with nroff(1) or man(1) in a terminal
> window while they have LC_CTYPE=C set and while they are using a
> modern font.

Colin pointed out that remote machines may not support his locale, so
he's sometimes forced into LC_CTYPE=C. However, that's presumably just
for the odd bit of command-line work, as a lack of UTF-8 could affect
much more than just reading a man page, given non-ASCII in source
comments, collating order, multi-byte sequences affecting searching,
etc.

> > Could it be those that will see ASCII output in practice align with
> > those that are happy to stick with seeing «`'»?
...
> Besides, LC_CTYPE is not merely a personal choice, but there are
> technical reasons to sometimes use a UTF-8 LC_CTYPE (for example when
> working on UTF-8-encoded natural language text files from the shell
> with basic POSIX tools) and for the same person to use LC_CTYPE=C in
> different contexts (for example when working on a build system).
> The latter situation is what caused people to repeatedly report what
> they perceived as "the `quoting' oddity" to me in the past: people
> who normally use UTF-8 but sometimes switch to LC_CTYPE=C for
> specific tasks (like Ted Unangst or Anthony Bentley, if i understand
> correctly).

I normally use UTF-8. I have ~/bin/C that does

    LC_ALL=C LANG=C exec -- "$@"

to run particular commands in that locale, e.g. for speed. I think if I
switched wholesale to the C locale for a terminal or session, I would
accept seeing `foo' rather than 'foo' as an attribute of that locale,
instead of trying to force it to look like Unicode.

-- 
Cheers, Ralph.