On Fri, Oct 12, 2007 at 08:51:31AM +0900, Junichi Uekawa wrote:
> Hi,
>
> >  o Per-locale directory handling has been improved. Directories such
> >    as "fr.UTF-8" may be used for occasions when it is appropriate to
> >    specify the character set but not the country, and so a full
> >    locale name is inconvenient.
> >
> >  o There is a new "manconv" program which can try multiple possible
> >    encodings for a file, thus allowing UTF-8 manual pages to be
> >    installed in any directory even without an explicit encoding
> >    declaration.
>
> This is cool.
>
> A great workaround for that compatibility mess RedHat has created for US.
>
> I assume UTF-8 / local-encoding detection can fail sometimes; which
> encoding has precedence?
You're right, it can. It's much more likely that a random non-UTF-8
document will fail to decode as UTF-8 than the other way round, so man
tries UTF-8 first, and that takes precedence.

I did just notice a bug in manconv's detection, which I've fixed for
2.5.1. With that bug fixed, the only circumstance in which a page should
be decoded incorrectly is if it is not valid UTF-8 but contains some
text in the first 64KB that looks like valid UTF-8. I don't know of an
example of this happening in practice. The only hard case you get in
practice is a very large, mostly-ASCII page with some ISO-8859-1 near
the end (maybe in an author's name), and manconv handles that fine.

However, if there is still ambiguity, you can either install the page in
a directory whose name is explicitly tagged with an encoding (another
reason I'd like to do that by default, as otherwise we get a few pages
that are put there anyway to disambiguate) or use a coding: declaration
in the file. This is documented in manconv(1).

Cheers,

-- 
Colin Watson [EMAIL PROTECTED]
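A simplified sketch of the precedence described above, in Python:
honour an explicit coding declaration if one is present, otherwise try
UTF-8 strictly and fall back to a legacy encoding. This is not
manconv's code. The names decode_page and detect_declared_encoding, the
ISO-8859-1 fallback, and the declaration pattern (an Emacs-style
"-*- coding: ... -*-" marker on the first line) are assumptions for
illustration, and the sketch decodes the whole buffer rather than
reproducing manconv's handling of the first 64KB.

```python
import re
from typing import Optional, Tuple

# Hypothetical sketch, not manconv's actual implementation.

# Assumed Emacs-style declaration on the first line, e.g.
#   '\" -*- coding: UTF-8 -*-
CODING_RE = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9._-]+)")


def detect_declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named on the first line, if any."""
    first_line = data.split(b"\n", 1)[0]
    match = CODING_RE.search(first_line)
    return match.group(1).decode("ascii") if match else None


def decode_page(data: bytes, fallbacks=("iso-8859-1",)) -> Tuple[str, str]:
    """Decode a page, returning (text, encoding_used)."""
    # An explicit declaration removes all ambiguity, so it wins outright.
    declared = detect_declared_encoding(data)
    if declared is not None:
        return data.decode(declared), declared
    # UTF-8 first: random non-UTF-8 bytes rarely form valid UTF-8,
    # so a successful strict decode is strong evidence.
    for encoding in ("utf-8", *fallbacks):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the page")
```

Because ISO-8859-1 maps every byte to a character, that fallback always
succeeds, which is why UTF-8 has to be tried first. The residual failure
mode is visible too: the ISO-8859-1 bytes for "Ã©" happen to be valid
UTF-8, so decode_page("Ã©".encode("iso-8859-1")) returns
("é", "utf-8").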