Hi,

On Fri, Aug 10, 2007 at 01:23:02PM +0200, Adam Borowski wrote:
> On Fri, Aug 10, 2007 at 11:24:08AM +0100, David Given wrote:
> > Ben Finney wrote:
> > [...]
> > > That sounds like a bug. I was under the impression that the default
> > > encoding of everything in lenny was supposed to be UTF-8.
I wish. Many documents are still in old encodings...

> > > What tool is it that has this different default encoding?
> >
> > Well, I tried UTF-8 with the assumption that it would work, and it threw up
> > a ...
> I would call this a bug, in Etch it was "only" "important".
> ANY file on a modern system installed by the distribution (and not in the
> user's private data, /mnt/win/ or an upstream source tarball) is bad for a
> number of reasons, mangling people's surnames being one of less important
> ones.
>
> All data files should be in UTF-8 (or UCS4, or any other format which does
> not include data loss). If an user then chooses to use a broken charset due
> to his/her historic preferences, so be it -- but you cannot inflict data
> loss on others. If man-db does this, it needs to be beaten with a large
> cluestick.

I think the maintainer of man-db is well aware of this and has more than
enough "clue". (A statement like the one above, made without checking the
facts, is nothing but arrogance and should be avoided by a good Debian
volunteer.)

If you have the time and skill, please provide a patch and an exact
transition plan to the BTS. To me, it looks like Colin is getting the tools
ready. As I see in the changelog ...

> Thu Aug 10 17:23:03 BST 2006  Colin Watson  <[EMAIL PROTECTED]>
>
>   * src/encodings.c (get_default_device): Always use utf8 if preconv
>   is available.
>   (get_roff_encoding): Skip CJK UTF-8 hack if preconv is available.
>   * src/man.c (make_roff_command): Use preconv if available to recode
>   input even if the encoding is detected by means other than looking
>   at the preprocessor line. Skip iconv preprocessing in that case.

The current text data may still use non-UTF-8 encodings, but the tool
internally works with UTF-8 data. (I did not check the source any further
than the above. I vaguely remember that Colin posted something about a UTF-8
transition plan before.)

Thanks.

Osamu

PS: Please be reminded that even UTF-8 encoded text data, which can only
represent UCS code points, is not without "data loss".
The selection of UCS code points for glyphs was a practical compromise: the
same code point was assigned to several glyphs that share some history.
(This is mostly a Chinese-Japanese-Korean issue, since those scripts have a
huge number of glyphs.)
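To make the two kinds of "data loss" in this thread concrete, here is a small Python sketch. It is only an illustration, not code from man-db or preconv; the helper names, the hypothetical sample name and the choice of ISO-8859-1 as the "legacy" charset are assumptions for the example. The first half shows a legacy charset mangling a Polish surname while UTF-8 round-trips it intact; the second half shows the unification caveat from the PS, using U+76F4, a character commonly used to illustrate Han unification.

```python
import unicodedata


def to_legacy_and_back(text: str, charset: str) -> str:
    """Round-trip text through a legacy charset, replacing unmappable chars with '?'."""
    return text.encode(charset, errors="replace").decode(charset)


def utf8_round_trip(text: str) -> str:
    """Round-trip text through UTF-8; every code point survives."""
    return text.encode("utf-8").decode("utf-8")


# Hypothetical sample name with Polish letters that ISO-8859-1 lacks.
name = "Zofia Łęcka"
print(to_legacy_and_back(name, "iso-8859-1"))  # Zofia ??cka
print(utf8_round_trip(name) == name)           # True

# U+76F4: Chinese and Japanese typography traditionally draw this character
# differently, yet both forms share this single unified code point.
ch = "\u76f4"
print(unicodedata.name(ch))  # CJK UNIFIED IDEOGRAPH-76F4

# The UTF-8 bytes are identical whichever regional glyph the author meant.
# That distinction must travel out of band; the "lang" keys below are
# illustrative metadata, not something stored in the text itself.
ja = {"lang": "ja", "text": ch}
zh = {"lang": "zh-CN", "text": ch}
print(ja["text"].encode("utf-8") == zh["text"].encode("utf-8"))  # True
```

In other words, UTF-8 preserves every code point exactly, but any glyph distinction that unification folded into one code point is already gone before encoding happens.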