reassign 348032 man-db
retitle 348032 man foo > foo.txt gives invalid UTF-8 output with UTF-8 locales
thanks

On Mon, Jan 16, 2006 at 09:59:54AM +0100, Norbert Preining wrote:
> Hi all!
>
> > On Sat, Jan 14, 2006 at 06:30:28PM +0100, Christian Perrier wrote:
> > > > They are all over the page. An example is that MARKERINGSTILLSTÅND FÖR
> > > > PAKET becomes MARKERINGSTILLST???ND F???R PAKET.
> >
> > Yes, it seems that accented letters in section names are screwed up,
> > reassigning to info.
>
> Before reassigning this to man I want to ask your opinion:
>
> Info calls man <section> <page> and shows the content of this.
>
> I tried it with a minimal file, containing only
>
> heder stuff
>
> .SH ÅTGÄRDER
>
> ÅTGÄRDER
>
> (in UTF8)
>
> and really the .SH line comes out wrong.
>
> BUT: If I call
>
> man foo > man.out
>
> I see a strange difference:
> ÃÃ~ETGÃÃ~DRDER
> Ã~ETGÃ~DRDER
>
> Note the difference between the initial
> Ã~E ..
> in the second line and the doubled
> ÃÃ~E ..
> in the first line.
>
> And in fact, loading the file into emacs (the man.out) with utf8
> encoding I see *only* in the second line the right characters, but the
> first line is messed up!
>
> I don't know what man does and why it doubles the initial char, but the
> output is not valid utf8, so info is not to blame.

Hi Norbert,

you are right, the output is not valid UTF-8 when redirected into a file;
reassigning to man-db. Here is another example:

  $ LC_ALL=en_US.UTF-8 man -L C man > man.out

messes up some hyphens (U+2010) and apostrophes (U+2019); it looks like a
multibyte sequence of n bytes is preceded by its own first (n-1) bytes, but
not for all occurrences.

Denis
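
For anyone who wants to check a redirected file for this pattern, here is a
minimal Python sketch. It is not part of man-db or info; the function name
invalid_utf8_spans and the sample bytes are made up for illustration. It
reports the offset and bytes of each span that does not decode as UTF-8,
which should make a stray repeated lead byte (such as 0xC3 before "Å",
encoded as 0xC3 0x85) easy to spot.

```python
def invalid_utf8_spans(data: bytes):
    """Return (offset, offending_bytes) for each undecodable span in data."""
    spans = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice we decoded
            start, end = pos + err.start, pos + err.end
            spans.append((start, data[start:end]))
            pos = end  # resume right after the bad bytes
    return spans


# Hypothetical sample: the lead byte of "Å" (0xC3 0x85) emitted twice,
# mimicking the doubled initial byte described in the report.
sample = b"\xc3" + "ÅTGÄRDER".encode("utf-8")
for offset, bad in invalid_utf8_spans(sample):
    print(offset, bad.hex())
```

To run it against a real file, something like
invalid_utf8_spans(open("man.out", "rb").read()) should do; the slicing makes
it quadratic in the worst case, which is fine for a man page but not for
large inputs.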

