reassign 348032 man-db
retitle 348032 man foo > foo.txt gives invalid UTF-8 output with UTF-8 locales
thanks

On Mon, Jan 16, 2006 at 09:59:54AM +0100, Norbert Preining wrote:
> Hi all!
> 
> > On Sat, Jan 14, 2006 at 06:30:28PM +0100, Christian Perrier wrote:
> > > > They are all over the page. An example is that MARKERINGSTILLSTÅND FÖR
> > > > PAKET becomes MARKERINGSTILLST???ND F???R PAKET.
> > 
> > Yes, it seems that accented letters in section names are screwed up,
> > reassigning to info.
> 
> Before reassigning this to man I want to ask your opinion:
> 
> Info calls man <section> <page> and shows its output.
> 
> I tried it with a minimal file, containing only
> 
> header stuff
> 
> .SH ÅTGÄRDER
> 
> ÅTGÄRDER
> 
> (in UTF-8)
> 
> and really the .SH line comes out wrong.
> 
> BUT: If I call 
> 
> man foo > man.out
> 
> I see a strange difference:
> ÃÃ~ETGÃÃ~DRDER
>        Ã~ETGÃ~DRDER
> 
> Note the difference between the initial
>       Ã~E ..
> in the second line and the doubled
>       ÃÃ~E ..
> in the first line.
> 
> And in fact, when I load the file (man.out) into Emacs with UTF-8
> encoding, I see the right characters *only* in the second line; the
> first line is messed up!
> 
> I don't know what man does or why it doubles the initial char, but the
> output is not valid UTF-8, so info is not to blame.

Hi Norbert,

you are right: the output is not valid UTF-8 when redirected into a file,
so I am reassigning this to man-db.  Here is another example:
  $ LC_ALL=en_US.UTF-8 man -L C man > man.out
messes up some hyphens (U+2010) and apostrophes (U+2019). It looks
like a multibyte sequence of n bytes is preceded by a copy of its first
n-1 bytes, but not for all occurrences.

Denis
