On 24 October 2014, Ingo Schwarze wrote:
> Hi,
> 
> Liviu Daia wrote on Fri, Oct 24, 2014 at 08:37:31AM +0300:
> > On 24 October 2014, Ingo Schwarze wrote:
> >> Gleydson Soares wrote on Thu, Oct 23, 2014 at 09:11:36PM -0300:
> >>> On Thu, Oct 23, 2014 at 10:36:44AM -0300, Gonzalo L. Rodriguez wrote:
> 
> >>>> -USE_GROFF =             Yes
> 
> >>> mandoc complains:
> >>>
> >>> $ mandoc -Tlint -Werror stunnel.8       
> >>> mandoc: stunnel.8:35:2: ERROR: skipping unknown macro: 'br\&
> >>> mandoc: stunnel.8:85:37: ERROR: skipping bad character: 0xc2
> >>> mandoc: stunnel.8:85:38: ERROR: skipping bad character: 0xa0
> >>> mandoc: stunnel.8:1084:11: ERROR: skipping bad character: 0xc5
> >>> mandoc: stunnel.8:1084:12: ERROR: skipping bad character: 0x82
> >>> mandoc: stunnel.8:1085:16: ERROR: skipping bad character: 0xc5
> >>> mandoc: stunnel.8:1085:17: ERROR: skipping bad character: 0x82
> >>> $
> >>> 
> >>> are you sure to zap groff?
> 
> >> Yes, it's a perlpod(1) manual, and these particular errors are
> >> harmless.
> >> 
> >>  - 35:2 has no ill effect; actually, it's a bug in mandoc(1) that
> >>         this bogus message is shown, i will look into fixing it.
> >>  - 85:37-38 is merely a bug in the manual,
> >>             two stray gibberish eight bit bytes
> 
> >     Not really, 0xC2 0xA0 is UTF-8 for U+00A0 "NO-BREAK SPACE":
> > 
> >         http://www.fileformat.info/info/unicode/char/a0/index.htm
> > 
> >     There are probably more of these around,
> 
> No kidding.
> 
> > various *roff tools produce them.
> 
> Really?  Hopefully not.  If you run into tools doing that, please
> do report them to me.  I am willing to hunt those bugs down and
> talk to the upstream maintainers of such broken tools.
> 
> In the case at hand, you can claim for sure that Russ Allbery's
> pod2man(1) and David Wheeler's Pod::Simple are excessively complicated,
> but they are not broken in this respect.  They produce correct
> output by default.

    Hello?  I was referring to non-breakable space.  I was just pointing
out that you can expect non-breakable space characters to creep into
man pages simply because a lot of manual pages these days are actually
converted from other formats.

    I never claimed anything about stunnel, pod2man, groff, generic
UTF-8 characters in *roff, Heirloom, Solaris, plan9, or the translation
of Zarathustra's collected works in Swahili. :) I just humbly pointed
out that you'll probably stumble upon other non-breakable spaces, and
that dealing with them (say by replacing them with normal spaces) might
be a more energy- and time-efficient approach than posting a tirade
against the authors of said man pages every time you run into that
problem.
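
    To make the "replace them with normal spaces" idea concrete, here
is a minimal sketch in Python.  It is only an illustration, not anything
stunnel, pod2man or mandoc ship: the function names replace_nbsp and
remaining_non_ascii are made up for this example, and it assumes the
page is valid UTF-8, so that the byte pair 0xC2 0xA0 can only mean
U+00A0.

```python
# Hypothetical helpers for illustration; not part of any existing tool.
NBSP_UTF8 = b"\xc2\xa0"  # UTF-8 encoding of U+00A0 NO-BREAK SPACE


def replace_nbsp(data: bytes) -> bytes:
    """Replace every UTF-8 encoded no-break space with an ASCII space."""
    return data.replace(NBSP_UTF8, b" ")


def remaining_non_ascii(data: bytes):
    """List (line, column, byte) for each byte above 0x7f, counting from 1."""
    found = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                found.append((lineno, col, byte))
    return found


if __name__ == "__main__":
    import sys

    # Usage (assumed): python3 fixnbsp.py < stunnel.8 > stunnel.8.fixed
    page = replace_nbsp(sys.stdin.buffer.read())
    sys.stdout.buffer.write(page)
    # Report whatever non-ASCII bytes are still left, on stderr.
    for lineno, col, byte in remaining_non_ascii(page):
        print(f"{lineno}:{col}: non-ASCII byte 0x{byte:02x}", file=sys.stderr)
```

    This only takes care of the no-break spaces; other non-ASCII bytes,
such as the 0xc5 0x82 pairs in the lint output quoted above, are merely
reported and would still need a transliteration by hand.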

    Regards,

    Liviu Daia

> The problem here is that the stunnel(8) maintainers don't know what
> they are doing.  In Makefile.in, they pass the -u option (use UTF-8
> in the generated roff(7) code) to pod2man(1), even though the manual
> explicitly states "Many *roff implementations cannot handle non-ASCII
> characters".  That is a massive understatement.  I do not know of
> any implementation of roff(7) that can handle that.  Definitely
> no version of groff or mandoc ever could, and the next
> releases of these two (groff-1.22.3 and mandoc-1.13.2) will not be
> able to do it, either.  It is planned for mandoc, but work hasn't
> started yet.  There are certainly no plans to support that in groff,
> or i would have heard of it.  If you find *any* implementation of
> roff(7) that can handle UTF-8 *input* without running a recoder
> like preconv(1) first, i'd be glad to hear that.
> 
> Now you might maybe argue that the stunnel(8) maintainers assume
> everybody has preconv(1) available.  Strange assumption: as far as
> i can tell, that's groff and mandoc only, and it works badly at
> best for both of them.  And even if stunnel(8) exclusively targets
> groff, it's not up to the job:
> 
>    $ pod2man -u stunnel.pod | preconv -eutf8 | groff -mandoc -Tps \
>        > stunnel.ps
>   <standard input>:85: warning: can't find special character `u00A0'
> 
>  ... and the resulting PostScript file has "-fdN" without a blank
> in the SYNOPSIS line.
> 
>    $ pod2man -u stunnel.pod | preconv -eutf8 | groff -mandoc -Tascii
>    $ pod2man -u stunnel.pod | preconv -eutf8 | groff -mandoc -Tlatin1
> 
> don't give you the blank, either, even though it's seemingly easy
> enough to translate a blank to ASCII.
> 
> By the way, even pod2man(1) itself is unable to properly handle
> UTF-8 input.  If you do *not* give -u, there is no attempt to
> encode non-ASCII characters into roff(7) escape sequences; they are
> just replaced with "X" characters.  And i can't blame pod2man(1),
> it's completely unclear what it should do.  If i remember correctly,
> last time i looked, i found four different ways to write UTF-8
> escape sequences in the following three roff(7) implementations:
> groff, Heirloom/Solaris and plan9.  None of these escape syntaxes
> worked for more than one implementation; groff has two alternative
> syntaxes exhibiting a few very subtle, probably unintended differences
> in the output produced.  Anything that exists is utterly non-portable.
> 
> So the only sane way i can see for manuals of portable software is
> to not use any kind of non-ASCII characters, but instead do ASCII
> transliterations for author names by hand when writing the manuals,
> and most importantly *never* use pod2man(1) -u because that breaks
> more than just UTF-8 characters.  It also breaks spacing.
> 
> Yes, this is a mess, and at some point, i need to attack this maze
> of problems.  But it is complex.  Cleaning up errno handling in
> src/lib/libc/rpc and src/lib/libc/yp is a simpler task.
> 
> Yours,
>   Ingo
