On Thu, Oct 02, 2008 at 03:59:45PM -0700, Russ Allbery wrote:
> Niko Tyni <[EMAIL PROTECTED]> writes:

> > So the output is ISO-8859-1 where possible and UTF-8 elsewhere.

> > Russ, I think the binmode($output, ":utf8") really belongs in pod2man
> > instead of Pod::Man.
> 
> It turns out, at least based on the experiments that I did, that you never
> want to use an encoding of :utf8.  What this does is tell Perl to just
> dump its internal encoding to the file handle rather than applying any
> encoding.  The only supported thing you can do with that byte stream is to
> read it back in via another file handle using the :utf8 encoding.  It is
> *not* necessarily valid UTF-8, and in practice I was getting all sorts of
> really strange things from it when looking at it via something other than
> Perl.
> 
> You always want to use :encoding(utf-8) instead if the output is for
> anything other than Perl.

I see. The Perl internal encoding is UTF-8, but there are ways to get
invalid UTF-8 in there, for example by using :utf8 on binary input.
This invalid UTF-8 will then be output as-is if :utf8 is set
on output.
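
A minimal sketch of that round trip, using in-memory file handles so it
needs no files on disk. The variable names and the sample bytes are made
up for illustration, and the utf8 warnings are silenced only to keep the
output quiet:

```perl
use strict;
use warnings;

# Latin-1 bytes, not UTF-8: a lone 0xE9 is an invalid UTF-8 sequence.
my $binary = "caf\xE9\n";

# Reading through :utf8 marks the data as UTF-8 without validating it.
open(my $in, '<:utf8', \$binary) or die "open for reading failed: $!";
my $line = do { no warnings 'utf8'; <$in> };
close($in);

# Writing through :utf8 passes the internal representation through unchanged.
my $written = '';
open(my $out, '>:utf8', \$written) or die "open for writing failed: $!";
{ no warnings 'utf8'; print {$out} $line; }
close($out);

# utf8::decode returns false when its argument is not well-formed UTF-8.
my $copy     = $written;
my $is_valid = utf8::decode($copy) ? 1 : 0;

printf "same bytes out as in:  %s\n", ($written eq $binary ? 'yes' : 'no');
printf "output is valid UTF-8: %s\n", ($is_valid ? 'yes' : 'no');
```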

I can't really think of a case where setting :encoding(utf-8) on output
does the right thing but :utf8 doesn't. It does turn the output into valid
UTF-8, but do you have an example where the content is not gibberish?
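
For well-formed text the two layers should produce the same bytes, which
is the case I have in mind. A small sketch to check that; bytes_via_layer
is a hypothetical helper written for this example, not part of any module:

```perl
use strict;
use warnings;

# Return the bytes produced by printing $text through the given PerlIO layer.
sub bytes_via_layer {
    my ($layer, $text) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "open with $layer failed: $!";
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return $buffer;
}

# Sample text containing U+00EF and U+263A.
my $text   = "na\x{EF}ve \x{263A}\n";
my $lax    = bytes_via_layer(':utf8', $text);
my $strict = bytes_via_layer(':encoding(UTF-8)', $text);

print $lax eq $strict ? "identical\n" : "different\n";
```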

On the input side, :encoding(utf-8) is indeed probably the better
choice because it detects invalid bytes instead of passing them through.
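
How the layer itself reacts to invalid input depends on the fallback
configured for PerlIO::encoding, which is not verified here. As a sketch
of the strict check under an explicit setting, the following calls
Encode::decode directly with FB_CROAK, so the failure is fatal and can be
caught with eval. The sample bytes are invented for the example:

```perl
use strict;
use warnings;
use Encode ();

# A lone 0xE9 byte is not valid UTF-8.
my $bytes = "caf\xE9\n";

# decode() with a CHECK value may modify its input, so work on a copy.
my $copy    = $bytes;
my $decoded = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };
my $error   = $@;

print defined $decoded ? "decoded\n" : "rejected: $error";
```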

> > Users of Pod::Man should do that themselves for their output file handle
> > when they use the 'utf8' option. (This needs documentation, of course.)
> 
> I'm not sure I like this as an interface since Pod::Man's supported
> interface involves opening the files itself.  This would mean that anyone
> who wants Unicode output can't use the API of Pod::Man and Pod::Text that
> have been supported for years.  I'd really rather try to transparently
> support Unicode using the existing API, even if it means messing with the
> state of provided output file handles.

How about providing your own parse_from_file() wrapper in Pod::Man that
knows about the utf8 option, does the open() and then sets the binmode?
I don't think there's any need to touch the filehandles of people using
parse_file().
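
Roughly what I mean, as a sketch only: open_output is a hypothetical
helper, not existing Pod::Man code, and the second argument stands in for
however the utf8 option would be looked up. The demo part just writes one
character to a temporary file and reads the raw bytes back:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $path for writing and, if $want_utf8 is true,
# push an encoding layer before anything is written.
sub open_output {
    my ($path, $want_utf8) = @_;
    open(my $fh, '>', $path) or die "cannot open $path: $!";
    if ($want_utf8) {
        binmode($fh, ':encoding(UTF-8)')
            or die "cannot set layer on $path: $!";
    }
    return $fh;
}

# Demo: write U+263A through the helper, then inspect the bytes on disk.
my (undef, $path) = tempfile(UNLINK => 1);
my $fh = open_output($path, 1);
print {$fh} "\x{263A}\n";
close($fh) or die "close failed: $!";

open(my $check, '<:raw', $path) or die "cannot reopen $path: $!";
my $raw = do { local $/; <$check> };
close($check);

printf "%d bytes written\n", length $raw;
```

The wrapper would then hand the handle to the parser with output_fh()
and call parse_file() on the input, so handles supplied by callers are
never touched.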

> > However, pod2man currently uses the parse_from_file() method, which is
> > just a compatibility wrapper in Pod::Simple that does the open() and
> > output_fh() calls. I suppose this should go in pod2man itself.
> > Something like the attached patch might do, although I see there's some
> > deeper magic in Pod::Simple.
> 
> This patch looks fine to me as a workaround, although I think my previous
> patch is the better long-term fix.

OK. I'll use this (with :encoding(utf-8)) for lenny if no further
showstoppers come up.
 
> Note that Pod::Text has related issues; try running pod2text on your same
> sample POD file and you'll see that it produces warnings about wide
> characters as well.  I'm not sure if that's worth trying to tackle for
> lenny, though (it affects perldoc -t).

Yes, I think we should leave that alone at this point.
-- 
Niko Tyni   [EMAIL PROTECTED]


