On Thu, Oct 02, 2008 at 03:59:45PM -0700, Russ Allbery wrote:
> Niko Tyni <[EMAIL PROTECTED]> writes:
> > So the output is ISO-8859-1 where possible and UTF-8 elsewhere.
> > Russ, I think the binmode($output, ":utf8") really belongs in pod2man
> > instead of Pod::Man.
>
> It turns out, at least based on the experiments that I did, that you never
> want to use an encoding of :utf8. What this does is tell Perl to just
> dump its internal encoding to the file handle rather than applying any
> encoding. The only supported thing you can do with that byte stream is to
> read it back in via another file handle using the :utf8 encoding. It is
> *not* necessarily valid UTF-8, and in practice I was getting all sorts of
> really strange things from it when looking at it via something other than
> Perl.
>
> You always want to use :encoding(utf-8) instead if the output is for
> anything other than Perl.

I see. The Perl internal encoding is UTF-8, but there are ways to get
invalid UTF-8 in there, for example by using :utf8 on binary input. This
invalid UTF-8 will then be output as-is if :utf8 is set on output.

I can't really think of a case where setting :encoding(utf-8) on output
does the right thing but :utf8 doesn't. It does turn the output into
valid UTF-8, but do you have an example where the content is not
gibberish?

On the input side, :encoding(utf-8) is indeed probably the better choice
because it will croak when it encounters invalid bytes.

> > Users of Pod::Man should do that themselves for their output file handle
> > when they use the 'utf8' option. (This needs documentation, of course.)
>
> I'm not sure I like this as an interface since Pod::Man's supported
> interface involves opening the files itself. This would mean that anyone
> who wants Unicode output can't use the API of Pod::Man and Pod::Text that
> have been supported for years. I'd really rather try to transparently
> support Unicode using the existing API, even if it means messing with the
> state of provided output file handles.
How about providing your own parse_from_file() wrapper in Pod::Man that
knows about the utf8 option, does the open() and then sets the binmode?
I don't think there's any need to touch the filehandles of people using
parse_file().

> > However, pod2man currently uses the parse_from_file() method, which is
> > just a compatibility wrapper in Pod::Simple that does the open() and
> > output_fh() calls. I suppose this should go in pod2man itself.
> > Something like the attached patch might do, although I see there's some
> > deeper magic in Pod::Simple.
>
> This patch looks fine to me as a workaround, although I think my previous
> patch is the better long-term fix.

OK. I'll use this (with :encoding(utf-8)) for lenny if no further
showstoppers come up.

> Note that Pod::Text has related issues; try running pod2text on your same
> sample POD file and you'll see that it produces warnings about wide
> characters as well. I'm not sure if that's worth trying to tackle for
> lenny, though (it affects perldoc -t).

Yes, I think we should leave that alone at this point.
-- 
Niko Tyni   [EMAIL PROTECTED]
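
For illustration, the open()-then-binmode step suggested above for a
parse_from_file() wrapper might look roughly like the sketch below. The
helper name open_output() and its arguments are hypothetical and not part
of Pod::Man, Pod::Simple, or pod2man; the sketch writes to an in-memory
buffer only so that the resulting bytes can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not an existing Pod::Man API):
# open an output target and, if $want_utf8 is true, push the strict
# UTF-8 encoding layer onto the handle. $target may be a file name or
# a reference to a scalar (an in-memory file).
sub open_output {
    my ($target, $want_utf8) = @_;
    open(my $fh, '>', $target) or die "Cannot open output: $!\n";
    if ($want_utf8) {
        # :encoding(UTF-8) runs the output through Encode, whereas :utf8
        # only marks the handle and writes Perl's internal representation.
        binmode($fh, ':encoding(UTF-8)') or die "Cannot set layer: $!\n";
    }
    return $fh;
}

# Write one non-ASCII character (U+00E9) through the layer.
my $buffer = '';
my $fh = open_output(\$buffer, 1);
print {$fh} "caf\x{e9}\n";
close($fh) or die "Cannot close output: $!\n";

# Dump the encoded bytes as hex; U+00E9 should appear as "c3 a9".
print join(' ', map { sprintf('%02x', ord($_)) } split(//, $buffer)), "\n";
```

In an actual wrapper the returned handle would presumably be handed to
output_fh() before parsing, as the existing Pod::Simple compatibility
wrapper does; that wiring is left out here.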