On Sep 27, 12:27 pm, [EMAIL PROTECTED] wrote:
> I am trying to use perl on the command line to process text files in
> various ways, one of which is to decode html entities. As far as I can
> see, the following line should work
>
> perl -MHTML::Entities -p -e 'decode_entities($_)' <input.txt >output.txt
>
> it does indeed change the html entities, but not into the required
> characters, rather into pairs of unusual characters; and the command
> line returns this:
>
> Wide character in print, <> line 1.
>
> It seems to me it is something to do with internal character encoding
> being messed up but I can't work out how to control it.

Before you can control it you need to know what it is.

> The text files
> processed have MacOS character encoding which is required in the
> finished file,

What is "MacOS character encoding"?

> but perhaps I need to convert to UTF8 before processing
> and back again after?

Perl will do this automatically if you tell it the encoding of the
input and output.

> perl -MHTML::Entities -p -e 'decode_entities($_)' <input.txt

I think you need something like

    perl -MHTML::Entities -p -e 'BEGIN { binmode STDIN, ":encoding(whatever)"; binmode STDOUT, ":encoding(whatever)" } decode_entities($_)' <input.txt >output.txt

where "whatever" is the name Perl uses for that which you are calling
"MacOS character encoding". Keep the single quotes around the program
on a Unix shell: inside double quotes the shell would expand $_ itself
before Perl ever sees it.

For a list of supported encodings:

    perldoc Encode::Supported
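
If the one-liner gets unwieldy, the same idea can be written as a short script. This is only a sketch under assumptions: 'MacRoman' is my guess at what "MacOS character encoding" means (check it against perldoc Encode::Supported), decode_stream is a helper name made up for this example, and the sample string is invented. It runs on in-memory filehandles so you can see the resulting bytes without touching real files.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical helper: copy $in to $out, decoding HTML entities on the way,
# with both handles treated as being in $encoding.
sub decode_stream {
    my ($in, $out, $encoding) = @_;
    binmode $in,  ":encoding($encoding)";   # bytes -> characters on read
    binmode $out, ":encoding($encoding)";   # characters -> bytes on write
    while (my $line = <$in>) {
        decode_entities($line);             # void context: decodes in place
        print {$out} $line;
    }
    return;
}

# Sample run on in-memory handles (invented data).
# ASSUMPTION: 'MacRoman' stands in for the poster's "MacOS character encoding".
my $input  = "caf&eacute;\n";
my $output = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";
decode_stream($in, $out, 'MacRoman');
close $out or die "cannot close output: $!";

# Print the output bytes in hex; the e-acute should end up as a single byte.
printf "%s\n", join ' ', map { sprintf '%02x', ord } split //, $output;
```

To process real files, call decode_stream(\*STDIN, \*STDOUT, 'MacRoman') instead of the sample run and redirect input and output as in the original command. With the output layer in place, Perl knows how to turn each decoded character back into bytes, which is what the "Wide character in print" warning was complaining about.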