On Mon, Jan 14, 2002 at 09:32:02AM +0100, Holger Rauch wrote:
[...]
> My problem was that I wanted to substitute element content in XML files by
> entity references. These entity references are referring to the values of
> a Java .properties file that's used for i18n of our Java software. In
> order to properly parse the content of an element, I need to take
> different encodings into account (based on the value of the
> "encoding" attribute of the XML declaration). As far as I understand, the
> two Perl modules are able to set the encoding for individual strings. I'm
> wondering whether there is a module that allows me to set the encoding for
> a filehandle?

I'm still not sure I understand exactly what you need here, but I think that
you:

1. Have data in various different encodings.
2. Need to manipulate the data and write it out in the same (or a different?)
   encoding.

Alas, there seems to be no easy way to convert charsets within Perl. To quote
perlunicode(1):

> Input and Output Disciplines
>
> There is currently no easy way to mark data read from a file or other
> external source as being utf8.  This will be one of the major areas of
> focus in the near future.

But there are other programs that can help. There is a great set of tools
available at:

        http://www.whizkidtech.net/i18n/

They are not Perl, but perhaps you can call them from a shell script or (if
you are willing to get into the guts of perl) you could make a module via the
Perl C interface. Anyway, at this site is 'uhtrans', which converts non-ASCII
chars in UTF-8 input to an escaped form as in HTML ('&#codenum;'); the
corresponding program 'hutrans' reverses this. It is also possible to convert
between ISO-8859-* and UTF-8 using iconv, which is part of libc6.
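
As a rough sketch of the iconv route: the following assumes that an iconv
command-line tool is installed (glibc ships one) and that you have already
read the encoding name out of the XML declaration yourself. The function
names to_utf8 and from_utf8, and the file names in the comments, are made up
for illustration only.

```shell
#!/bin/sh
# Hypothetical helpers wrapping iconv; both read stdin and write stdout.
# -f names the source encoding, -t names the target encoding.

# Convert from the given encoding (e.g. ISO-8859-1) to UTF-8.
to_utf8() {
    iconv -f "$1" -t UTF-8
}

# Convert from UTF-8 back to the given encoding.
from_utf8() {
    iconv -f UTF-8 -t "$1"
}

# Example use (file names are placeholders):
#   to_utf8 ISO-8859-1 < messages.xml > messages.utf8.xml
#   ... do the entity substitution on the UTF-8 copy ...
#   from_utf8 ISO-8859-1 < messages.utf8.xml > messages.new.xml
```

With this, a Latin-1 e-acute (the single byte 0xE9) comes out of to_utf8 as
the two bytes 0xC3 0xA9, and from_utf8 turns it back. Note that iconv only
converts the bytes; if the output encoding differs from the input, the
"encoding" attribute in the XML declaration still has to be changed
separately.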

-- 
Henry House
The attached file is a digital signature. See <http://romana.hajhouse.org/pgp>
for information.  My OpenPGP key: <http://romana.hajhouse.org/hajhouse.asc>.
