You can get the hex values from http://ascii-table.com/img/table-apple.gif
You can match them in a regex with \xdd, where dd is the character's
two-digit hex value.
e.g.
s/[\x80-\xFF]/?/g
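
To spell that substitution out a little more, here is a minimal sketch. It assumes the data is handled as raw bytes with no decoding layer, and the helper name scrub_high_bytes is just something made up for illustration. The /g flag makes the substitution hit every match on a line rather than only the first one.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every byte in the range 0x80-0xFF with '?'.
# Assumes $line holds raw bytes (nothing has been decoded to characters).
sub scrub_high_bytes {
    my ($line) = @_;
    $line =~ s/[\x80-\xFF]/?/g;   # /g: replace all matches, not just the first
    return $line;
}

# Sample input: Latin-1 bytes for e-acute (0xE9) and A-umlaut (0xC4).
my $sample = "caf\xE9 \xC4rger\n";
print scrub_high_bytes($sample);   # prints "caf? ?rger"
```

For a file as big as yours, running the same substitution line by line should keep memory use flat, for example: perl -pe 's/[\x80-\xFF]/?/g' in.xml > out.xml (file names are placeholders). Note that if the file is actually UTF-8, each accented character is two or more bytes, so it will turn into two or more question marks.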

On 3/20/07, Beginner <[EMAIL PROTECTED]> wrote:

Hi,

I have a large, 1.3GB xml file that I was trying to validate. It
turns out that the file has a lot of exotic characters in it such as:
é
è
Ä
È
...etc

The area of encoding and internationalisation is one I have no
experience of at all and from what I've heard it is rather complex
and difficult.

Being a lazy kinda guy, I thought I would cat the file and let perl
make the substitutions where it found any of these characters. My
problem is I am not sure how to write a regex for these characters
except to look for the hex value. Neither do I know of a way to
escape/encode them correctly.

I have seen the pragma utf8 but I am not sure my problem is what this
pragma was designed for. Does anyone have any suggestions for a
module or method that might take some of the pain out of detecting
and escaping such characters?

TIA,
Dp.
