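You can get the hex values from http://ascii-table.com/img/table-apple.gif. You can match them in a regex with \xdd, where dd is the two-digit hex value (0xdd). For example, s/[\x80-\xFF]/?/g replaces every byte in the range 0x80-0xFF with a question mark (without the /g modifier only the first match on each line is replaced).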
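Here is a minimal sketch of that substitution as a small script, in case it helps. The sub names replace_high_bytes and show_high_bytes are made up for illustration, and it assumes the data is handled as raw bytes (no decoding layer on the filehandle). If the file is actually UTF-8, each accented character is stored as two or more bytes, so you would get several replacements per character.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every byte in the range 0x80-0xFF with '?'.
sub replace_high_bytes {
    my ($line) = @_;
    $line =~ s/[\x80-\xFF]/?/g;
    return $line;
}

# Hypothetical helper: instead of discarding the byte, make it visible
# as a literal \xNN escape so you can see which values occur in the file.
sub show_high_bytes {
    my ($line) = @_;
    $line =~ s/([\x80-\xFF])/sprintf('\\x%02X', ord $1)/ge;
    return $line;
}

# Small demonstration on a fixed string containing bytes 0xE9 and 0xC4.
my $sample = "caf\xE9 \xC4";
print replace_high_bytes($sample), "\n";
print show_high_bytes($sample), "\n";
```

For a 1.3GB file you would call one of these subs on each line inside a while (my $line = <STDIN>) loop, with binmode set on both filehandles, rather than reading the whole file into memory.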
On 3/20/07, Beginner <[EMAIL PROTECTED]> wrote:
Hi,

I have a large (1.3GB) XML file that I was trying to validate. It turns out that the file has a lot of exotic characters in it, such as: é è Ä È ...etc.

The area of encoding and internationalisation is one I have no experience of at all, and from what I've heard it is rather complex and difficult. Being a lazy kind of guy, I thought I would cat the file and let Perl make the substitutions wherever it found any of these characters.

My problem is that I am not sure how to write a regex for these characters except to look for the hex value. Neither do I know of a way to escape/encode them correctly. I have seen the utf8 pragma, but I am not sure my problem is what this pragma was designed for.

Does anyone have any suggestions for a module or method that might take some of the pain out of detecting and escaping such characters?

TIA,
Dp.