OK, I'm reading through the various Unicode-related perldocs and have a rather simple question.

Under Perl 5.8, does /(\w+)/ match UTF-8 characters without any special pragma? I'm having a hard time finding anything that states this clearly.

I'm trying to parse email content, and it seems reasonable to expect characters in just about any conceivable encoding, from ASCII, Latin, UTF...

For simplicity, I'm leaning toward just converting everything "up" to UTF-8 and doing all my string/regex manipulation on UTF-8.

So I'm trying to see whether I can just use /(\w+)/ without worrying about all this character encoding.
Or do I have to first Encode everything into UTF-8?
And if so, before I Encode it, do I have to figure out what encoding it is in and then convert it from that encoding to UTF-8?
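
To make the question concrete, here is a minimal sketch of what I have in mind. It assumes the charset name is already known (say, from the message's Content-Type header) and uses decode() from the Encode module to turn the raw bytes into a Perl character string before matching. The helper name words_from_bytes and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes using a known charset name,
# then collect every run of word characters.
# Assumption: $charset is supplied by the caller (e.g. taken from the
# message headers); nothing here guesses the encoding.
sub words_from_bytes {
    my ($bytes, $charset) = @_;
    my $text  = decode($charset, $bytes);   # bytes -> character string
    my @words = $text =~ /(\w+)/g;          # list context: all captures
    return @words;
}

# The same made-up text, once as UTF-8 bytes and once as Latin-1 bytes.
my @utf8   = words_from_bytes("caf\xC3\xA9 au lait", 'UTF-8');
my @latin1 = words_from_bytes("caf\xE9 au lait",     'ISO-8859-1');
```

If this is the right approach, both calls should give me the same three words no matter which encoding the bytes arrived in, which is the repeatability I'm after.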

It isn't necessarily a requirement that I parse the content into perfectly accurate words, but the results have to be completely repeatable and preferably fast.

help?
