OK, I'm reading through the various Unicode-related perldocs and have a
rather simple question.
Under Perl 5.8, does /(\w+)/ match UTF-8 characters without
enabling any special pragma? I'm having a hard time finding anything
that states that clearly.
I'm trying to parse email content, and it seems reasonable to expect
characters in just about any conceivable encoding, from
ASCII, Latin, UTF...
For simplicity I'm leaning toward just converting everything
"up" to UTF-8 and doing all my string/regex manipulations on UTF-8.
So I'm trying to see if I can just use /(\w+)/ without worrying about
all this character encoding?
Or do I have to first Encode everything into UTF8?
And if so, before I Encode it, do I have to figure out what it is
first and then convert it from whatever encoding it is to UTF8?
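To make the question concrete, here is a minimal sketch of what I have in
mind. It assumes the charset of each message part is already known (say,
from the Content-Type header) and uses decode() from the Encode module to
turn the raw bytes into a Perl character string before matching. The
helper name words_from_bytes and the sample bytes are made up for
illustration; I'm not sure this is the right approach.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: $bytes is the raw octets of a message part,
# $charset is whatever encoding the part claims to use (assumed known).
sub words_from_bytes {
    my ($bytes, $charset) = @_;

    # Convert octets in the named encoding into a Perl character string.
    my $text = decode($charset, $bytes);

    # In list context with /g, this returns every captured run of
    # word characters.
    return $text =~ /(\w+)/g;
}

# Sample input: the UTF-8 octets for two words with accented letters.
my @words = words_from_bytes("caf\xC3\xA9 na\xC3\xAFve", 'UTF-8');
print scalar(@words), " words found\n";
```

The idea is that every input, whatever its original encoding, goes through
the same decode step first, so the regex only ever sees character strings.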
For simplicity, it isn't necessarily a requirement that I parse
the content into perfectly accurate words, but the results have to be
completely repeatable and preferably fast.
help?