On 6/21/07, Tom Allison <[EMAIL PROTECTED]> wrote:
I guess my question is: for CJK languages, should I expect a regex like \w+ to pick up entire runs of text rather than the discrete words it finds in Latin-based languages?
Once you've enabled what the perlunicode manpage calls "Character Semantics", it says:

    Character classes in regular expressions match characters instead of bytes and match against the character properties specified in the Unicode properties database. "\w" can be used to match a Japanese ideograph, for instance.

    http://perldoc.perl.org/perlunicode.html

Does that manpage get you any closer to a solution? Hope this helps!

--Tom Phoenix
Stonehenge Perl Training
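
To make the manpage passage concrete, here is a minimal sketch of what character semantics means for \w+. It is an illustration only, not part of the original reply: the sample string and the variable names are made up, and it assumes the input arrives as UTF-8 bytes that are decoded with the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode(STDOUT, ':encoding(UTF-8)');   # print characters as UTF-8

# Hypothetical sample input: raw UTF-8 bytes for three ideographs
# (U+65E5 U+672C U+8A9E), as they might be read from a file or socket.
my $bytes = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# Decoding turns the bytes into a character string, so the regex
# engine sees three characters instead of nine bytes.
my $cjk = decode('UTF-8', $bytes);
printf "bytes: %d, characters: %d\n", length($bytes), length($cjk);

# Hypothetical mixed-language sentence built around the decoded text.
my $text = "I like $cjk and Perl";

# \w+ matches a maximal run of word characters. Ideographs count as
# word characters, so the unbroken CJK run comes back as one match.
my @words = $text =~ /(\w+)/g;
print join('|', @words), "\n";

# A Unicode script property matches one ideograph at a time.
my @han = $text =~ /(\p{Han})/g;
print scalar(@han), " Han characters\n";
```

In this sketch the English words are separated by spaces and come back one at a time, while the three ideographs come back as a single match, because nothing inside that run is a non-word character. So \w+ does pick up whole unbroken runs of CJK text, not individual words.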