Re: still working with utf8

Tom Phoenix Thu, 21 Jun 2007 19:56:56 -0700

On 6/21/07, Tom Allison <[EMAIL PROTECTED]> wrote:

> I guess my question is, for CJK languages, should I expect the notion
> of using a regex like \w+ to pick up entire strings of text instead
> of discrete words like latin based languages?


Once you've enabled what the perlunicode manpage calls "Character
Semantics", it says:

   Character classes in regular expressions match characters instead
   of bytes and match against the character properties specified in
   the Unicode properties database.  "\w" can be used to match a
   Japanese ideograph, for instance.

   http://perldoc.perl.org/perlunicode.html
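
To make that concrete, here is a minimal sketch. The sub name word_runs
and the sample string are made up for illustration, and the Japanese
characters are written as \x{...} escapes so the result doesn't depend
on how the file is saved. Japanese text normally has no spaces between
words, so I'd expect \w+ to grab a whole unbroken run of ideographs and
kana as one match, while the space-separated Latin words come back one
at a time:

```perl
use strict;
use warnings;

# Encode characters as UTF-8 on output (avoids "Wide character" warnings).
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: return every maximal run of \w characters.
sub word_runs {
    my ($text) = @_;
    return $text =~ /(\w+)/g;
}

# Sample text: four Japanese characters (three ideographs plus one
# hiragana, U+65E5 U+672C U+8A9E U+306E) followed by two Latin words.
# Code points above 255 make this a character string, so the regex
# uses character semantics.
my $text = "\x{65E5}\x{672C}\x{8A9E}\x{306E} and words";

my @runs = word_runs($text);

print scalar(@runs), " matches\n";    # 3 matches
printf "%s (%d chars)\n", $_, length($_) for @runs;
```

The first match is the entire four-character Japanese run, not four
separate words. If you need real word boundaries inside a run like that,
\w+ by itself won't find them; that takes a dictionary or a segmentation
tool, which is beyond what the regex engine does for you.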

Does that manpage get you any closer to a solution? Hope this helps!

--Tom Phoenix
Stonehenge Perl Training

