On Wed, Nov 10, 2010 at 01:03:26PM -0500, Chase Albert wrote:
> Sorry if this is the wrong forum. I was wondering if there was a way to
> specify unicode
> categories <http://www.fileformat.info/info/unicode/category/index.htm> in
> a regular expression (and hence a grammar), or if there would be any
> consideration for adding support for that (requiring some kind of special
> syntax).
Unicode categories are matched using assertion syntax: "is" followed by
the category name, inside angle brackets. Thus <isLu> (uppercase letter),
<isNd> (decimal digit), <isZs> (space separator), etc.
This even works in Rakudo today:
    $ ./perl6
    > say 'abcdEFG' ~~ / <isLu> /
    E
They can also be combined in a single assertion, as in <+isLu+isLt>
(uppercase or titlecase letter).
The relevant section of the spec is in Synopsis 5; search for "Unicode
properties are always available with a prefix".
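
If you want to double-check which general category a character falls in
before writing the assertion, here is a minimal sketch outside Perl 6,
using Python's standard unicodedata module. The helper name
first_in_categories is made up for this illustration; it only mimics
"first character in any of these categories" and is not how a Perl 6
regex engine works.

```python
import unicodedata

def first_in_categories(text, categories):
    """Return the first character of `text` whose Unicode general
    category is in `categories`, or None if there is no such character.
    (Hypothetical helper, for illustration only.)"""
    for ch in text:
        if unicodedata.category(ch) in categories:
            return ch
    return None

# Single category: first uppercase letter (Lu).
print(first_in_categories('abcdEFG', {'Lu'}))   # E

# Single category: first decimal digit (Nd).
print(first_in_categories('abc 123', {'Nd'}))   # 1

# Union of two categories, uppercase or titlecase (Lu or Lt).
# U+01C5 is a titlecase letter (Lt).
print(first_in_categories('x\u01c5', {'Lu', 'Lt'}) == '\u01c5')
```

The two-letter codes (Lu, Lt, Nd, Zs) are the same general category
abbreviations that follow "is" in the assertions above.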
Hope this helps!
Pm