Re: More character matching bits

Bryan C. Warnock Fri, 15 Jun 2001 03:32:05 -0700
On Thursday 14 June 2001 12:01 pm, Dan Sugalski wrote:
> Fancy character classes are probably enough to handle the various casing
> issues and their analogs. They're probably not enough to handle things
> like the Arabic tatweel, or proper word breaks in most Asian languages.
> Heck, unless I'm missing something, they're insufficient for something as
> simple as \d.
>
> I'm not advocating forcing dictionaries into the regex engine, nor even
> shipping them with the core. 

That's not to say that some Locale::* couldn't include one, or reference a
third-party one.

> As I see it, locales specify:
>
>    * Collating order
>    * Comparison/equality specification
>    * Unicode codepoint interpretation

What do you mean by that?

>    * Regex character classes
>    * Regex character identification
>    * Regex zero-width assertion rules
>    * 'casing' rules
>
> It'd be nice to specify them all separately and inherit the ones you don't
> need to change from some parent locale.

Or have these individual bits and pieces be addressable through the regexen, 
and have locales *defined* via that.

module Locale::Hawaiian;
use re 'class (\w => [aeiouâêîôûhklmnpw`])';
...
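
To make the idea a little more concrete, here is a rough sketch in Python rather than Perl, since the `use re 'class ...'` line above is invented syntax that nothing implements. It treats a locale as nothing more than a table of character-class definitions that get expanded into an ordinary pattern. The names `HAWAIIAN_CLASSES`, `expand` and `compile_for_locale` are made up for illustration, and the letter set is copied verbatim from the line above.

```python
import re

# Hypothetical locale table: maps a class escape to the characters it
# should match in this locale. Letters copied from the Locale::Hawaiian
# example; this is an illustration, not a real locale definition.
HAWAIIAN_CLASSES = {r"\w": "aeiouâêîôûhklmnpw`"}


def expand(pattern, classes):
    """Replace each locale-defined escape with an explicit bracketed class.

    This is a naive textual substitution: it does not notice an escaped
    backslash in front of the escape, or an escape that already sits
    inside a bracketed class.
    """
    for escape, chars in classes.items():
        pattern = pattern.replace(escape, "[" + re.escape(chars) + "]")
    return pattern


def compile_for_locale(pattern, classes):
    """Compile `pattern` with the locale's class definitions applied."""
    return re.compile(expand(pattern, classes))


# A "word" in this locale is one or more of the locale's letters.
hawaiian_word = compile_for_locale(r"\w+", HAWAIIAN_CLASSES)
```

With this table, `hawaiian_word.fullmatch("aloha")` succeeds while `hawaiian_word.fullmatch("zebra")` does not, because `z`, `b` and `r` are not in the locale's `\w`. A real design would have to hook into the regex parser instead of rewriting the pattern text.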

On a side note (and this *will* sound stupid, but there is a reason I'm
asking): why is there no logical opposite to '.', that is, a character
that never matches any character?  (Besides, of course, that it's
utterly useless from a classic regex perspective.)
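
For what it's worth, such a thing can be spelled out by hand in engines that support lookahead and class escapes. The sketch below uses Python's `re` module purely as an illustration. It shows two spellings: a zero-width assertion that always fails, and a character class that excludes every character. The names `NEVER_ASSERT`, `NEVER_CHAR` and `matches_anywhere` are made up for this example, and neither spelling is a dedicated metacharacter of the kind asked about above.

```python
import re

# Empty negative lookahead: asserts that the empty pattern does NOT match
# at this position. The empty pattern always matches, so this always fails.
NEVER_ASSERT = re.compile(r"(?!)")

# Negated class that excludes whitespace and non-whitespace alike, which
# together cover every character, so no single character can satisfy it.
NEVER_CHAR = re.compile(r"[^\s\S]")


def matches_anywhere(regex, text):
    """Return True if `regex` matches at any position in `text`."""
    return regex.search(text) is not None
```

Neither pattern matches anywhere in any input, including the empty string. The difference is that `NEVER_CHAR` is written as a failed attempt to consume one character, which is the closer analogue to an opposite of '.', while `NEVER_ASSERT` consumes nothing and simply fails.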

-- 
Bryan C. Warnock
[EMAIL PROTECTED]