At 01:10 PM 6/14/2001 +0200, Bart Lateur wrote:
>On Wed, 13 Jun 2001 13:39:16 -0400, Dan Sugalski wrote:
>
> >> > Something that should be part of the core? I'll leave
> >> >that for you to decide.
> >>
> >>Most definitely NOT.
> >
> >Most definitely sort of.
> >
> >>There is no reason to put functionality for free matching of Japanese
> >>characters into the basic perl executable.
> >
> >No, you're right. But the core must take into account the capabilities that
> >need to be available for comparison and matching of the languages perl's
> >going to make at least some effort to support.
>
>If you're saying that the perl core should include hooks into the regex
>engine for custom character classes, I agree. But nothing more.
Unfortunately we need more to do things properly.
Fancy character classes are probably enough to handle the various casing
issues and their analogs. They're probably not enough to handle things like
the Arabic tatweel, or proper word breaks in most Asian languages. Heck,
unless I'm missing something, they're insufficient for something as simple
as \d.
I'm not advocating forcing dictionaries into the regex engine, nor even
shipping them with the core. We do need to provide the hooks, though, and
I'm not convinced we've figured out quite what the hooks are yet.
There's also the issue of sorting and comparisons to deal with, but those
for those we can get away with a simple hook or two.
>Currently, Perl5 provides a hook for "use locale;", but I wish there was
>something more general than this, more customizable. For example, I
>sometimes have user-defined character encodings that don't follow any
>standard. I wish there was a simple, perl-only, way to cope with them.
Yeah, we need to have a wedge into the locale system for perl-level code.
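For comparison, this is roughly what such a wedge looks like in Python, whose codec registry accepts search functions written in ordinary user code. The sketch below is an assumption-laden illustration, not a Perl design: the encoding name "mylegacy" and its byte mapping (Latin-1 except that 0x80 means the euro sign) are made up, while `codecs.register`, `codecs.charmap_build`, `codecs.charmap_encode`/`charmap_decode`, and `codecs.CodecInfo` are the standard-library pieces Python's own 8-bit codecs are built from.

```python
import codecs

# Hypothetical in-house 8-bit encoding: Latin-1 everywhere except byte 0x80,
# which this made-up scheme uses for the euro sign.
_chars = [chr(b) for b in range(256)]
_chars[0x80] = "\u20ac"
DECODING_TABLE = "".join(_chars)
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def _encode(text, errors="strict"):
    # Returns (bytes, number_of_chars_consumed), as the codec protocol expects.
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def _decode(data, errors="strict"):
    # Returns (str, number_of_bytes_consumed).
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def _search(name):
    # The registry calls this with a normalized, lower-case encoding name.
    if name == "mylegacy":
        return codecs.CodecInfo(_encode, _decode, name="mylegacy")
    return None


codecs.register(_search)
```

Once registered, `b"\x805".decode("mylegacy")` yields the euro sign followed by "5", with no C code involved.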
>Also, for example, I would like to be able to match "á" with /[a]/, but
>without changing the sort order. "locale" is a bit too much "all or
>nothing" for me.
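One way to get accent-blind matching without touching collation is to fold the subject string for matching only. The Python sketch below does that with Unicode canonical decomposition (NFD) followed by dropping combining marks; the helper names `fold_marks` and `accent_blind_search` are hypothetical, and this is only one folding strategy among several.

```python
import re
import unicodedata


def fold_marks(text: str) -> str:
    """Decompose (NFD) and drop combining marks, so "\u00e1" becomes "a"."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_blind_search(pattern: str, text: str):
    """Search `pattern` against an accent-folded copy of `text`.

    Only matching is affected; sort order is left alone. Match offsets
    refer to the folded string, not to the original text.
    """
    return re.search(pattern, fold_marks(text))


words = ["b", "\u00e1", "a"]
print(bool(re.search(r"[a]", "\u00e1")))          # plain matching -> False
print(bool(accent_blind_search(r"[a]", "\u00e1")))  # folded matching -> True
print(sorted(words))                                # code-point order, unchanged
```

The offset caveat matters in practice: a real engine hook would need to map positions in the folded text back to the original string.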
As I see it, locales specify:
* Collating order
* Comparison/equality specification
* Unicode codepoint interpretation
* Regex character classes
* Regex character identification
* Regex zero-width assertion rules
* 'casing' rules
It'd be nice to specify them all separately and inherit the ones you don't
need to change from some parent locale.
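The "separate pieces plus inheritance" idea can be sketched as a chain of facet tables, where each locale overrides only the facets it cares about and falls back to its parent for the rest. This Python sketch is illustrative only: the `Locale` class and the facet names ("collate", "equal", "casefold") are hypothetical and cover just a few of the items listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Locale:
    """A locale as a set of independently replaceable facets.

    Facets this locale does not define are looked up in its parent chain.
    Facet names are made up for illustration.
    """
    name: str
    parent: Optional["Locale"] = None
    facets: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def facet(self, key: str) -> Callable[..., Any]:
        # Walk up the parent chain until some locale defines the facet.
        loc: Optional["Locale"] = self
        while loc is not None:
            if key in loc.facets:
                return loc.facets[key]
            loc = loc.parent
        raise KeyError(f"facet {key!r} not found in {self.name!r} or its parents")

    def sort(self, words: List[str]) -> List[str]:
        # Collating order is one facet: a sort-key function.
        return sorted(words, key=self.facet("collate"))

    def equal(self, a: str, b: str) -> bool:
        # Equality is a separate facet from collation.
        return self.facet("equal")(a, b)


# Hypothetical root: code-point order, exact equality, Unicode casefolding.
ROOT = Locale("root", facets={
    "collate": lambda s: s,
    "equal": lambda a, b: a == b,
    "casefold": str.casefold,
})

# A child that changes only the collating order; the rest is inherited.
CASELESS_SORT = Locale("caseless-sort", parent=ROOT, facets={
    "collate": str.casefold,
})
```

With this, `ROOT.sort(["b", "B", "a"])` gives `["B", "a", "b"]` while `CASELESS_SORT.sort(["b", "B", "a"])` gives `["a", "b", "B"]`, yet `CASELESS_SORT.equal("a", "A")` is still `False` because equality comes from the parent.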
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
[EMAIL PROTECTED] have teddy bears and even
teddy bears get drunk