Re: More character matching bits

Buddha Buck Mon, 11 Jun 2001 13:20:34 -0700
At 01:14 PM 06-11-2001 -0700, Russ Allbery wrote:
>Dan Sugalski <[EMAIL PROTECTED]> writes:
> > At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:
> >> Dan Sugalski <[EMAIL PROTECTED]> writes:
>
> >>> Should perl's regexes and other character comparison bits have an
> >>> option to consider different characters for the same thing as
> >>> identical beasts?  I'm thinking in particular of the Katakana/Hiragana
> >>> bits of Japanese, but other languages may have the same concepts.
>
> >> I think canonicalization gets you that if that's what you want.
>
> > I don't think canonicalization should do this. (I really hope not) This
> > isn't really a canonicalization matter--words written with one character
> > set aren't (AFAIK) the same as words written with the other, and which
> > alphabet you use matters. (Which sort of argues against being able to do
> > this, I suppose...)
>
>I guess I don't know what the definition of "the same thing" you're using
>here is.

I thought Dan was talking about something equivalent to the m//i functionality.

Would it, or should it, be possible to tell m// to treat Katakana
characters as the same as Hiragana characters, in much the same way as m//i
treats UPPERCASE the same as lowercase?  Canonicalization won't get you that.
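
To make the distinction concrete, here is a minimal sketch in Python rather
than Perl, purely as an illustration. It relies on the fact that the Unicode
Katakana letters U+30A1..U+30F6 sit exactly 0x60 above their Hiragana
counterparts U+3041..U+3096. The names fold_kana and kana_insensitive_search
are hypothetical helpers made up for this example, not part of any existing
library, and the matching is literal-string only, not a full regex feature.

```python
import re
import unicodedata

# Katakana U+30A1..U+30F6 map onto Hiragana U+3041..U+3096 by subtracting 0x60.
# Characters outside that range (e.g. the long-vowel mark U+30FC) are left alone.
KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def fold_kana(text):
    """Hypothetical helper: replace Katakana letters with their Hiragana twins."""
    return text.translate(KATAKANA_TO_HIRAGANA)


def kana_insensitive_search(needle, haystack):
    """Literal search that ignores the Katakana/Hiragana distinction,
    analogous to how m//i ignores case."""
    return re.search(re.escape(fold_kana(needle)), fold_kana(haystack)) is not None


hiragana = "さくら"
katakana = "サクラ"

# Normalization leaves the two scripts distinct...
still_distinct = unicodedata.normalize("NFKC", katakana) != hiragana

# ...whereas an explicit fold makes them compare equal.
folded_equal = fold_kana(katakana) == hiragana
```

The point of the sketch is only that the folding is a separate step from
normalization: NFC/NFKC leave Katakana as Katakana, so something has to apply
the equivalence explicitly before the comparison.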

My feeling is that the hooks should be there, but the specific equivalence 
mappings should be in the library, not the core.
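
One way to picture that split, again as a rough Python sketch and not a
proposal for the actual Perl internals: the "core" knows only how to apply a
named fold before comparing, and the mappings themselves are registered from
outside. FOLDS, register_fold, and match are hypothetical names invented for
this illustration.

```python
import re

# "Core" side: a registry of named folds and a matcher that applies one.
# The core ships with no mappings of its own.
FOLDS = {}


def register_fold(name, func):
    """Hook: let a library install an equivalence mapping under a name."""
    FOLDS[name] = func


def match(needle, haystack, fold=None):
    """Literal search, optionally through a registered fold."""
    if fold is not None:
        apply_fold = FOLDS[fold]
        needle, haystack = apply_fold(needle), apply_fold(haystack)
    return re.search(re.escape(needle), haystack) is not None


# "Library" side: the specific equivalences live here, outside the core.
register_fold("case", str.casefold)

_KANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}  # Katakana -> Hiragana
register_fold("kana", lambda s: s.translate(_KANA))
```

With this shape, match("perl", "PERL", fold="case") and
match("さくら", "サクラ", fold="kana") go through the same hook, and adding
another language's equivalence does not touch the matcher.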