Re: More character matching bits

Dan Sugalski Mon, 11 Jun 2001 13:32:11 -0700
At 04:46 PM 6/11/2001 -0400, Buddha Buck wrote:
>At 01:14 PM 06-11-2001 -0700, Russ Allbery wrote:
>>Dan Sugalski <[EMAIL PROTECTED]> writes:
>> > At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:
>> >> Dan Sugalski <[EMAIL PROTECTED]> writes:
>>
>> >>> Should perl's regexes and other character comparison bits have an
>> >>> option to consider different characters for the same thing as
>> >>> identical beasts?  I'm thinking in particular of the Katakana/Hiragana
>> >>> bits of Japanese, but other languages may have the same concepts.
>>
>> >> I think canonicalization gets you that if that's what you want.
>>
>> > I don't think canonicalization should do this. (I really hope not) This
>> > isn't really a canonicalization matter--words written with one character
>> > set aren't (AFAIK) the same as words written with the other, and which
>> > alphabet you use matters. (Which sort of argues against being able to do
>> > this, I suppose...)
>>
>>I guess I don't know what the definition of "the same thing" you're using
>>here is.
>
>I thought Dan was talking about something equivalent to the m//i 
>functionality.
>
>Would it, or should it, be possible to tell m// to treat Katakana 
>characters as the same as Hiragana characters, in much the same way as 
>m//i treats UPPERCASE the same as lowercase?  Canonicalization won't get 
>you that.

Yup, that's pretty much it in a nutshell. This may end up being a 
Japanese-only thing, in which case it may not be worth much effort, but it 
seems as useful in some cases as the case-insensitivity we do for other 
character sets.
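
To make the m//i analogy concrete, here is a minimal sketch of what a kana-insensitive match could look like. It is written in Python purely as an illustration, not as a proposal for how perl would implement it. It assumes the fixed 0x60 code point offset between the Unicode Hiragana and Katakana blocks, and the names fold_kana and kana_insensitive_search are hypothetical. Both the pattern and the subject are folded to Hiragana before matching, much as case-insensitive matching folds case.

```python
import re

# Katakana U+30A1..U+30F6 line up with Hiragana U+3041..U+3096,
# offset by 0x60. Characters outside that range (for example the
# long-vowel mark U+30FC) are deliberately left untouched.
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30F6
KANA_OFFSET = 0x60

# Translation table for str.translate: Katakana code point -> Hiragana code point.
_FOLD = {cp: cp - KANA_OFFSET for cp in range(KATAKANA_START, KATAKANA_END + 1)}


def fold_kana(text):
    """Return text with Katakana folded to the corresponding Hiragana."""
    return text.translate(_FOLD)


def kana_insensitive_search(pattern, text):
    """Hypothetical helper: search with both sides folded to Hiragana.

    Regex metacharacters are ASCII, so folding the pattern leaves them alone.
    """
    return re.search(fold_kana(pattern), fold_kana(text))
```

With this sketch, kana_insensitive_search("ひらがな", "ヒラガナ") finds a match where a plain re.search would not. Note that it folds unconditionally, so it ignores the point raised earlier in the thread that the choice of script can carry meaning.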

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk