Re: More character matching bits

Russ Allbery Mon, 11 Jun 2001 12:52:07 -0700
Dan Sugalski <[EMAIL PROTECTED]> writes:
> At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:
>> Dan Sugalski <[EMAIL PROTECTED]> writes:

>>> Should perl's regexes and other character comparison bits have an
>>> option to consider different characters for the same thing as
>>> identical beasts?  I'm thinking in particular of the Katakana/Hiragana
>>> bits of Japanese, but other languages may have the same concepts.

>> I think canonicalization gets you that if that's what you want.

> I don't think canonicalization should do this. (I really hope not) This
> isn't really a canonicalization matter--words written with one character
> set aren't (AFAIK) the same as words written with the other, and which
> alphabet you use matters. (Which sort of argues against being able to do
> this, I suppose...)

I guess I don't know what definition of "the same thing" you're using
here.

>> I definitely think that Perl should be able to do all of NFD, NFC,
>> NFKD, and NFKC canonicalization.

> C & D at least. KC & KD are doable as well, though I'm not sure when
> you'd want them.

USEFOR is looking at requiring NFKC canonicalization for newsgroup names,
to get rid of some of the odder stuff, so Usenet code will potentially
need it.  I believe the DNS folks are also looking at it for IDN.
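
For what the four forms actually do, here is a minimal sketch using
Python's standard unicodedata module, chosen only for illustration (the
helper name normalize_all is hypothetical). It also suggests that NFKC
folds halfwidth katakana into ordinary katakana but leaves hiragana
alone, so it would not by itself make the two kana sets compare equal.

```python
import unicodedata

def normalize_all(text):
    """Return text in each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFD", "NFC", "NFKD", "NFKC")}

# Precomposed e-acute (U+00E9): NFD splits it into "e" plus a combining
# acute accent (U+0301); NFC keeps or rebuilds the single code point.
accented = normalize_all("\u00e9")

# The "fi" ligature (U+FB01) is only a compatibility equivalent of "fi",
# so the canonical forms (NFC/NFD) leave it alone and the K forms expand it.
ligature = normalize_all("\ufb01")

# Halfwidth katakana KA (U+FF76) maps to katakana KA (U+30AB) under NFKC.
halfwidth_ka = unicodedata.normalize("NFKC", "\uff76")

# Hiragana KA (U+304B) is a distinct character and is left unchanged.
hiragana_ka = unicodedata.normalize("NFKC", "\u304b")
```

Comparing two strings after running both through the same form is the
usual way to get normalization-insensitive matching.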

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>