At 07:21 AM 3/27/2001 -0800, Larry Wall wrote:
>Dan Sugalski writes:
>: Fair enough. I think there are some cases where a base/combining pair
>: of codepoints doesn't map to a single combined-character codepoint. Not
>: matching on a glyph boundary could make things really odd, but I'd hate
>: to have the checking code on by default, since that'd slow down the
>: common case where a string in NFC won't have those.
>
>Assume that in practice most of the normalization will be done by the
>input disciplines. Then we might have a pragma that says to try to
>enforce level 1, level 2, or level 3 if your data doesn't match your
>expectations. Then hopefully the expected semantics of the operators
>will usually (I almost said "normally" :-) match the form of the data
>coming in, and forced conversions will be rare.
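
(As a concrete illustration of the base/combining case above: a minimal
sketch using the CPAN module Unicode::Normalize -- an assumption here,
not something settled in this thread. "q" plus COMBINING DOT BELOW has no
precomposed codepoint, so NFC has to leave it as two codepoints, while
"e" plus COMBINING ACUTE composes to a single one.)

    use Unicode::Normalize qw(NFC);

    # No precomposed "q with dot below" exists, so NFC keeps 2 codepoints.
    my $q_dot = "q\x{0323}";
    printf "q + U+0323: %d codepoint(s) after NFC\n", length(NFC($q_dot));

    # "e" + COMBINING ACUTE does compose, to U+00E9 -- 1 codepoint.
    my $e_acute = "e\x{0301}";
    printf "e + U+0301: %d codepoint(s) after NFC\n", length(NFC($e_acute));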
The only problem with that is that it means we'll potentially be altering
the data as it comes in, which leads back to the problem of input and
output files not matching for simple filter programs. (Plus it means we
spend CPU cycles altering data that we might not actually need to.)
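
To make the filter problem concrete: if an input discipline normalizes to
NFC on read, a cat-style filter stops reproducing its input byte-for-byte
whenever that input arrived decomposed. A minimal sketch, again assuming
Unicode::Normalize:

    use Unicode::Normalize qw(NFC);

    my $raw    = "e\x{0301}";  # decomposed data, as it sat in the file
    my $cooked = NFC($raw);    # what a normalizing input layer hands us
    print "output won't match input\n" if $cooked ne $raw;  # true: now "\x{e9}"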
It might turn out that deferred conversions don't save anything, and if
that's so, then I can live with that. And we may feel comfortable declaring
that we preserve equivalency in Unicode data only, and that's OK too.
(Though *you* get to call that one... :)
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk