At 07:21 AM 3/27/2001 -0800, Larry Wall wrote:
>Dan Sugalski writes:
>: Fair enough. I think there are some cases where a base/combining pair
>: of codepoints doesn't map to a single combined-character codepoint. Not
>: matching on a glyph boundary could make things really odd, but I'd hate
>: to have the checking code on by default, since that'd slow down the
>: common case where a string in NFC won't have those.
>
>Assume that in practice most of the normalization will be done by the
>input disciplines. Then we might have a pragma that says to try to
>enforce level 1, level 2, or level 3 if your data doesn't match your
>expectations. Then hopefully the expected semantics of the operators
>will usually (I almost said "normally" :-) match the form of the data
>coming in, and forced conversions will be rare.
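
(As a concrete illustration of the base/combining case above: a minimal
sketch using the CPAN module Unicode::Normalize -- an assumption here,
not something settled in this thread. "q" plus COMBINING DOT BELOW has no
precomposed codepoint, so NFC has to leave it as two codepoints, while
"e" plus COMBINING ACUTE composes to a single one.)

    use Unicode::Normalize qw(NFC);

    # No precomposed "q with dot below" exists, so NFC keeps 2 codepoints.
    my $q_dot = "q\x{0323}";
    printf "q + U+0323: %d codepoint(s) after NFC\n", length(NFC($q_dot));

    # "e" + COMBINING ACUTE does compose, to U+00E9 -- 1 codepoint.
    my $e_acute = "e\x{0301}";
    printf "e + U+0301: %d codepoint(s) after NFC\n", length(NFC($e_acute));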
The only problem with that is that it means we'll potentially be altering
the data as it comes in, which leads back to the problem of input and
output files not matching for simple filter programs. (Plus it means we
spend CPU cycles altering data that we might not actually need to.)
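
To make the filter problem concrete: if an input discipline normalizes to
NFC on read, a cat-style filter stops reproducing its input byte-for-byte
whenever that input arrived decomposed. A minimal sketch, again assuming
Unicode::Normalize:

    use Unicode::Normalize qw(NFC);

    my $raw    = "e\x{0301}";  # decomposed data, as it sat in the file
    my $cooked = NFC($raw);    # what a normalizing input layer hands us
    print "output won't match input\n" if $cooked ne $raw;  # true: now "\x{e9}"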
It might turn out that deferred conversions don't save anything, and if
that's so, then I can live with that. And we may feel comfortable declaring
that we preserve equivalency in Unicode data only, and that's OK too.
(Though *you* get to call that one... :)
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk