On Fri, Dec 15, 2000 at 03:10:16PM -0500, Dan Sugalski wrote:
> At 11:18 AM 12/15/00 -0600, Jarkko Hietaniemi wrote:
> >On Fri, Dec 15, 2000 at 12:13:01PM +0000, Simon Cozens wrote:
> > > IMHO, the first thing we need to design and code is the API and runtime
> > > library, since everything else builds on top of that, and we can design
> > > other stuff in parallel with coding it. (A lot of it will be grunt work.)
> > >
> > > So, before we start even thinking about what we need, it's time to look
> > > at the vexed question of string representation. How do we do Unicode
> > > without getting into the horrendous non-Latin1 cockups we're seeing on
> > > p5p right now? Larry
> >
> >As painful as it may sound (codingwise), I would urge us to spare some
> >thought for using (internally) UTF-32 for those encodings for which
> >UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts).
> 
> If we can manage it, I'd prefer to not have a preferred internal 

I didn't mean 'preferred'; I meant that if UTF-8 would be longer for
some encodings, then for both space *and* speed, using straight honest
UTF-32 would make much more sense.
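
The space side of this trade-off is easy to measure. The following is a
minimal sketch in Python, not anything from the perl sources; the
sample strings and the helper name encoded_sizes are assumptions made
up for illustration. It reports the byte length of the same text in
UTF-8, UTF-16 and UTF-32, without a byte order mark.

```python
# Hypothetical sample strings, chosen only for illustration.
SAMPLES = {
    "ascii": "hello",
    "latin1": "d\u00e9j\u00e0 vu",        # "deja vu" with two accented letters
    "japanese": "\u65e5\u672c\u8a9e",     # three BMP ideographs
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes, utf32_bytes) for text, without BOMs."""
    return (
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
        len(text.encode("utf-32-le")),
    )


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        utf8, utf16, utf32 = encoded_sizes(text)
        print(f"{name:10} utf-8={utf8:3} utf-16={utf16:3} utf-32={utf32:3}")
```

On these samples the Japanese text takes 9 bytes in UTF-8, 6 in UTF-16
and 12 in UTF-32, so for this kind of text UTF-8 is longer than UTF-16
but still shorter than UTF-32. What UTF-32 offers in exchange for the
extra space is a fixed width per code point, which is where the speed
argument comes from.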

> representation and Do The Right Thing in a general way. (Though I know that 
> we may have to go more specific for speed)
> 
> I can see us having good reason to handle at least:
> 
> Binary
> UTF-8 (and yes, I know latin-1, or ASCII, or something of the sort is a 
> proper subset of UTF-8)
> EBCDIC
> UTF-32
> Shift-JIS
> 
> as text. How to generalize the regex engine (which strikes me as the most 
> likely piece of perl to care deeply about representation) to handle all the 
> types is an interesting question. I'm currently trying to figure out a way 
> to generalize things, and it's mostly there, but I'm really worried about 
> speed issues because of it.
> 
> Worst case, handling bytes and UTF-32 should get us by (variable-length
> encodings are a *pain*...), though we'd be well-served to handle more natively.

EMPHATIC YES (after glaring for weeks at the regex/utf8 code).
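
To make the pain concrete, here is a minimal sketch in Python (not the
regex engine's code; the function names utf8_char_offset and
utf32_char_offset are hypothetical) of finding the byte offset of the
n-th character. It assumes well-formed input. With UTF-8 you have to
scan from the start; with UTF-32 it is a multiplication.

```python
def utf8_char_offset(buf: bytes, index: int) -> int:
    """Byte offset of the index-th code point in well-formed UTF-8.

    Linear scan: every byte up to the target has to be inspected.
    """
    seen = 0
    for pos, byte in enumerate(buf):
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return pos
            seen += 1
    raise IndexError("code point index out of range")


def utf32_char_offset(buf: bytes, index: int) -> int:
    """Byte offset of the index-th code point in UTF-32 (constant time)."""
    offset = index * 4
    if index < 0 or offset >= len(buf):
        raise IndexError("code point index out of range")
    return offset


if __name__ == "__main__":
    text = "a\u00e9\u65e5b"  # code points of 1, 2, 3 and 1 bytes in UTF-8
    utf8 = text.encode("utf-8")
    utf32 = text.encode("utf-32-le")
    for i in range(len(text)):
        print(i, utf8_char_offset(utf8, i), utf32_char_offset(utf32, i))
```

Anything that jumps around in a string by character position (a regex
engine backtracking, say) pays the scan cost on every jump in the
variable-length case, or has to cache offsets to avoid it.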

> 
>                                       Dan
> 
> --------------------------------------"it's like this"-------------------
> Dan Sugalski                          even samurai
> [EMAIL PROTECTED]                         have teddy bears and even
>                                       teddy bears get drunk

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
