Re: ICU incorporation and string changes heads-up

2004-04-11 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote: > On Apr 9, 2004, at 7:19 AM, Leopold Toetsch wrote: >> - What happened to external constant strings? > They should still work (or could). But the only cases in which we can > optimize, and actually use "in-place" a buffer handed to string_make, > is for a

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Larry Wall
On Sat, Apr 10, 2004 at 01:19:39PM +0300, Jarkko Hietaniemi wrote: : I'm no Larry, either :-) but I think Larry is *not* saying that the : "localeness" or "languageness" should hang off each string (or *shudder* : off each substring). What I've seen is that Larry wants the "level" to : be a lexica

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jeff Clites
On Apr 10, 2004, at 12:21 PM, Jeff Clites wrote: On Apr 10, 2004, at 3:54 AM, Leopold Toetsch wrote: Ok. I want to uppercase the strings - no sorting (yet). I've an array of Vienna's kebab booths. Half of these have Turkish names (at least); the rest is a mixture of other languages. I'd like to

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jeff Clites
On Apr 10, 2004, at 3:54 AM, Leopold Toetsch wrote: Jeff Clites <[EMAIL PROTECTED]> wrote: On Apr 10, 2004, at 1:12 AM, Leopold Toetsch wrote: use German; print uc("i"); use Turkish; print uc("i"); Perfect example. The string "i" is the same in each case. What you've done is implicit

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi
> Ok. Now when the identical string "i" (but originating from different > locale environments) goes through a sequence of string operations later, > how do you track the locale down to the final C where it's needed? > > e.g. > > use German; > my $gi = "i"; > use Turkish; > my $ti =

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote: > On Apr 10, 2004, at 1:12 AM, Leopold Toetsch wrote: >>use German; >>print uc("i"); >>use Turkish; >>print uc("i"); > Perfect example. The string "i" is the same in each case. What you've > done is implicitly supplied a locale argument to th
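The point quoted above, that the string "i" is the same in both cases and the locale is an implicit argument to the operation, can be sketched as follows. This is a minimal illustration in Python, not Parrot or ICU code: the function name uc mirrors the Perl-style example, the locale tags "de" and "tr" are assumptions, and only the Turkish dotted/dotless i rule is modeled.

```python
# Hypothetical helper: the locale is an argument to the operation,
# not a property of the string. Only the Turkish i rule is modeled.
TURKISH_UPPER = {"i": "\u0130", "\u0131": "I"}  # i -> İ, ı -> I


def uc(text: str, locale: str = "de") -> str:
    """Uppercase text under the casing rules of locale (sketch only)."""
    if locale == "tr":
        # Apply the Turkish-specific mappings before the default ones.
        text = "".join(TURKISH_UPPER.get(ch, ch) for ch in text)
    return text.upper()


same_string = "i"
german = uc(same_string, "de")   # plain "I"
turkish = uc(same_string, "tr")  # dotted capital I, U+0130
```

The two calls receive an identical string and differ only in the explicit locale argument, which is what "use German" and "use Turkish" would supply implicitly.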

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi
> Another example could be that at level 2 (and 3), maybe "eq" > automatically normalizes before doing string comparisons, and at levels > 1 and 0 it doesn't. Exactly. People wanted implicit "eq" normalization for Perl 5 Unicode. The problem always is "where does it end?", because the logica
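To make the "eq automatically normalizes" idea concrete, here is a small Python sketch using the standard unicodedata module. The function names eq_raw and eq_normalized are made up for illustration, and mapping them to particular levels is an assumption, not something the thread specifies.

```python
import unicodedata


def eq_raw(a: str, b: str) -> bool:
    """Compare code point by code point, with no normalization."""
    return a == b


def eq_normalized(a: str, b: str, form: str = "NFC") -> bool:
    """Normalize both operands to the same form, then compare."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00e9"     # e with acute as a single code point
decomposed = "e\u0301"  # e followed by a combining acute accent

raw_result = eq_raw(composed, decomposed)
normalized_result = eq_normalized(composed, decomposed)

# "Where does it end?": a compatibility form also equates the fi ligature
# with the two letters "fi", which a canonical form does not.
ligature_nfc = eq_normalized("\ufb01", "fi", "NFC")
ligature_nfkc = eq_normalized("\ufb01", "fi", "NFKC")
```

The ligature case shows why the choice of normalization form is itself a policy decision that an implicit "eq" would have to make.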

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jeff Clites
On Apr 10, 2004, at 3:19 AM, Jarkko Hietaniemi wrote: We'll basically need 4 levels of string support: ,--[ Larry Wall ] | level 0 byte == character, "use bytes" basically | level 1 codepoint == character, what we seem to be aiming for, va
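As a rough illustration of how the first three levels count "characters" differently, the Python sketch below measures one string three ways. It is only an approximation: UTF-8 is assumed for the byte view, and graphemes are estimated by counting non-combining code points rather than by full grapheme cluster segmentation. The function name level_lengths is hypothetical.

```python
import unicodedata


def level_lengths(text: str) -> tuple[int, int, int]:
    """Return (bytes, code points, approximate graphemes) for text."""
    byte_len = len(text.encode("utf-8"))  # level 0 view, assuming UTF-8
    codepoint_len = len(text)             # level 1 view
    # Level 2 view, approximated: combining marks attach to the
    # preceding base character and are not counted separately.
    grapheme_len = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return byte_len, codepoint_len, grapheme_len


# "e" followed by a combining acute accent: one visible character.
lengths = level_lengths("e\u0301")
```

For this input the three views disagree, which is why the level in effect changes the result of operations such as length and indexing.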

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi
> So the first question is: Where is this higher level? Isn't Parrot > responsible for providing that? The old string type did have the > relevant information at least. > > I think we can't say it's a Perl6 lib problem. HLL interoperability Right. It's a Parrot lib problem. But it's not a ".c/.

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi
>>We'll basically need 4 levels of string support: >> >>,--[ Larry Wall >>] >>| level 0 byte == character, "use bytes" basically >>| level 1 codepoint == character, what we seem to be aiming for, >>vaguely >>| level 2 grapheme

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jeff Clites
On Apr 10, 2004, at 2:40 AM, Leopold Toetsch wrote: Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote: Not used *yet* - what about: use German; print uc("i"); use Turkish; print uc("i"); That is implementable (and already implemented by ICU) but by something higher level than a "string".

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jeff Clites
On Apr 10, 2004, at 1:12 AM, Leopold Toetsch wrote: Jeff Clites <[EMAIL PROTECTED]> wrote: On Apr 9, 2004, at 7:19 AM, Leopold Toetsch wrote: So internally, strings don't have an associated encoding (or chartype or anything) How do you handle EBCDIC? I'll use pseudo-C to illustrate: string = str

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Leopold Toetsch
Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote: >> How do you handle EBCDIC? UTF8 for Ponie? > All character sets (like EBCDIC) or encodings (like UTF-8) are > "normalized" to the Unicode (character set) (and our own *internal* > encoding, the 8/16/32 one.) Ok. >> Not used *yet* - what about: >> >

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Leopold Toetsch
Dan Sugalski wrote: Done. It'll guaranteed kill half the tinderboxen--I think my first thing to do on monday is to patch up the build procedure to use the system ICU if it's available. Thanks for checkin. And yes. What about building without ICU? I can imagine that some embedded usage of Parrot

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi
> Jeff Clites <[EMAIL PROTECTED]> wrote: > >>On Apr 9, 2004, at 7:19 AM, Leopold Toetsch wrote: I'm replying for Jeff since I've been burned by the same questions over and over again :-) > >>So internally, strings don't have an associated encoding (or chartype >>or anything) > > > How do you

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote: > On Apr 9, 2004, at 7:19 AM, Leopold Toetsch wrote: > So internally, strings don't have an associated encoding (or chartype > or anything) How do you handle EBCDIC? UTF8 for Ponie? >> - Where is string->language? > I removed it from the string struct beca

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi
FWIW, the change sounds all good to me. The O(1) is the most important property of a string, the 8/16/32 gives us that and space savings too, going all Unicode at the heart of it is the only sensible thing to do (anything else leads into combinatorial explosion and instant insanity), "encodings" a
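One way to read the "8/16/32" remark is a fixed-width representation whose unit size is chosen per string, so that indexing stays O(1) while narrow strings stay small. The Python sketch below illustrates that reading only; it is not taken from the patch, and the function name pack_fixed_width is hypothetical.

```python
from array import array


def pack_fixed_width(text: str) -> tuple[int, array]:
    """Store code points in the narrowest of 8, 16 or 32 bits per unit."""
    top = max(map(ord, text), default=0)
    if top <= 0xFF:
        bits, typecode = 8, "B"
    elif top <= 0xFFFF:
        bits, typecode = 16, "H"
    else:
        # "L" is guaranteed to hold at least 32 bits.
        bits, typecode = 32, "L"
    return bits, array(typecode, map(ord, text))


bits_ascii, units_ascii = pack_fixed_width("abc")
bits_euro, units_euro = pack_fixed_width("\u20ac")
bits_emoji, units_emoji = pack_fixed_width("a\U0001F600")

# Every unit holds exactly one code point, so indexing is a direct lookup.
second = chr(units_emoji[1])
```

Because no unit ever spans more than one code point, the nth character is always at offset n, at the cost of widening the whole string when a single high code point appears.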

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Jeff Clites
On Apr 9, 2004, at 2:00 PM, Dan Sugalski wrote: At 1:55 PM -0700 4/9/04, Adam Thomason wrote: > From: Dan Sugalski [mailto:[EMAIL PROTECTED] > Done. It'll guaranteed kill half the tinderboxen--I think my first thing to do on monday is to patch up the build procedure to use the system ICU if it

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Jeff Clites
On Apr 9, 2004, at 9:29 AM, Leopold Toetsch wrote: Dan Sugalski wrote: But... this gets us very much closer to where we want to be, and I'm figuring that we're better off applying this and working out the kinks than not. I'll leave this one to Leo to make final decision on, though. Thanks Dan

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Jeff Clites
On Apr 9, 2004, at 7:19 AM, Leopold Toetsch wrote: Jeff Clites <[EMAIL PROTECTED]> wrote: I've sent my patch in through RT--it's [perl #28405]! Phew, that's huge. I'd really like to have smaller patches that do it step by step. Yes, I know it got quite large--sorry about that, I know it makes thi

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Dan Sugalski
At 1:55 PM -0700 4/9/04, Adam Thomason wrote: > From: Dan Sugalski [mailto:[EMAIL PROTECTED] > Done. It'll guaranteed kill half the tinderboxen--I think my first thing to do on monday is to patch up the build procedure to use the system ICU if it's available. Does this mean you're resigned to r

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Adam Thomason
> -Original Message- > From: Dan Sugalski [mailto:[EMAIL PROTECTED] > Sent: Friday, April 09, 2004 1:45 PM > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Subject: Re: ICU incorporation and string changes heads-up > > > At 9:59 PM +0200 4/9/04, Leopold Toe

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Dan Sugalski
At 9:59 PM +0200 4/9/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: At 6:29 PM +0200 4/9/04, Leopold Toetsch wrote: I'll not apply it before tomorrow, though. (If not someone else were faster :) If you want it in I can commit it now--I've a local version and big enough pip

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 6:29 PM +0200 4/9/04, Leopold Toetsch wrote: >>I'll not apply it before tomorrow, though. (If not someone else were faster :) > If you want it in I can commit it now--I've a local version and big > enough pipe for it not to be a big deal. Put it in. W

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Dan Sugalski
At 9:20 AM -0700 4/9/04, Jeff Clites wrote: On Apr 9, 2004, at 8:07 AM, Dan Sugalski wrote: At 4:19 PM +0200 4/9/04, Leopold Toetsch wrote: Jeff Clites <[EMAIL PROTECTED]> wrote: I've sent my patch in through RT--it's [perl #28405]! Phew, that's huge. I'd really like to have smaller patches that

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Dan Sugalski
At 6:29 PM +0200 4/9/04, Leopold Toetsch wrote: Dan Sugalski wrote: But... this gets us very much closer to where we want to be, and I'm figuring that we're better off applying this and working out the kinks than not. I'll leave this one to Leo to make final decision on, though. Thanks Dan for

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Leopold Toetsch
Dan Sugalski wrote: But... this gets us very much closer to where we want to be, and I'm figuring that we're better off applying this and working out the kinks than not. I'll leave this one to Leo to make final decision on, though. Thanks Dan for this easter egg :) Intermediate results: - patc

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Jeff Clites
On Apr 9, 2004, at 8:07 AM, Dan Sugalski wrote: At 4:19 PM +0200 4/9/04, Leopold Toetsch wrote: Jeff Clites <[EMAIL PROTECTED]> wrote: I've sent my patch in through RT--it's [perl #28405]! Phew, that's huge. I'd really like to have smaller patches that do it step by step. But anyway, the patch mu

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Dan Sugalski
At 4:19 PM +0200 4/9/04, Leopold Toetsch wrote: Jeff Clites <[EMAIL PROTECTED]> wrote: I've sent my patch in through RT--it's [perl #28405]! Phew, that's huge. I'd really like to have smaller patches that do it step by step. But anyway, the patch must have been a lot of work so let's see and make t

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote: > I've sent my patch in through RT--it's [perl #28405]! Phew, that's huge. I'd really like to have smaller patches that do it step by step. But anyway, the patch must have been a lot of work so let's see and make the best out of it. Some questions: First: -

Re: ICU incorporation and string changes heads-up

2004-04-09 Thread Jeff Clites
I've sent my patch in through RT--it's [perl #28405]! Jeff

Re: ICU incorporation and string changes heads-up

2004-04-08 Thread Dan Sugalski
At 8:52 AM -0700 4/8/04, Jeff Clites wrote: On Apr 7, 2004, at 10:45 AM, Dan Sugalski wrote: At 10:27 AM -0700 4/7/04, Jeff Clites wrote: It has taken me longer than I expected to carve out some time to work on finishing my ICU/string patch, but it's progressing now, and I just finished tracking

Re: ICU incorporation and string changes heads-up

2004-04-08 Thread Jeff Clites
On Apr 7, 2004, at 10:45 AM, Dan Sugalski wrote: At 10:27 AM -0700 4/7/04, Jeff Clites wrote: It has taken me longer than I expected to carve out some time to work on finishing my ICU/string patch, but it's progressing now, and I just finished tracking down some bugs of mine that the config_lib

Re: ICU incorporation and string changes heads-up

2004-04-07 Thread Dan Sugalski
At 10:27 AM -0700 4/7/04, Jeff Clites wrote: It has taken me longer than I expected to carve out some time to work on finishing my ICU/string patch, but it's progressing now, and I just finished tracking down some bugs of mine that the config_lib.pasm stuff was exercising. So I'm currently back

Re: ICU incorporation and string changes heads-up

2004-04-07 Thread Jeff Clites
It has taken me longer than I expected to carve out some time to work on finishing my ICU/string patch, but it's progressing now, and I just finished tracking down some bugs of mine that the config_lib.pasm stuff was exercising. So I'm currently back to the state of passing all expected tests (

ICU incorporation and string changes heads-up

2004-03-17 Thread Jeff Clites
I'm almost finished preparing a patch which incorporates the usage of ICU, and makes some additional changes to the internal representation of strings. These changes give us an internal representation model which is a bit simpler, and is measurably faster. (More details to follow with the actua