Jeff wrote:
>
> Hong Zhang wrote:
> >
> > I think it will be relative easy to deal with different compiler
> > and different operating system. However, ICU does contain some
> > C++ code. It will make life much harder, since current Parrot
> > only assume ANSI C (even a subset of it).
> >
> > Hong
> >
> > > This is rather concerning to me. As I understand it, one of
> > > the goals for
> > > parrot was to be able to have a usable subset of it which is totally
> > > platform-neutral (pure ANSI C). If we start to depend too much on
> > > another library which may not share that goal, we could have trouble
> > > with the parrot build process (which was supposed to be
> > > shipped as parrot bytecode)
>
> I guess it's obvious that I hadn't looked at the target platforms for
> ICU as closely as I probably should have. C vs. C++ doesn't concern me,
> as it can always be rewritten, but lack of platforms like OS X does.
> Given that, I propose an interim solution consisting of basic Unicode
> utilities we'll need, such as Unicode_isdigit(). This can be a simple
> wrapper around isdigit() for the moment, until I sort out which files we
> need from the Unicode database, and what support functions/data
> structures will be required.
>
> Given that we're dedicated to either UTF-16 or UTF-32 for internal
> string representation (undecided as of yet, and isn't affected by this),
> we can get away with creating a simple unicode.{c,h} suite of functions
> that looks like:
>
> Parrot_Int Parrot_isDigit(char* glyph);
>
> We can get away with the simplicity here because the character array
> should already be a valid UTF-{16,32} string, and responsibility for
> making sure there's a valid glyph at that offset can be safely offloaded
> to the caller, if not higher up the calling chain. Also, it should be in
> a separate file because, assuming the final internal representation
> matches that of the RE engine, the engine can use these utilities as
> well.
>
> Now, admittedly this is only slightly better-thought-out than the
> original proposal, but I think it has a much better chance of being
> implemented, and in a fairly short amount of time. (He said, knowing
> full well that there's always one more problem.) ASCII versions of the
> functions should be almost trivial, and can be left in there as a
> compile-time switch should we choose to do an ASCII-only or UTF-8-only
> version.
>
> In conclusion, this approach feels more workable, and the full UTF-16
> implementation details can be rolled out incrementally, rather than in
> a single mass migration. If this suggestion flies, I'll rewrite
> strings.pdd and post it in the next few days.
> --
> Jeff <[EMAIL PROTECTED]>
Okay, now I feel utterly silly, having just looked at chartypes/unicode.c.

"Well, that approach'll work. Wonder why nobody thought...<greps for
isdigit()>...uh...never mind. I'll be over here, with the dunce cap on."
--
Jeff <[EMAIL PROTECTED]>
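
For anyone following the thread, the interim wrapper described in the quoted
message might look roughly like the sketch below. It is not the contents of
chartypes/unicode.c. It assumes a UTF-32 internal representation in native
byte order (the thread leaves UTF-16 vs. UTF-32 undecided), and the
Parrot_Int typedef here is only a stand-in for whatever Parrot's headers
really define. Only the Parrot_isDigit() signature comes from the proposal.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for Parrot's integer type, defined here only so
 * the sketch compiles on its own. */
typedef long Parrot_Int;

/* Interim digit test. Assumes glyph points at one UTF-32 code unit in
 * native byte order; the caller guarantees a valid glyph at that offset,
 * as the proposal suggests. */
Parrot_Int Parrot_isDigit(char *glyph)
{
    uint32_t cp;

    /* Copy rather than cast, so an unaligned char* is still safe. */
    memcpy(&cp, glyph, sizeof cp);

    /* Until Unicode database tables are wired in, only ASCII digits are
     * recognized; everything above 0x7F is reported as "not a digit". */
    if (cp > 0x7F)
        return 0;

    return isdigit((int)cp) != 0;
}
```

A non-ASCII digit such as U+0663 (ARABIC-INDIC DIGIT THREE) returns 0 from
this version, which is exactly the gap the Unicode database lookup would
later have to fill.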