Re: Unicode thoughts...

Jeff Mon, 25 Mar 2002 19:36:39 -0800

Jeff wrote:
> 
> Hong Zhang wrote:
> >
> > I think it will be relatively easy to deal with different compilers
> > and different operating systems. However, ICU does contain some
> > C++ code. It will make life much harder, since current Parrot
> > only assumes ANSI C (even a subset of it).
> >
> > Hong
> >
> > > This is rather concerning to me.  As I understand it, one of
> > > the goals for
> > > parrot was to be able to have a usable subset of it which is totally
> > > platform-neutral (pure ANSI C).   If we start to depend too much on
> > > another library which may not share that goal, we could have trouble
> > > with the parrot build process (which was supposed to be
> > > shipped as parrot bytecode)
> 
> I guess it's obvious that I hadn't looked at the target platforms for
> ICU as closely as I probably should have. C vs. C++ doesn't concern me,
> as it can always be rewritten, but lack of platforms like OS X does.
> Given that, I propose an interim solution consisting of the basic Unicode
> utilities we'll need, such as Unicode_isdigit(). This can be a simple
> wrapper around isdigit() for the moment, until I sort out which files we
> need from the Unicode database, and what support functions/data
> structures will be required.
> 
> Given that we're dedicated to either UTF-16 or UTF-32 for internal
> string representation (undecided as of yet, and isn't affected by this),
> we can get away with creating a simple unicode.{c,h} suite of functions
> that looks like:
> 
> Parrot_Int Parrot_isDigit(char* glyph);
> 
> We can get away with the simplicity here because the character array
> should already be a valid UTF-{16,32} string, and responsibility for
> making sure there's a valid glyph at that offset can be safely offloaded
> to the caller, if not higher up the calling chain. Also, it should be in
> a separate file because, assuming the final internal representation
> matches that of the RE engine, the engine can use these utilities as
> well.
> 
> Now, admittedly this is only slightly better-thought-out than the
> original proposal, but I think it has a much better chance of being
> implemented, and in a fairly short amount of time. (He said, knowing
> full well that there's always one more problem) ASCII versions of the
> functions should be almost trivial, and can be left in there as a
> compile-time switch should we choose to do an ASCII-only or UTF-8-only
> version.
> 
> In conclusion, this approach feels more workable, and the full UTF-16
> implementation details can be rolled out incrementally, rather than in a
> single mass migration. If this suggestion flies, I'll rewrite
> strings.pdd and post it in the next few days.
> --
> Jeff <[EMAIL PROTECTED]>


Okay, now I feel utterly silly, having just looked at
chartypes/unicode.c. "Well, that approach'll work. Wonder why nobody
thought...<greps for isdigit()>...uh...never mind. I'll be over here,
with the dunce cap on."
--
Jeff <[EMAIL PROTECTED]>
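
For illustration only, here is a minimal sketch of the interim approach
described in the quoted message: a digit test that is a thin wrapper
around isdigit(), with a compile-time switch for an ASCII-only build.
It is not the code in chartypes/unicode.c. Everything beyond the
Parrot_isDigit name is an assumption: the Parrot_Int typedef, the
PARROT_ASCII_ONLY macro name, the choice of UTF-32 in native byte order
as the internal representation, and the const added to the parameter.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for whatever integer type Parrot settles on. */
typedef long Parrot_Int;

/*
 * Returns 1 if the glyph at the given offset is a digit, 0 otherwise.
 * The caller is responsible for passing a pointer to a valid glyph.
 * PARROT_ASCII_ONLY is a hypothetical compile-time switch.
 */
Parrot_Int Parrot_isDigit(const char *glyph)
{
#ifdef PARROT_ASCII_ONLY
    /* ASCII build: one byte per character. */
    return isdigit((unsigned char)*glyph) != 0;
#else
    /* Assumed UTF-32, native byte order: one 32-bit unit per glyph.
     * memcpy avoids alignment problems with the char pointer. */
    uint32_t cp;
    memcpy(&cp, glyph, sizeof cp);

    /* Interim behaviour: only ASCII digits are recognised. A lookup
     * built from the Unicode database would replace this test later. */
    return cp < 0x80 && isdigit((int)cp) != 0;
#endif
}
```

In the default (non-ASCII) build, a glyph such as U+0663 is reported as
a non-digit until the Unicode database tables are wired in, which is the
incremental rollout the message describes.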
