In message <20020102054642$[EMAIL PROTECTED]>
          "David & Lisa Jacobs" <[EMAIL PROTECTED]> wrote:

> Here is a short list of TODOs that I came up with for STRINGs.  First, do
> these look good to people?  And second, what is the preferred method for
> keeping track of these (patch to the TODO file, entries in bug tracking
> system, mailing list, etc.)?
> 
> * Add set ops that are encoding aware (e.g., set S0, "something", "unicode",
> "utf-8")?

You can already have Unicode constants by prefixing the string
with a U character. I seem to recall Dan saying that he didn't want
to allow constants in arbitrary encodings but instead would prefer
just to have native and unicode.

> * Add transcoding ops (this might be a specific case of the previous e.g.,
> set S0, S1, "unicode", "utf-16")

I'm not sure whether this is needed. I think the idea is that in
general transcoding will happen at I/O time, presumably by pushing
a transcoding module on the I/O stack.

> * Move like encoded string comparison into encodings (i.e., the STRING
> comparison function gets the strings into the same encoding and then calls
> out to the encodings comparison function - This will allow each encoding to
> optimize its comparison.)

The problem with this is that string comparison depends on both the
encoding and the character set, so in general you can't do this. If
the character set were the same for both strings, though, then you
could.

What I did think about was having a flag on each encoding that
specified whether or not comparisons for that encoding could be
done using memcmp() when the character sets were the same. That
is true for things like the single-byte encodings, but probably
not for the Unicode encodings due to canonicalisation issues.

> * Add size of string termination to encodings (i.e., how many 0 bytes)

Certainly.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
