Re: Using libunistring for string comparisons et al

2011-03-15 Thread Mike Gran
> From: Alex Shinn
> > Keep in mind that the UTF-8 forward iterator operation has conditional
> > branches.  Merely the act of advancing from one character to another
> > could take one of four paths, or more if you include the possibility
> > of invalid UTF-8 sequences.
>
> No, technically you d
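For readers following the thread, the "four paths" in the quoted text correspond to the four possible lengths of a UTF-8 sequence. The sketch below is only an illustration: it is written in Python rather than Guile's C, `utf8_next` is a hypothetical name, and validation is deliberately partial (it rejects bad lead bytes, bad continuation bytes, and truncated input, but does not check overlong three- and four-byte forms, surrogates, or values above U+10FFFF).

```python
from typing import Tuple


def utf8_next(buf: bytes, i: int) -> Tuple[int, int]:
    """Decode the code point starting at byte offset i.

    Returns (code_point, offset_of_next_character).
    """
    b0 = buf[i]
    if b0 < 0x80:                 # path 1: one byte (ASCII)
        return b0, i + 1
    if 0xC2 <= b0 <= 0xDF:        # path 2: two-byte sequence
        n, cp = 2, b0 & 0x1F
    elif 0xE0 <= b0 <= 0xEF:      # path 3: three-byte sequence
        n, cp = 3, b0 & 0x0F
    elif 0xF0 <= b0 <= 0xF4:      # path 4: four-byte sequence
        n, cp = 4, b0 & 0x07
    else:                         # extra path: invalid lead byte
        raise ValueError("invalid UTF-8 lead byte at offset %d" % i)
    if i + n > len(buf):          # extra path: truncated sequence
        raise ValueError("truncated UTF-8 sequence at offset %d" % i)
    for k in range(1, n):
        b = buf[i + k]
        if b & 0xC0 != 0x80:      # extra path: bad continuation byte
            raise ValueError("invalid continuation byte at offset %d" % (i + k))
        cp = (cp << 6) | (b & 0x3F)
    return cp, i + n
```

Walking a buffer means calling `utf8_next` repeatedly with the returned offset, so every step goes through the branch on the lead byte.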

Re: Using libunistring for string comparisons et al

2011-03-15 Thread Mike Gran
>   (string-upcase "Straße")        => "STRAßE"  (should be "STRASSE")
>   (string-downcase "ΧΑΟΣΣ")       => "χαοσσ"  (should be "χαοσς")
>   (string-downcase "ΧΑΟΣ Σ")      => "χαοσ σ"  (should be "χαος σ")
Well, yes and no.  R6RS yes.  SRFI-13 no.
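The difference between the two sets of results can be reproduced outside Guile. The sketch below assumes the distinction at issue is string-level full case mapping versus character-by-character mapping; it uses Python's built-in `str.upper` and `str.lower` for the former, and two hypothetical helpers, `upcase_per_char` and `downcase_per_char`, as a rough approximation of the latter. It is not Guile's or libunistring's implementation.

```python
def upcase_per_char(s: str) -> str:
    """Upcase one character at a time, keeping the length unchanged.

    A character whose uppercase form is not a single character
    (such as U+00DF, sharp s) is left as it is.
    """
    out = []
    for ch in s:
        u = ch.upper()
        out.append(u if len(u) == 1 else ch)
    return "".join(out)


def downcase_per_char(s: str) -> str:
    """Downcase one character at a time, with no view of neighbours.

    Because each character is mapped in isolation, a capital sigma
    always becomes a medial sigma, never a final one.
    """
    out = []
    for ch in s:
        low = ch.lower()
        out.append(low if len(low) == 1 else ch)
    return "".join(out)
```

In CPython 3, `"Straße".upper()` expands the sharp s and `"ΧΑΟΣΣ".lower()` ends in a final sigma, whereas the per-character helpers return results matching the Guile output quoted above.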

Re: Using libunistring for string comparisons et al

2011-03-15 Thread Mark H Weaver
Mike Gran writes:
>> The reason I am still arguing this point is because I have looked
>> seriously at what I would need to do to (A) fix our i18n problems and
>> (B) make the code efficient.  I very much want to fix these things,
>> but the pain of trying to do this with our current scheme is too

Re: Using libunistring for string comparisons et al

2011-03-15 Thread Alex Shinn
On Wed, Mar 16, 2011 at 5:39 AM, Mike Gran wrote:
>> From: Mark H Weaver
>>
>> Mike Gran writes:
>> > We do, in a manner of speaking, have a single string representation:
>> > UTF-32.  The 'narrow' encoding is UTF-32 with the initial 3 bytes of
>> > zero removed.
>>
>> Despite the similarity o

Re: O(1) accessors for UTF-8 backed strings

2011-03-15 Thread Alex Shinn
On Wed, Mar 16, 2011 at 12:46 AM, Mark H Weaver wrote:
> Alex Shinn wrote:
>> On Sun, Mar 13, 2011 at 1:05 PM, Mark H Weaver wrote:
>>> I just realized that it is possible to implement O(1) accessors for
>>> UTF-8 backed strings.
>>
>> It's possible with several approaches, but not necessarily w

Re: Using libunistring for string comparisons et al

2011-03-15 Thread Mike Gran
> The reason I am still arguing this point is because I have looked
> seriously at what I would need to do to (A) fix our i18n problems and
> (B) make the code efficient.  I very much want to fix these things,
> but the pain of trying to do this with our current scheme is too much
> for me to bear.

Re: Using libunistring for string comparisons et al

2011-03-15 Thread Mark H Weaver
Mike Gran writes:
>> From: Mark H Weaver
>> Despite the similarity of these two representations, they are
>> sufficiently different that they cannot be handled by the same machine
>> code.  That means you must either implement multiple inner loops, one
>> for each combination of string parameter r

Re: Using libunistring for string comparisons et al

2011-03-15 Thread Mike Gran
> From: Mark H Weaver
>
> Mike Gran writes:
> > We do, in a manner of speaking, have a single string representation:
> > UTF-32.  The 'narrow' encoding is UTF-32 with the initial 3 bytes of
> > zero removed.
>
> Despite the similarity of these two representations, they are
> sufficiently diff

Re: Using libunistring for string comparisons et al

2011-03-15 Thread Mark H Weaver
Mike Gran writes:
> We do, in a manner of speaking, have a single string representation:
> UTF-32. The 'narrow' encoding is UTF-32 with the initial 3 bytes of
> zero removed.
Despite the similarity of these two representations, they are
sufficiently different that they cannot be handled by the s
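To make the two representations under discussion concrete, here is a minimal sketch of the relationship described in the quoted text. It is written in Python, not Guile's C, and makes two assumptions that the thread does not state: the wide form is taken to be big-endian UTF-32 (so that the three zero bytes really are the initial ones), and `narrow_from_wide` and `equal_mixed` are hypothetical names introduced only for illustration.

```python
def narrow_from_wide(s: str) -> bytes:
    """Build the narrow form by dropping the three leading zero bytes
    of every UTF-32 (big-endian) unit. Only valid when every code
    point fits in one byte."""
    if any(ord(ch) > 0xFF for ch in s):
        raise ValueError("string is not representable in the narrow form")
    wide = s.encode("utf-32-be")  # 4 bytes per code point, high bytes first
    return wide[3::4]             # keep the low byte of each 4-byte unit


def equal_mixed(narrow: bytes, wide: bytes) -> bool:
    """Compare a narrow string with a wide (UTF-32-BE) string by code point.

    The loop has to read 1-byte elements from one argument and 4-byte
    elements from the other, so it is specific to this combination.
    """
    if len(wide) != 4 * len(narrow):
        return False
    return all(
        int.from_bytes(wide[4 * i:4 * i + 4], "big") == narrow[i]
        for i in range(len(narrow))
    )
```

Under these assumptions the narrow form of a string is byte-for-byte its Latin-1 encoding, yet a comparison loop still has to know the element width of each argument, which is the point made in the reply.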

Re: O(1) accessors for UTF-8 backed strings

2011-03-15 Thread Mark H Weaver
Alex Shinn wrote:
> On Sun, Mar 13, 2011 at 1:05 PM, Mark H Weaver wrote:
>> I just realized that it is possible to implement O(1) accessors for
>> UTF-8 backed strings.
>
> It's possible with several approaches, but not necessarily worth it:
>
> http://trac.sacrideo.us/wg/wiki/StringRepresentati