On Tue, 19 Nov 2013 10:25:00 +1100, Chris Angelico wrote:

> But the problem is also with strings coming back from JS.

Just because you call it a "string" in Ceylon doesn't mean you have to use the native Javascript string type unchanged. Since the Ceylon compiler controls which Javascript operations get called (the user never writes any Javascript directly), the compiler can tell which operations could add surrogates. Since strings are immutable in Ceylon, a slice of a BMP-only string is also BMP-only, and concatenating two BMP-only strings gives a BMP-only string. I expect that uppercasing or lowercasing such strings will also keep the same invariant, but if not, well, you already have to walk the string to convert it, so walking it again should be no more expensive.

The point is not that my off-the-top-of-my-head pseudo-implementation was optimal in all details, but that *text strings* should be decent data structures with smarts, not dumb arrays of variable-width characters. If that means avoiding naive dumb-array-of-char implementations and writing your own, that's part of the compiler writer's job.

Python strings can include null bytes, unlike C, even when built on top of C. They know their length, unlike C, even when built on top of C. Just because the native Java and Javascript string types don't do these things doesn't mean they can't be done in Javascript.

> - as opposed to simply saying "string
> indexing can be slow on large strings", which puts the cost against a
> visible line of code.

For all we know, Ceylon already does something like this, but merely doesn't advertise the fact that while it *can* be slow, it can *also* be fast. Perhaps it's an implementation detail, much like string concatenation in Python: officially it requires building a new string, but CPython can sometimes append to the original string in place.

Still, given that Pike and Python have already solved this problem, with O(1) string indexing and length for any Unicode string, SMP or BMP, it is a major disappointment that Ceylon hasn't.
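To make the BMP-only bookkeeping concrete, here is a minimal Python sketch. It is not how Ceylon (or any real compiler) implements strings; `FlaggedString`, `from_str` and `is_bmp_only` are hypothetical names. It stores UTF-16 code units, as a Javascript string would, plus a flag recording that no surrogates are present. While the flag holds, length and indexing are O(1), and slicing and concatenation carry the flag forward without rescanning; only strings containing SMP characters fall back to an O(n) walk.

```python
from typing import Iterator, Tuple, Union


def _to_utf16_units(text: str) -> Tuple[int, ...]:
    """Encode a Python str as UTF-16 code units, the representation a JS string uses."""
    data = text.encode("utf-16-le")
    return tuple(int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2))


def _is_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDFFF


class FlaggedString:
    """Hypothetical immutable string over UTF-16 code units, tagged 'BMP-only'
    when it contains no surrogates (so one code unit == one code point)."""

    __slots__ = ("_units", "_bmp_only")

    def __init__(self, units: Tuple[int, ...], bmp_only: bool) -> None:
        self._units = units
        self._bmp_only = bmp_only

    @classmethod
    def from_str(cls, text: str) -> "FlaggedString":
        units = _to_utf16_units(text)
        # The only O(n) scan for surrogates happens here, at construction.
        return cls(units, not any(_is_surrogate(u) for u in units))

    @property
    def is_bmp_only(self) -> bool:
        return self._bmp_only

    def _code_points(self) -> Iterator[int]:
        """Slow path: walk the units, joining surrogate pairs into code points."""
        units, i = self._units, 0
        while i < len(units):
            u = units[i]
            if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                    and 0xDC00 <= units[i + 1] <= 0xDFFF):
                yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                i += 2
            else:
                yield u
                i += 1

    def __len__(self) -> int:
        if self._bmp_only:
            return len(self._units)  # O(1): units and code points coincide
        return sum(1 for _ in self._code_points())  # O(n) walk

    def __getitem__(self, key: Union[int, slice]) -> Union[str, "FlaggedString"]:
        if self._bmp_only:
            if isinstance(key, slice):
                # A slice of a BMP-only string is BMP-only: keep the flag, no rescan.
                return FlaggedString(self._units[key], True)
            return chr(self._units[key])  # O(1) indexing
        cps = list(self._code_points())  # O(n) fallback for strings with surrogates
        if isinstance(key, slice):
            return FlaggedString.from_str("".join(map(chr, cps[key])))
        return chr(cps[key])

    def __add__(self, other: "FlaggedString") -> "FlaggedString":
        # BMP-only + BMP-only stays BMP-only, so the new flag is a plain AND.
        return FlaggedString(self._units + other._units, self._bmp_only and other._bmp_only)

    def __str__(self) -> str:
        return b"".join(u.to_bytes(2, "little") for u in self._units).decode("utf-16-le")
```

For example, `FlaggedString.from_str("hello world")[4]` takes the O(1) path, while a string containing U+1F600 walks its units on every index. The slow path here is still O(n) per lookup; CPython 3.3+ (PEP 393) sidesteps that by storing each whole string at a fixed 1, 2 or 4 bytes per code point, chosen from its widest character.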
-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list