On Fri, 18 Mar 2016 06:00 pm, Ian Kelly wrote:

> On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson
> <rantingrickjohn...@gmail.com> wrote:
>> In the event that i change my mind about Unicode, and/or for
>> the sake of others, who may want to know, please provide a
>> list of languages that *YOU* think handle Unicode better than
>> Python, starting with the best first. Thanks.

Better than Python? Easy-peasy:

    List of languages with Unicode handling which is better than Python = []

I'm not aware of any language with better or more complete Unicode
functionality than Python's. (That doesn't necessarily mean that they
don't exist.)

> jmf has been asked this before, and as I recall he seems to feel that
> UTF-8 should be used for all purposes, ignoring the limitations of
> that encoding such as that indexing becomes a O(n) operation.

Technically, UTF-8 doesn't *necessarily* imply indexing is O(n). For
instance, your UTF-8 string might consist of an array of bytes containing
the string, plus an array of indexes to the start of each code point. For
example, the string:

    “abcπßЊ•𒀁”

(including the quote marks) is 10 code points in length and 22 bytes as
UTF-8. Grouping the (hex) bytes for each code point, we have:

    e2809c 61 62 63 cf80 c39f d08a e280a2 f0928081 e2809d

so we could get an O(1) UTF-8 string by recording the bytes (in hex) plus
the indexes (in decimal) at which each code point starts:

    e2809c616263cf80c39fd08ae280a2f0928081e2809d

    0 3 4 5 6 8 10 12 15 19

but (assuming each index needs 2 bytes, which supports strings of up to
roughly 65535 bytes of UTF-8), that's actually LESS memory efficient than
UTF-32: 42 bytes (22 bytes of data plus 10 two-byte indexes) versus 40
(10 code points at 4 bytes each).

> He has
> pointed at Go as an example of a language wherein Unicode "just
> works", although I think that others do not necessarily agree [1].

I think it is typical of JMF that his idea of a language where Unicode
"just works" is one where it *doesn't work at all* (at least not as
strings). Python 1.5 strings supported Unicode just as well as Go's string
type. In Go, the right way to handle Unicode is to use "runes", not
strings. I don't know how well that works though -- I suspect it is still
pretty primitive.

> [1] https://coderwall.com/p/k7zvyg/dealing-with-unicode-in-go

Nice link, thanks!


--
Steven
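
P.S. For anyone who wants to play with the bytes-plus-index-table idea
above, here is a minimal Python 3 sketch. It is only an illustration of
the layout, not how any real implementation stores strings. The class name
IndexedUTF8 and the nbytes() helper are made up for this example, and it
assumes array type code "H" is 2 bytes wide, which is the usual case but
only the guaranteed minimum.

```python
from array import array


class IndexedUTF8:
    """UTF-8 bytes plus a table of code point start offsets.

    Hypothetical illustration only; the name and API are invented here.
    """

    def __init__(self, text):
        self.data = text.encode("utf-8")
        # A byte begins a code point unless it is a continuation byte
        # (bit pattern 10xxxxxx). Type code "H" is an unsigned short, so
        # an offset above 65535 raises OverflowError.
        self.starts = array(
            "H", (i for i, b in enumerate(self.data) if b & 0xC0 != 0x80)
        )

    def __len__(self):
        # Length in code points, not bytes.
        return len(self.starts)

    def __getitem__(self, i):
        n = len(self.starts)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("code point index out of range")
        # Two table lookups and one slice: no scan over the byte array.
        begin = self.starts[i]
        end = self.starts[i + 1] if i + 1 < n else len(self.data)
        return self.data[begin:end].decode("utf-8")

    def nbytes(self):
        # Payload bytes plus the size of the index table.
        return len(self.data) + len(self.starts) * self.starts.itemsize


# The example string from the post, written with escapes.
text = "\u201cabc\u03c0\u00df\u040a\u2022\U00012001\u201d"
sample = IndexedUTF8(text)

print(len(sample), list(sample.starts))
print(sample.nbytes(), "bytes indexed UTF-8 versus",
      len(text.encode("utf-32-le")), "bytes UTF-32")
```

Building the table is a single O(n) pass at construction time; after that
each lookup costs the same regardless of position, which is the whole
point. The price is the extra table, as the 42 versus 40 comparison shows.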