On Thu, Mar 28, 2013 at 8:03 PM, jmfauth <wxjmfa...@gmail.com> wrote:
> Example of a good Unicode understanding.
> If you wish 1) to preserve memory, 2) to cover the whole range
> of Unicode, 3) to keep maximum performance while preserving the
> good work Unicode.org as done (normalization, sorting), there
> is only one solution: utf-8. For this you have to understand,
> what is really a "unicode transformation format".

You really REALLY need to sort out in your head the difference between
correctness and performance. I still haven't seen one single piece of
evidence from you that Python 3.3 fails on any point of Unicode
correctness. Covering the whole range of Unicode has never been a
problem.

In terms of memory usage and performance, though, there's one obvious
solution. Fork CPython 3.3 (or the current branch head[1]), change the
internal representation of a string to be UTF-8 (by the way, that's
the official spelling), and run the string benchmarks. Then post your
code and benchmark figures so other people can replicate your results.

> Python has certainly and definitvely not "revolutionize"
> Unicode.

This is one place where you're actually correct, though, because PEP
393 isn't the first instance of this kind of format - Pike's had it
for years. Funny though, I don't think that was your point :)

[1] Apologies if my terminology is wrong, I'm a git user and did one
quick Google search to see if hg uses the same term.

ChrisA
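
As a rough illustration of the memory side of the argument (not a
benchmark, and not a substitute for the fork-and-measure experiment
described above), the sketch below compares how many bytes per code
point a CPython 3.3+ string appears to use against the length of the
same character encoded as UTF-8. The helper names bytes_per_char and
utf8_bytes_per_char are made up for this example, and the sizes
reported by sys.getsizeof are a CPython implementation detail, so
other interpreters may well report something different.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate internal storage per code point (hypothetical helper).

    Differencing the sizes of two strings of different lengths cancels
    out the fixed object header, leaving only the per-character cost.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


def utf8_bytes_per_char(ch):
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# One sample character from each range that matters for the comparison.
samples = [
    ("ASCII", "a"),
    ("Latin-1", "\u00e9"),
    ("BMP", "\u20ac"),
    ("astral", "\U0001F600"),
]

for name, ch in samples:
    print(name, bytes_per_char(ch), utf8_bytes_per_char(ch))
```

On CPython 3.3 or later this should print 1, 1, 2 and 4 bytes per
code point for the internal representation, against 1, 2, 3 and 4
bytes for UTF-8. It says nothing about speed; indexing and slicing
costs would need the string benchmarks mentioned above.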