I only respond here because unicode in general is an important concept that the OP will want to make sure his students understand in Python, and I don't want you to dishonestly sow the seeds of uncertainty and doubt.

On 11/25/2013 03:12 AM, wxjmfa...@gmail.com wrote:
> Your paragraph is mixing different concepts.

On the contrary, it appears you are the one mixing the concepts, and confusing a byte-encoding scheme with unicode. In an ideal world, the programmer should not need to know or care about what encoding scheme the language is using internally to store strings. And it does not matter whether the internal encoding scheme is endorsed by the Unicode Consortium or not, provided it can handle all the valid unicode constructs. A string is unicode. Period. Hence you must concern yourself with encoding only when reading or writing a byte stream. Inside the language itself, the encoding is irrelevant. Ideally. In Python 3.3+ anyway.

Of course reality is different in other languages, which is why programmers are used to worrying about things like exposed surrogate pairs (as in Javascript), or having to tweak their algorithms to deal with the fact that UTF-8 indexing is not O(1).

To claim that a programmer has to concern himself with the internal language encoding in Python 3 is not only untrue, it's disingenuous at best, given the OP's mission.

> When it comes to save memory, utf-8 is the choice. It
> beats largely the FSR on the side of memory and on
> the side of performances.

So you would condemn everyone to use an O(n) encoding for a string when the FSR offers full unicode compliance while optimizing both speed and memory? No, D'Aprano is correct. Python 3.3+ indeed does unicode right. It offers O(1) indexing and slicing, is memory efficient, and never exposes things like surrogate pairs.

> How and why? I suggest, you have a deeper understanding
> of unicode.

Indeed I'd say D'Aprano does have a deeper understanding of unicode.

> May I recall, it is one of the coding scheme endorsed
> by "Unicode.org" and it is intensively used. This is not
> by chance.

Yes, you keep saying this. Have you encountered a real-world situation where you are impacted by Python's FSR?
You keep posting silly benchmarks that prove nothing, and continue arguing, yet presumably you are still using Python. Why haven't you switched to Google Go or another language that implements unicode strings in UTF-8?
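
For the OP's students, here is a minimal sketch of the point about strings versus encodings. It is my own illustration, not anything from the earlier posts; the variable names are made up for the example, and the size comparison at the end assumes CPython 3.3 or later (the flexible string representation of PEP 393), so other implementations may behave differently there.

```python
import sys

# Illustrative example only; all names below are made up for this sketch.
# 'naive' with a diaeresis, a space, and one character outside the BMP.
s = "na\u00efve \U0001F600"

# Length and indexing count code points, not bytes or UTF-16 code units.
n_chars = len(s)
last = s[-1]  # a single element; no surrogate pair is exposed

# An encoding only enters the picture when converting to or from bytes.
utf8 = s.encode("utf-8")
utf16 = s.encode("utf-16-le")
n_utf8 = len(utf8)
n_utf16_units = len(utf16) // 2
round_trip = utf8.decode("utf-8") == s

print(n_chars, n_utf8, n_utf16_units, round_trip)  # 7 11 8 True

# CPython-specific assumption: storage per character grows with the
# widest code point in the string (1, 2, or 4 bytes per character).
size_ascii = sys.getsizeof("a" * 1000)
size_bmp = sys.getsizeof("\u20ac" * 1000)
size_astral = sys.getsizeof("\U0001F600" * 1000)
print(size_ascii < size_bmp < size_astral)
```

The same text is 7 characters as a str, 11 bytes in UTF-8, and 8 code units in UTF-16, which is exactly why the byte counts belong to the I/O boundary and not to the string itself.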