On Tue, Jul 17, 2018 at 6:27 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> It is essential for people to understand that the very same issues that
> plague UTF-8 plague UTF-32 as well. Using UTF in both highlights that
> fact.

What wonderful nonsense. I suppose that the same issues plague Elon Musk
as plague the musk sticks in the sweets aisle in the supermarket - they
do use the same letters, after all.

>> If by "very many things", you mean "not very many things", I agree
>> with you. In my experience, dealing with code points is "good enough",
>> especially if you use Western European alphabets, and even more so if
>> you're willing to do a normalization step before processing text.
>
> Of course, UTF-8 doesn't relieve you from Unicode problems. But it has
> one big advantage: it can usually deal with non-Unicode data without any
> extra considerations while Python3's strings make you have to take
> elaborate measures to handle those special cases. Why, even print() must
> be guarded against UnicodeEncodeError when the printed string is not in
> the programmer's control.

What is this "non-Unicode data" that UTF-8 can handle? Do you mean
arbitrary byte sequences? Because no, it cannot; properly-formed UTF-8
sequences MUST comply with the precise requirements of the format.

Can you give an example of how Python 3's print function can raise
UnicodeEncodeError when given a Python 3 string?

ChrisA

-- 
https://mail.python.org/mailman/listinfo/python-list
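
To make the point about well-formed UTF-8 concrete, here is a minimal sketch using Python 3's strict UTF-8 codec. The helper name is_valid_utf8 and the sample byte strings are made up for illustration; they do not come from the thread.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")  # errors="strict" is the default handler
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample inputs, chosen only to illustrate the rule.
samples = {
    "valid UTF-8": "caf\u00e9".encode("utf-8"),
    "lone 0xFF byte": b"\xff",            # 0xFF never appears in UTF-8
    "truncated sequence": b"\xc3",        # lead byte with no continuation
    "overlong encoding": b"\xc0\xaf",     # 0xC0 is not a legal lead byte
}

for label, data in samples.items():
    print(f"{label}: {is_valid_utf8(data)}")
```

Only the first sample decodes; the other three are rejected, which is the sense in which arbitrary byte sequences are not UTF-8.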