On Fri, Mar 6, 2015, at 09:11, Chris Angelico wrote:
> To prevent people from putting three paragraphs of lipsum in and
> calling it a username.

Limiting by UTF-8 bytes or UTF-16 units works just as well for that.

> So you truncate to the desired length, then if the first character of
> the trimmed-off section is a combining mark (based on its Unicode
> character types), you keep trimming until you've removed a character
> which isn't. Then, if you no longer have any content whatsoever,
> reject the name. Simple.

My entire point was that UTF-32 doesn't save you from that, so it cannot
be called a deficiency of UTF-16. My point is that there are very few
problems to which "count of Unicode code points" is the only right
answer, that is, problems that UTF-32 is good enough for but that are
meaningfully impacted by a naive usage of UTF-16, to the point where
UTF-16 is something you have to be "safe" from.

-- 
https://mail.python.org/mailman/listinfo/python-list
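
For concreteness, here is a rough sketch of the truncation procedure quoted above, written against Python 3 strings (which index by code point). The names `truncate_name`, `is_combining`, `utf16_units` and `utf8_bytes` are made up for illustration, and treating every general category starting with "M" as a combining mark is an assumption about what "based on its Unicode character types" means. It only handles combining marks, not full grapheme clusters.

```python
import unicodedata


def is_combining(ch: str) -> bool:
    # Assumption: "combining mark" means general category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")


def truncate_name(name: str, limit: int) -> str:
    """Truncate to at most `limit` code points without orphaning marks."""
    if len(name) <= limit:
        return name
    kept = name[:limit]
    if is_combining(name[limit]):
        # The cut fell between a base character and its marks: strip any
        # marks left at the end of the kept part, then the base itself.
        while kept and is_combining(kept[-1]):
            kept = kept[:-1]
        kept = kept[:-1]
    if not kept:
        raise ValueError("no content left after truncation")
    return kept


def utf16_units(s: str) -> int:
    # Number of 16-bit code units in the UTF-16 encoding (no BOM).
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s: str) -> int:
    return len(s.encode("utf-8"))


name = "e\u0301e\u0301e\u0301"       # three accented letters, six code points
print(name[:3] == "e\u0301e")        # True: a naive code point cut drops an accent
print(truncate_name(name, 3) == "e\u0301")  # True: trimmed back to a whole letter
print(len("\U0001F600"), utf16_units("\U0001F600"), utf8_bytes("\U0001F600"))  # 1 2 4
```

Note that the naive slice already goes wrong when counting whole code points, which is the UTF-32 case. The extra trimming step is needed whichever unit the limit is expressed in.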