On Tue, Jul 17, 2018 at 5:51 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:
>> Under that standard definition, UTF-8 and UTF-16 are variable-width,
>> and UTF-32 is fixed-width.
>>
>> But I'll accept that UTF-32 is variable-width if Marko accepts that
>> ASCII is too.
>
> If that makes you happy, fine. The point is, UTF-32 has no advantages
> over UTF-8. And I'm referring to the text abstraction as seen by the
> programmer. It has nothing to do with the layout of bytes inside
> CPython.
>
> I use UTF-8 in my C programs and sense no disadvantage. I have never
> felt a need for wchar_t. Similarly, I had a small Python2 program that
> quizzed me about Hebrew vocabulary with Finnish translations and
> Esperanto pronunciation instructions. All UTF-8. No unicode strings. (I
> *have* converted that to Python3 just to be on the bleeding edge, but it
> didn't give me any advantages over Python2.)

Challenge: Reverse a string in UTF-8.

Challenge: Center text in UTF-8.

Challenge: Given a (non-initial) character in a buffer of UTF-8 bytes,
find the immediately preceding character.

All of these are fundamentally difficult by nature, but if you index by
code points, you eliminate one level of difficulty; indexing by bytes
retains all the existing difficulty and adds another layer.

ChrisA
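
To make the third challenge concrete, here is a minimal sketch in Python of what the byte-level version involves. The helper name prev_char_start is hypothetical (it is not from this thread or from any library), and it assumes the buffer is valid UTF-8 and that the given offset already sits on a code point boundary. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```python
def prev_char_start(buf: bytes, i: int) -> int:
    """Return the byte offset at which the code point before offset i starts.

    Hypothetical helper for illustration only. Assumes buf is valid UTF-8
    and i is the start offset of a non-initial code point (or len(buf)).
    """
    if i <= 0 or i > len(buf):
        raise ValueError("i must be in the range 1..len(buf)")
    j = i - 1
    # Continuation bytes look like 0b10xxxxxx; step back over them
    # until we reach a lead byte (or an ASCII byte).
    while j > 0 and (buf[j] & 0xC0) == 0x80:
        j -= 1
    return j


text = "a\u20acb"            # 'a', EURO SIGN, 'b'
buf = text.encode("utf-8")   # 1 byte + 3 bytes + 1 byte

# Byte indexing: 'b' starts at byte offset 4; scan backwards for its predecessor.
start = prev_char_start(buf, 4)
prev_from_bytes = buf[start:4].decode("utf-8")

# Code point indexing: 'b' is at index 2; its predecessor is simply index 1.
prev_from_str = text[2 - 1]

# Reversal by code points is a slice, but it is still not "correct" for
# combining sequences: the accent ends up in front of its base letter.
reversed_text = "e\u0301"[::-1]
```

With a str, the same step is just index - 1, which is the layer of difficulty that code point indexing removes. What stays hard either way is deciding what a reader perceives as one "character", as the combining-accent reversal shows.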