On Fri, May 27, 2016, at 11:53, Rustom Mody wrote:
> And coding systems are VERY political.
> Sure what characters are put in (and not) is political
> But more invisible but equally political is the collating order.
>
> eg No one understands what jmf's gripes are... My guess is that a Euro
> costs 3 times a Dollar.
>
> >>> "€".encode("UTF-8")
> b'\xe2\x82\xac'
> >>> "$".encode("UTF-8")
> b'$'
>
> [Its another matter that this is not the evil deed of python but of
> UTF-8!]

AIUI jmf's issue is that Python's string type (nothing to do with UTF-8) doesn't treat all strings equally. Strings whose characters all fall within Latin-1 (including your dollar example) take one byte per character, whereas strings containing other BMP characters take two bytes per character. He also has some harder-to-follow objections to the large fixed per-string overhead and to the cached UTF-8 version (which ASCII strings don't have).
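
For anyone who wants to see the per-character difference rather than take it on faith, here is a rough sketch. It assumes CPython 3.3 or later (the flexible string representation); other implementations may behave differently. The helper name bytes_per_char is my own invention, not anything from the stdlib. It compares two strings of different lengths so that the fixed per-object overhead cancels out. The last test character lies outside the BMP, which goes beyond the two cases above.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made only of `ch`.

    Hypothetical helper: subtracting the sizes of two strings of
    different lengths cancels the fixed per-object overhead.
    Assumes CPython 3.3+; the result is implementation-specific.
    """
    longer = sys.getsizeof(ch * (2 * n))
    shorter = sys.getsizeof(ch * n)
    return (longer - shorter) // n


# ASCII, Latin-1, BMP, and non-BMP sample characters.
for ch in ("$", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```

On a CPython build with that representation I would expect 1, 1, 2 and 4 bytes respectively, so the Euro sign costs twice the dollar in memory, not three times as it does in UTF-8.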