On Fri, 27 May 2016 07:12 am, Marko Rauhamaa wrote:

> However, I must correct myself slightly: ASCII refers to any
> byte-oriented character encoding scheme *largely coinciding with ASCII
> proper*. But since all of them *are* derivatives of ASCII proper,
> mentioning is somewhat redundant.

"All" of them? Here is a small selection of codecs provided by Python: py> codecs = "cp037 cp273 cp500 cp875 cp1026 cp1140 utf_16be".split() py> for cd in codecs: ... print("ab.12".encode(cd)) # ASCII gives b'ab.12' ... b'\x81\x82K\xf1\xf2' b'\x81\x82K\xf1\xf2' b'\x81\x82K\xf1\xf2' b'\x81\x82K\xf1\xf2' b'\x81\x82K\xf1\xf2' b'\x81\x82K\xf1\xf2' b'\x00a\x00b\x00.\x001\x002' There's also at least one other double-byte character set which, as far as I can tell, isn't supported by Python: KS X 1001, used in Korea. Then there are the variable-width encodings which are backwards compatible with ASCII only in the sense that text containing *only* ASCII characters uses the same sequence of bytes as ASCII would. But being variable-width, they cannot be treated as a simple array of bytes with a fixed 1 byte = 1 character mapping. Examples include UTF-8, UTF-7, the various Shift-JIS encodings, EUC-JP, EUC-KR, EUC-TW, GB18030, Big5, and others. This concept of ASCII = "all character sets", or "nearly all", or "okay, maybe not nearly all of them, but just the important ones" is terribly Euro-centric. The very idea would be laughable in Japan and other East Asian countries, where Shift-JIS and Big5 still dominate. So please, open your mind to the reality of computing outside of Europe. ASCII-based encodings no more encompasses all of the world's natural languages (not even the "important" ones) than "everyone is using Internet Explorer and Windows XP, right?" describes the state of the Internet. -- Steven -- https://mail.python.org/mailman/listinfo/python-list