On Fri, 27 May 2016 04:10 pm, Marko Rauhamaa wrote:

> Steven D'Aprano <st...@pearwood.info>:
>> This concept of ASCII = "all character sets", or "nearly all", or
>> "okay, maybe not nearly all of them, but just the important ones" is
>> terribly Euro-centric. The very idea would be laughable in Japan and
>> other East Asian countries, where Shift-JIS and Big5 still dominate.
>
> Shift-JIS and Big5 are ASCII derivatives:
Gosh. Really?

If you looked at what I wrote, I said:

"Then there are the variable-width encodings which are backwards
compatible with ASCII *only* in the sense that text containing only
ASCII characters uses the same sequence of bytes as ASCII would."

and gave both Shift-JIS and Big5 as examples. But you cannot treat them
as "like ASCII" or "extended ASCII", because they are multibyte
encodings. Unlike UTF-8, where a damaged multibyte sequence corrupts
only a single character, mangling a Shift-JIS or Big5 multibyte sequence
can corrupt a potentially unlimited amount of subsequent text: their
trail bytes overlap both the ASCII range and the range of lead bytes, so
a decoder that loses its place cannot reliably tell where the next
character begins.

I don't mind being corrected if I make a genuine mistake; in fact, I
appreciate correction. But being corrected for something I already
acknowledged? That's just arguing for the sake of arguing.

[...]

> ASCII derivatives are in wide use in the Americas and Antarctica as
> well. They have been spotted in Australia, New Zealand, Oceania and
> Africa. You shouldn't be surprized if you run into them in Asia, either.

Of course. But they're not *all encodings*, and while they're important,
there are plenty of non-ASCII encodings, and plenty of encodings that
violate the "one byte equals one character" invariant followed by ASCII
and extended-ASCII encodings.

-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list
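To make the distinction concrete, here is a minimal sketch using CPython's standard "ascii", "utf-8", "shift_jis" and "big5" codecs; the sample strings are arbitrary choices for illustration. It shows that ASCII-only text produces identical bytes in all four encodings, that kanji take two bytes each in Shift-JIS and Big5, that a Shift-JIS trail byte can coincide with an ASCII character, and that UTF-8 recovers after a dropped byte. It does not try to demonstrate the Shift-JIS runaway case, since how far the damage spreads depends on the particular bytes and decoder involved.

```python
# Minimal sketch using CPython's built-in codecs; sample strings are
# arbitrary examples chosen for illustration.

ascii_text = "Hello, world!"

# ASCII-only text encodes to exactly the same bytes in all of these.
for codec in ("ascii", "utf-8", "shift_jis", "big5"):
    assert ascii_text.encode(codec) == ascii_text.encode("ascii")

# Non-ASCII text breaks "one byte equals one character": each of these
# two kanji takes two bytes in Shift-JIS and in Big5.
for codec in ("shift_jis", "big5"):
    encoded = "日本".encode(codec)
    print(codec, len("日本"), "chars ->", len(encoded), "bytes")

# Shift-JIS trail bytes can fall in the ASCII range: the second byte of
# the kanji for "table" is 0x5C, the same byte as an ASCII backslash,
# so a naive byte-level search for a backslash finds a false match.
hyo = "表".encode("shift_jis")
print(hyo, b"\\" in hyo)

# UTF-8 continuation bytes (0x80-0xBF) never overlap ASCII or lead
# bytes, so dropping one byte damages only the affected character; the
# decoder resynchronises at the next lead byte.
utf8 = "日本語 text".encode("utf-8")
damaged = utf8[1:]  # drop the first byte of the first character
print(damaged.decode("utf-8", errors="replace"))
```

The "replace" error handler is used only so the damaged UTF-8 can be printed; with the default "strict" handler the decode would raise UnicodeDecodeError instead.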