On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:

> I dont know how one causally connects the 'headaches' but Ive seen -
> mojibake

Mojibake is certainly more common with multiple encodings, but the solution to that is Unicode, not ASCII. In fact, in your blog post you even link to a post of mine where I explain that ASCII has gone through multiple backwards-incompatible changes over the decades, which means you can have a limited form of mojibake even in pure ASCII.

Between changes over various versions of ASCII, and ambiguous characters allowed by the standard, you needed some sort of out-of-band metadata to tell you whether the sender intended an @ or a `, a | or a ¬, a £ or a #, to mention only a few. It's only since the 1980s that ASCII, actual 7-bit US ASCII, has become an unambiguous standard.

But that's okay, because that merely allowed people to create dozens of 7-bit and 8-bit variations on ASCII, all incompatible with each other, and *call them ASCII* regardless of the actual standard name.

Between ambiguities in actual ASCII, and the common practice of labelling non-ASCII as ASCII, I can categorically say that mojibake has always been possible in so-called "plain text". If you haven't noticed it, it was because you were only exchanging documents with people who happened to use the same set of characters as you.

> - unicode 'number-boxes' (what are these called?)

They are missing character glyphs, and they have nothing to do with Unicode. They are due to deficiencies in the text font you are using. Admittedly, with a Unicode code space running up to 0x10FFFF (and even more glyphs than code points, since a single code point can have multiple glyphs) it isn't surprising that most font designers have neither the time, skill nor desire to create a glyph for every single code point. But then the same applies even for more restrictive 8-bit encodings -- sometimes font designers don't even bother providing glyphs for *ASCII* characters. (E.g. they may only provide glyphs for uppercase A...Z, not lowercase.)

> - Worst of all what we
> *dont* see -- how many others dont see what we see?

Again, this is a deficiency of the font.

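Both points can be checked from Python without trusting any font. What follows is only a minimal sketch: the helper names `misread` and `describe` are made up for illustration, and Latin-1 is just one example of a codec the reader might wrongly assume. The first helper reproduces mojibake by decoding bytes with a different encoding than the one used to write them; the second asks the Unicode database what a code point is, which gives the same answer whether or not your font has a glyph for it.

```python
import unicodedata

PILCROW = "\N{PILCROW SIGN}"  # the character from the quoted message


def misread(text, written_as="utf-8", read_as="latin-1"):
    """Encode text one way and decode it another (hypothetical helper)."""
    return text.encode(written_as).decode(read_as)


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
    )


# Wrong codec on the reading side: no error is raised, just wrong text.
print(misread(PILCROW + " Passive voice"))  # Â¶ Passive voice

# Same codec on both sides round-trips cleanly.
assert misread(PILCROW, "utf-8", "utf-8") == PILCROW

# A printable character, an invisible format character, and a space.
for ch in (PILCROW, "\N{ZERO WIDTH JOINER}", " "):
    print(describe(ch))
```

If your terminal font lacks a glyph for one of these characters you may see a box in the output, but the code point, name and category reported are the same for everyone.
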
There are very few code points in Unicode which are intended to be invisible, e.g. space, newline, zero-width joiner and control characters, but they ought to be equally invisible to everyone. No printable character should ever be invisible in any decent font.

> I never knew of any of this in the good ol days of ASCII

You must have been happy with a very impoverished set of symbols, then.

> ¶ Passive voice is often the best choice in the interests of political
> correctness
>
> It would be a pleasant surprise if everyone sees a pilcrow at start of
> line above

I do.

--
Steven D'Aprano
http://import-that.dreamwidth.org/