On Tue, 08 Jul 2014 11:22:25 +1000, Ben Finney wrote:

> A group of (a particular amount of) U+0020 characters is visually
> indistinguishable from a U+0009 character, when the default semantics
> are applied to each.

Hmmm. I'm not sure there actually *is* such a thing as "default
semantics" for tabs. If you look at a tab character in a font, it
probably looks like a single space, but that depends on the font
designer. But if you look at it in a text editor, it will probably look
like eight spaces, unless it looks like four, or some other number, and
if you look at it in a word processor, it will probably look like a
"jump to the next tab stop" command. In a spreadsheet application, it
will be a cell separator and consequently won't look like anything at
all. I don't think any of those things count as "default semantics".

The point being, tabs are *control characters*, like newlines and
carriage returns and form feeds, not regular characters like spaces and
"A" or "λ". Since "indent" is an *instruction* rather than a character,
it is best handled with a control character.

In any case, if we limit ourselves to text editors, only a specific
number of spaces will be visually indistinguishable from a tab, where
the number depends on which column you start with:

x	x  # Tab
x       x  # Seven spaces
x      x  # Six spaces
x        x  # Eight spaces

Even in a proportional font, the last two should be distinguishable
from the first two.

Admittedly, that does leave the case where N spaces (for some
1 <= N <= 8) look like a tab. That's a problem, but it's not the only
one:

* End of line is a problem. I know of *at least* the following seven
  conventions for end-of-line:

  - ASCII line feed, \n (Unix etc.)
  - ASCII carriage return, \r (Acorn, ZX Spectrum, Apple, etc.)
  - ASCII \r\n (CP/M, DOS, Windows, Symbian, Palm, etc.)
  - ASCII \n\r (RISC OS)
  - ASCII Record Separator, \x1E (QNX)
  - EBCDIC New Line, \N{NEXT LINE} in Unicode (IBM mainframes)
  - ATASCII \x9B (Atari)

* Form feeds are a problem, since they are invisible, but still get
  used (by Vim or Emacs, I forget which) to mark sections of text.

* Issues to do with word-wrapping and hyphenation, or lack thereof,
  are a problem.

* Encoding issues are a problem.

* There are other invisible characters than spaces (non-breaking
  space, em-space, en-space, thin space).

The solution is to use a smarter editor. For example, an editor might
draw a horizontal rule to show a form feed on a line of its own, or
highlight unexpected carriage return characters with ^M, or display
tabs in a different colour from spaces, or overlay them with a \x09
glyph. Or an editor might be smart enough to automatically do what the
current paragraph or block does: if the block is already indented with
tabs, pressing tab inserts a tab, but if it is indented with spaces,
pressing tab inserts spaces.

Isn't this why you recommend people use a programmer's editor rather
than Notepad? A good editor should handle these things for you
automatically, or at least with a minimum amount of manual effort.

>> The former is a "control" character, which has specific semantics
>> associated with it; the latter is a "printable" character, which is
>> usually printed and interpreted as itself (although in this particular
>> case, the printed representation is hard to see on most output
>> devices).
>
> And those specific semantics make the display of those characters easily
> confused. That is why it's generally a bad idea to use U+0009 in text
> edited by humans.

I disagree. Using tabs is no more a bad idea than using a form feed, or
having support for multiple encodings.

>> This mailing list doesn't seem to mind that lines beginning with ASCII
>> SPC characters are semantically different from lines beginning with
>> ASCII LF characters, although many detractors of Python seem unduly
>> fixated on it.
>
> The salient difference being that U+000A LINE FEED is easily visually
> distinguished from a short sequence of U+0020 SPACE characters. This
> avoids the confusion, and makes use of both together unproblematic.

True, but that's *only* because your editor chooses to follow the
convention "display a LINE FEED by starting a new line" rather than the
convention "display the (invisible or zero-width) glyph of the LINE
FEED". If editors were to standardise on the convention "display a
HORIZONTAL TAB character as visibly distinct from a sequence of spaces"
(e.g. by shading the background a different colour, or overlaying it
with an arrow) then we would not be having this discussion.

In other words, it is the choice of editors to be *insufficiently
smart* about tabs that causes the problem. There is a vicious circle
here:

* editors don't handle tabs correctly

* which leads to (some) people believing that "tabs are bad" and
  should be avoided

* which leads to editors failing to handle tabs correctly, because
  "tabs are bad" and should be avoided.

A pity really.

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list
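
P.S. A minimal sketch of the point above that the number of spaces
matching a tab depends on the starting column. It assumes tab stops
every eight columns, which is only one common convention, and uses
Python's built-in str.expandtabs() to model it. The helper name
spaces_matching_tab is made up for this illustration.

```python
# Sketch only: str.expandtabs() models one convention (fixed tab stops
# every `tabsize` columns). Real editors and word processors may differ.

def spaces_matching_tab(prefix: str, tabsize: int = 8) -> int:
    """Return how many columns a tab typed right after `prefix` occupies.

    Assumes `prefix` itself contains no tabs or newlines, so that
    len(prefix) is the column where the tab starts.
    """
    expanded = (prefix + "\t").expandtabs(tabsize)
    return len(expanded) - len(prefix)


for prefix in ["", "x", "xx", "xxxxxxx", "xxxxxxxx"]:
    print(repr(prefix), spaces_matching_tab(prefix))
# ''         -> 8
# 'x'        -> 7
# 'xx'       -> 6
# 'xxxxxxx'  -> 1
# 'xxxxxxxx' -> 8
```

Under that assumption, a tab typed after a single "x" occupies seven
columns, which is why only the seven-space line in the example above
is indistinguishable from the tab line.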