Steven D'Aprano <st...@pearwood.info> writes:

> On Tue, 08 Jul 2014 11:22:25 +1000, Ben Finney wrote:
>
> > A group of (a particular amount of) U+0020 characters is visually
> > indistinguishable from a U+0009 character, when the default semantics
> > are applied to each.
>
> Hmmm. I'm not sure there actually *is* such a thing as "default
> semantics" for tabs.

It was likely never standardised, but yes, default semantics are long
established for the HT (Horizontal Tab) control code in a text stream
<URL:https://en.wikipedia.org/wiki/Tab_key#Tab_characters>.

The default semantics are that an HT control code is an instruction to
introduce enough horizontal space that the following character appears
at the next multiple-of-8 column.

These semantics assume a fixed character width, which is itself a
default semantic of the display of computer text; variable-width is a
deviation from the default.

> If you look at a tab character in a font

I'm not talking about glyphs (for a control code, there isn't much sense
talking about a default glyph); I'm talking about the default semantics
of how they affect display.

> But if you look at it in a text editor, it will probably look like
> eight spaces, unless it looks like four, or some other number, and if
> you look at it in a word processor, it will probably look like a "jump
> to the next tab stop" command.

Right. Programs that conform to the established default semantics for an
HT (U+0009) code point will shift to the next tab stop to display the
following character. Tab stops themselves are, in fixed-width character
layout (which is itself the historical default), placed at every
multiple of 8 character columns.

> I don't think any of those things count as "default semantics".

I hope my position is clearer.

> The point being, tabs are *control characters*, like newlines and
> carriage returns and form feeds, not regular characters like spaces
> and "A" or "λ". Since "indent" is an *instruction* rather than a
> character, it is best handled with a control character.

Right. And those control codes affect display of the text, and there are
default semantics for those codes: what those control codes specifically
mean. The HT code has the default display semantic of “display the
following character at the next horizontal tab stop”.
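
To make the multiple-of-8 rule concrete, here is a minimal sketch in
Python. The names next_tab_stop and expand_tabs are mine, purely for
illustration. It assumes fixed-width layout, 0-based columns, a single
line containing no newlines, and a tab width of 8. Under those
assumptions it should agree with the built-in str.expandtabs.

```python
def next_tab_stop(column, tab_width=8):
    """Return the 0-based column of the first tab stop after `column`.

    Hypothetical helper: tab stops are assumed to sit at every
    multiple of `tab_width`.
    """
    return (column // tab_width + 1) * tab_width


def expand_tabs(line, tab_width=8):
    """Replace each U+0009 in a single line with U+0020 characters.

    Each tab is padded out so that the following character lands on
    the next tab stop. Assumes every other character is one column
    wide.
    """
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            stop = next_tab_stop(column, tab_width)
            out.append(" " * (stop - column))
            column = stop
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

With these, expand_tabs("ab\tc") places "c" at column 8, the same place
it lands when "ab" is followed by six U+0020 characters, which is why
the two are visually indistinguishable under the default semantics.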

> The solution is to use a smarter editor.

The recipient's choice of editor program is not within the control of
the author. Furthermore, that solution expects the recipient to deviate
from the default display semantics of the text as received.

The author should write the text such that the default semantics are
useful, and/or avoid text where the default semantics are undesirable or
unreliably implemented.

In this case: if the programmer doesn't like U+0009 resulting in text
aligned at multiple-of-8 tab stops, or doesn't like the fact that
recipients may have tab stops set differently, then I don't care what
editor the author uses; they should avoid putting U+0009 into text.

That said, a smarter text editor program *can* be a solution for “I
don't like the default semantics *as displayed on my computer*”.

If a programmer wants to deviate from the defaults, and can convince
others on a rational and non-coercive basis to go along with their
non-default preferences, they all have my blessing. If they want their
preferences to override the default more broadly, they need a better
argument than “it just looks better to me”.

> Isn't this why you recommend people use a programmer's editor rather
> than Notepad?

I don't see how recommending a better editor for the *author* addresses
how the *recipient*'s device renders the text. So no, that's not a
reason why I recommend the author use a programmer's editor.

> True, but that's *only* because your editor chooses to follow the
> convention "display a LINE FEED by starting a new line" rather than by
> the convention "display the (invisible or zero-width) glyph of the
> LINE FEED". If editors were to standardise on the convention "display
> a HORIZONTAL TAB character as visibly distinct from a sequence of
> spaces" (e.g. by shading the background a different colour, or
> overlying it with an arrow) then we would not be having this
> discussion.

If things were different, they'd be different.
I'm talking about default display semantics of the U+0009 code as they
are.

-- 
 \     “I used to be an airline pilot. I got fired because I kept |
  `\    locking the keys in the plane. They caught me on an 80 foot |
_o__)                  stepladder with a coathanger.” —Steven Wright |
Ben Finney