Steven D'Aprano <st...@pearwood.info> writes:

> On Tue, 08 Jul 2014 11:22:25 +1000, Ben Finney wrote:
>
> > A group of (a particular amount of) U+0020 characters is visually
> > indistinguishable from a U+0009 character, when the default semantics
> > are applied to each.
>
> Hmmm. I'm not sure there actually *is* such a thing as "default 
> semantics" for tabs.

It was likely never standardised, but yes, default semantics are long
established for the HT (Horizontal Tab) control code in a text stream
<URL:https://en.wikipedia.org/wiki/Tab_key#Tab_characters>.

The default semantics are that an HT (Horizontal Tabulation) control
code is an instruction to introduce enough horizontal space such that
the following character appears at the next multiple-of-8 column. These
semantics assume a fixed character width, which is itself a default
semantic of the display of computer text; variable-width is a deviation
from the default.
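
To make the column arithmetic concrete, here is a minimal Python sketch
of that rule. The helper name `next_tab_stop` and the `TAB_WIDTH`
constant are my own inventions for illustration; `str.expandtabs` is the
built-in string method, whose tab size defaults to 8. It assumes
fixed-width characters and 0-based column numbers.

```python
TAB_WIDTH = 8  # assumed default spacing of tab stops, as described above


def next_tab_stop(column, width=TAB_WIDTH):
    """Return the 0-based column where the character after an HT lands."""
    return (column // width + 1) * width


# An HT after "ab" (cursor at column 2) pushes the next character to column 8.
line = "ab\tc"
expanded = line.expandtabs()  # tabsize defaults to 8

# Under these semantics the HT is indistinguishable from six spaces here.
same_as_spaces = expanded == "ab" + " " * 6 + "c"
print(next_tab_stop(2), expanded.index("c"), same_as_spaces)
```

Note that an HT at a column which is already a tab stop still advances to
the following stop, so `next_tab_stop(8)` gives 16 rather than 8.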

> If you look at a tab character in a font

I'm not talking about glyphs (for a control code, there isn't much sense
talking about a default glyph), I'm talking about the default semantics
of how they affect display.

> But if you look at it in a text editor, it will probably look like
> eight spaces, unless it looks like four, or some other number, and if
> you look at it in a word processor, it will probably look like a "jump
> to the next tab stop" command.

Right. Programs that conform to the established default semantics for an
HT (U+0009) code point will shift to the next tab stop to display the
following character. In fixed-width character layout (which is itself
the historical default), tab stops are placed at every multiple of 8
character columns.

> I don't think any of those things count as "default semantics".

I hope my position is clearer.

> The point being, tabs are *control characters*, like newlines and
> carriage returns and form feeds, not regular characters like spaces
> and "A" or "λ". Since "indent" is an *instruction* rather than a
> character, it is best handled with a control character.

Right. And those control codes affect display of the text, and there are
default semantics for those codes: what those control codes specifically
mean. The HT code has the default display semantic of “display the
following character at the next horizontal tab stop”.

> The solution is to use a smarter editor.

The recipient's choice of editor program is not within the control of
the author. Furthermore, that solution expects the recipient to deviate
from the default display semantics of the text as received.

The author should write the text such that the default semantics are
useful, and/or avoid text where the default semantics are undesirable or
unreliably implemented.

In this case: If the programmer doesn't like U+0009 resulting in text
aligned at multiple-of-8 tab stops, or doesn't like the fact that
recipients may have tab stops set differently, then I don't care what
editor the author uses; they should avoid putting U+0009 into text.
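
A rough Python sketch of why this matters, under the assumption that a
recipient's display behaves like `str.expandtabs` with some tab size.
The helper `indent_width` and the sample lines are hypothetical, made up
for this illustration.

```python
tab_indented = "\tspam"            # indented with one U+0009
space_indented = " " * 8 + "eggs"  # indented with eight U+0020


def indent_width(line, tabsize):
    """Displayed width of the leading whitespace for a given tab stop spacing."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip(" "))


# With the default stops the two lines appear aligned; with stops every
# 4 columns only the space-indented line keeps its position.
for tabsize in (8, 4):
    print(tabsize,
          indent_width(tab_indented, tabsize),
          indent_width(space_indented, tabsize))
```

The space-indented line displays the same way regardless of the
recipient's tab stops; the tab-indented line does not.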

That said, a smarter text editor program *can* be a solution for “I
don't like the default semantics *as displayed on my computer*”.

If a programmer wants to deviate from the defaults, and can convince
others on a rational and non-coercive basis to go along with their
non-default preferences, they all have my blessing.

If they want their preferences to override the default more broadly,
they need a better argument than “it just looks better to me”.

> Isn't this why you recommend people use a programmer's editor rather
> than Notepad?

I don't see how recommending a better editor for the *author* addresses
how the *recipient*'s device renders the text. So no, that's not a
reason why I recommend the author use a programmer's editor.

> True, but that's *only* because your editor chooses to follow the
> convention "display a LINE FEED by starting a new line" rather than by
> the convention "display the (invisible or zero-width) glyph of the
> LINE FEED". If editors were to standardise on the convention "display
> a HORIZONTAL TAB character as visibly distinct from a sequence of
> spaces" (e.g. by shading the background a different colour, or
> overlying it with an arrow) then we would not be having this
> discussion.

If things were different, they'd be different. I'm talking about default
display semantics of the U+0009 code as they are.

-- 
 \          “I used to be an airline pilot. I got fired because I kept |
  `\       locking the keys in the plane. They caught me on an 80 foot |
_o__)                    stepladder with a coathanger.” —Steven Wright |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list
