I fully agree, and I think Mark explains really well in the
message below why some of the proposals floated here are no
longer plain text but "not-so-plain" text.
I think we are better served with a solution that provides some
form of "light" rich text, for basic emphasis in short messages.
The proper way to do this would be some form of Markdown standard
shared across vendors, perhaps implemented in such a way that users
don't necessarily need to type anything special, but that, when
exported to "true" plain text, turns into the source format for
the "light" rich text.
This is an effort that's out of scope for Unicode to implement,
or, I should say, if the Consortium were to take it on, it would
be a separate technical standard from The Unicode Standard.
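
A minimal sketch of that "light" rich text idea, under stated assumptions: the run-list model, the function name export_plain, and the choice of asterisks as the emphasis delimiter are all hypothetical and not taken from any agreed standard. The user never types the delimiters; they only appear when the text is exported to "true" plain text.

```python
def export_plain(runs):
    """Turn a list of (text, emphasized) runs into Markdown-style source.

    The asterisk delimiter is an assumption made for illustration only.
    """
    return "".join(
        f"*{text}*" if emphasized else text
        for text, emphasized in runs
    )


# The editor stores runs; the delimiters appear only on export.
message = [("I ", False), ("really", True), (" mean it", False)]
print(export_plain(message))
```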
A./
PS: I really hate the creeping expansion of pseudo-encoding via
VS characters. The only worse thing is adding novel control
functions.
On 1/18/2019 7:51 AM, Mark E. Shoulson via Unicode wrote:
On 1/16/19 6:23 AM, Victor Gaultney via Unicode wrote:
Encoding 'begin italic' and 'end italic' would introduce
difficulties when partial strings are moved, etc. But that's no
different than with current punctuation. If you select the
second half of a string that includes an end quote character you
end up with a mismatched pair, with the same problems of
interpretation as selecting the second half of a string
including an 'end italic' character. Apps have to deal with it,
and do, as in code editors.
It kinda IS different. If you paste in half a string, you get a
mismatched or unmatched paren or quote or something. A typo, but
a transient one. It looks bad where it is, but everything else is
unaffected. It's no worse than hitting an extra key by mistake.
If you paste in a "begin italic" and miss the "end italic",
though, then *all* your text from that point on is affected! (Or
maybe "all until a newline" or some other stopgap ending, but
that's just damage-control, not damage-prevention.) Suddenly,
letters and symbols five words/lines/paragraphs/pages later look
different, the pagination is all altered (by far more than merely
a single extra punctuation mark, since italic fonts generally are
narrower than roman). It's a disaster.
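
The contrast above can be sketched in a few lines. This is only an illustration: BEGIN_ITALIC and END_ITALIC are hypothetical stand-ins (private-use code points), since Unicode has no such characters, and the function names are made up for the example. It counts how many characters of the text that follows a pasted fragment change style.

```python
# Hypothetical controls: private-use code points standing in for the
# proposed 'begin italic' / 'end italic' characters, which do not exist.
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"


def render_states(text):
    """Return (visible_text, flags); flags[i] is True if visible_text[i] is italic."""
    visible, flags = [], []
    italic = False
    for ch in text:
        if ch == BEGIN_ITALIC:
            italic = True
        elif ch == END_ITALIC:
            italic = False
        else:
            visible.append(ch)
            flags.append(italic)
    return "".join(visible), flags


def affected_after(fragment, following):
    """Count characters of `following` that turn italic once `fragment` precedes it."""
    _, flags = render_states(fragment + following)
    visible_fragment, _ = render_states(fragment)
    return sum(flags[len(visible_fragment):])


rest_of_document = " and then a thousand more characters follow." * 25

# A stray quote is a local typo: nothing after it changes style.
print(affected_after('"half a quote', rest_of_document))
# A stray 'begin italic' changes every character after it.
print(affected_after(BEGIN_ITALIC + "half an italic run", rest_of_document))
```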
No. This kind of statefulness really is beyond what Unicode is
designed to cope with. Bidi controls are (almost?) the sole
exception, and even they cause their share of headaches. Encoding
separate _text_ italics/bold is IMO also a disastrous idea, but
I'm not putting out reasons for that now. The only really
feasible suggestion I've heard is using a VS in some fashion.
(Maybe let it affect whole words instead of individual
characters? Makes for fewer noisy VSs, but introduces a whole
other host of limitations (how to italicize part of a word, how to
italicize non-letters...) and is also just damage-control, though
stronger.)
Apps (and font makers) can also choose how to deal with
presenting strings of text that are marked as
italic. They can choose to present visual symbols to indicate
begin/end, such as /this/. Or they can present it using the
italic variant of the font, if available.
At which point, you have invented markdown. Instead of making
Unicode declare it, just push for vendors everywhere to recognize
/such notation/ as italics. (OK, I know, you want dedicated
characters for it which can't be confused for anything else.)
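
For what it's worth, recognizing the slash notation takes very little code. The sketch below is an assumption-laden illustration, not anyone's actual implementation: the pattern, the function name slashes_to_italics, and the choice of <i> as output are all hypothetical. It refuses slashes that touch word characters, colons, or other slashes, so URLs, fractions, and "and/or" are left alone, and an unmatched slash changes nothing.

```python
import re

# Hypothetical rule: an opening slash must not follow a word character,
# a colon, or another slash; a closing slash must not be followed by a
# word character or a slash. The run may not start or end with whitespace.
SLASH_ITALIC = re.compile(r"(?<![\w/:])/(?=\S)([^/\n]+?)(?<=\S)/(?![\w/])")


def slashes_to_italics(text):
    """Rewrite /such notation/ as <i>such notation</i>."""
    return SLASH_ITALIC.sub(r"<i>\1</i>", text)


print(slashes_to_italics("just push for /such notation/ as italics"))
print(slashes_to_italics("see http://example.com/a/b and/or 1/2"))
```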
- Those who develop plain text apps (social media in particular)
don't have to build in a whole markup/markdown layer into their
apps
With the complexity of writing a social media app, a markup layer
is really the least of the concerns when it comes to simplifying.
- Misuse of math chars for pseudo-italic would likely disappear
- The text runs between markers remain intact, so they need no
special treatment in searching, selecting, etc.
- It finally, and conclusively, would end the decades of the
mess in HTML that surrounds <em> and <italic>.
Adding _another_ solution to something will *never* "conclusively
end" anything. On a good day, you can hope it will swamp the
others, but they'll remain at least in legacy. More likely, it
will just add one more way to be confused and another side to the
mess. (People here have pointed out the difficulties of
distinguishing or not-distinguishing between HTML-level <i>
and putative plain-text italics. And yes, that is an issue, and
one that already exists with styling that can change case and
such. As with anything, the question is not whether there are
going to be problems, but how those problems weigh against
potential benefits. That's an open question.)
My main point in suggesting that Unicode needs these characters
is that italic has been used to indicate
specific meaning - this text is somehow special - for over 400
years, and that content should be preserved in plain text.
There is something to this: people have been *emphasizing* text in
some fashion or another for ages. There is room to call this
plain text.
~mark