There is something deliciously simple, elegant... and kinda...
rebellious? about doing this. And it wouldn't even be in the purview of
Unicode. "Yep, my HTML-renderer treats characters E0020..E007F just
exactly the same as 0020..007F, 'cept that it won't render 'em." And you
can send HTML text that looks for all the world like plain text to any
normal Unicode-conformant viewer. Now, the security issues of being
able to write "invisible" JavaScript, or rather, Yet Another way you
need to look at and reveal possible code, are a headache for someone
else. Viewed like this, you might do better taking this suggestion to
W3C and having them amend the HTML/XML specs so that E0020..E007F are
non-rendering synonyms for 0020..007F. It wouldn't be a Unicode thing
anymore, just changing the definition of HTML. (I'm not saying it would
be a GOOD idea, mind you.)
~mark
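
A rough Python sketch of the renderer behavior described above, in case it helps to see it concretely. The function names reveal_markup and strip_markup are made up for this illustration and are not any real renderer's API. It assumes the E0020..E007F range given in the message and does nothing with other tag characters such as E0001.

```python
# Hypothetical sketch: treat E0020..E007F as hidden synonyms for 0020..007F.
TAG_OFFSET = 0xE0000
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007F  # range as given in the message


def is_tag(ch: str) -> bool:
    """True if ch is one of the tag characters E0020..E007F."""
    return TAG_FIRST <= ord(ch) <= TAG_LAST


def reveal_markup(text: str) -> str:
    """What a tag-aware HTML renderer (or a security audit) would see:
    every tag character is replaced by its ASCII counterpart."""
    return "".join(chr(ord(c) - TAG_OFFSET) if is_tag(c) else c for c in text)


def strip_markup(text: str) -> str:
    """What an ordinary viewer effectively shows: tag characters dropped."""
    return "".join(c for c in text if not is_tag(c))


# "plain <i>text</i>" with the markup written in tag characters
hidden = (
    "plain "
    + "\U000E003C\U000E0069\U000E003E"             # <i>
    + "text"
    + "\U000E003C\U000E002F\U000E0069\U000E003E"   # </i>
)

print(strip_markup(hidden))   # plain text
print(reveal_markup(hidden))  # plain <i>text</i>
```

The reveal_markup direction is also the one the security worry calls for: anything that can execute markup needs a way to show what was hidden.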
On 1/22/19 10:43 PM, James Kass via Unicode wrote:
Nobody has really addressed Andrew West's suggestion about using the
tag characters.
It seems conformant, unobtrusive, requiring no official sanction, and
could be supported by third-partiers in the absence of corporate
interest if deemed desirable.
One argument against it might be: Whoa, that's just HTML. Why not
just use HTML? SMH
One argument for it might be: Whoa, that's just HTML! Most everybody
already knows about HTML, so a simple subset of HTML would be
recognizable.
After revisiting the concept, it does seem elegant and workable. It
would provide support for elements of writing in plain-text for anyone
desiring it, enabling essential (or frivolous) preservation of
editorial/authorial intentions in plain-text.
Am I missing something? (Please be kind if replying.)
On 2019-01-20 10:35 AM, Andrew West wrote:
A possibility that I don't think has been mentioned so far would be to
use the existing tag characters (E0020..E007F). These are no longer
deprecated, and as they are used in emoji flag tag sequences, software
already needs to support them, and they should just be ignored by
software that does not support them. The advantages are that no new
characters need to be encoded, and they are flexible so that tag
sequences for start/end of italic, bold, fraktur, double-struck,
script, sans-serif styles could be defined. For example start and end
of italic styling could be defined as the tag sequences <i> and </i>
(E003C E0069 E003E and E003C E002F E0069 E003E).
Andrew
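
The code points in Andrew's example can be checked with a few lines of Python. This is only a sketch under the assumption that each tag character is its ASCII counterpart plus E0000; the helper names to_tags, italic and tag_hex are invented for the illustration, and no standard defines these particular styling sequences.

```python
# Hypothetical helpers: build tag-character sequences from ASCII markup.
TAG_OFFSET = 0xE0000


def to_tags(ascii_text: str) -> str:
    """Map printable ASCII (0020..007E) to the tag characters E0020..E007E."""
    out = []
    for ch in ascii_text:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"not printable ASCII: U+{cp:04X}")
        out.append(chr(cp + TAG_OFFSET))
    return "".join(out)


def italic(text: str) -> str:
    """Wrap text in the proposed start/end italic tag sequences."""
    return to_tags("<i>") + text + to_tags("</i>")


def tag_hex(s: str) -> str:
    """Space-separated hex code points, for checking against the message."""
    return " ".join(f"{ord(c):04X}" for c in s)


print(tag_hex(to_tags("<i>")))   # E003C E0069 E003E
print(tag_hex(to_tags("</i>")))  # E003C E002F E0069 E003E
print(len(italic("word")))       # 11 (3 + 4 + 4 code points)
```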