There is something deliciously simple, elegant... and kinda... rebellious? about doing this.  And it wouldn't even be in the purview of Unicode.  "Yep, my HTML-renderer treats characters E0020..E007F exactly the same as 0020..007F, 'cept that it won't render 'em."  And you can send HTML text that looks for all the world like plain text to any normal Unicode-conformant viewer.  Now, the security issues of being able to write "invisible" JavaScript, or rather, Yet Another way you need to look at and reveal possible code, are a headache for someone else.  Viewed like this, you might do better taking this suggestion to W3C and having them amend the HTML/XML specs so that E0020..E007F are non-rendering synonyms for 0020..007F.  It wouldn't be a Unicode thing anymore, just changing the definition of HTML.  (I'm not saying it would be a GOOD idea, mind you.)

~mark
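
A rough sketch of the two behaviours described above, for illustration only: a hypothetical tag-aware renderer that folds E0020..E007F back onto 0020..007F, and an ordinary viewer that simply does not display them. The function names `as_html` and `as_plain` are made up for this example and do not come from any specification or library.

```python
TAG_OFFSET = 0xE0000                   # tag characters mirror ASCII at this offset
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007F


def is_tag(ch: str) -> bool:
    """True if ch lies in the tag-character range E0020..E007F."""
    return TAG_FIRST <= ord(ch) <= TAG_LAST


def as_html(text: str) -> str:
    """Hypothetical tag-aware renderer: treat tag characters as their ASCII twins."""
    return "".join(chr(ord(ch) - TAG_OFFSET) if is_tag(ch) else ch for ch in text)


def as_plain(text: str) -> str:
    """Ordinary viewer (assumed behaviour): tag characters are not displayed."""
    return "".join(ch for ch in text if not is_tag(ch))


# "plain <i>text</i>" with the markup written in tag characters
message = (
    "plain "
    + "\U000E003C\U000E0069\U000E003E"             # tag <i>
    + "text"
    + "\U000E003C\U000E002F\U000E0069\U000E003E"   # tag </i>
)

print(as_html(message))
print(as_plain(message))
```

The same string yields markup for the tag-aware reader and clean plain text for everyone else, which is also why hidden script content would need to be revealed by some other tool.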

On 1/22/19 10:43 PM, James Kass via Unicode wrote:

Nobody has really addressed Andrew West's suggestion about using the tag characters.

It seems conformant, unobtrusive, requiring no official sanction, and could be supported by third-partiers in the absence of corporate interest if deemed desirable.

One argument against it might be:  Whoa, that's just HTML.  Why not just use HTML?  SMH

One argument for it might be:  Whoa, that's just HTML!  Most everybody already knows about HTML, so a simple subset of HTML would be recognizable.

After revisiting the concept, it does seem elegant and workable. It would provide support for elements of writing in plain-text for anyone desiring it, enabling essential (or frivolous) preservation of editorial/authorial intentions in plain-text.

Am I missing something?  (Please be kind if replying.)

On 2019-01-20 10:35 AM, Andrew West wrote:

A possibility that I don't think has been mentioned so far would be to
use the existing tag characters (E0020..E007F). These are no longer
deprecated, and as they are used in emoji flag tag sequences, software
already needs to support them, and they should just be ignored by
software that does not support them. The advantages are that no new
characters need to be encoded, and they are flexible so that tag
sequences for start/end of italic, bold, fraktur, double-struck,
script, sans-serif styles could be defined. For example start and end
of italic styling could be defined as the tag sequences <i> and </i>
(E003C E0069 E003E and E003C E002F E0069 E003E).

Andrew
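
A minimal sketch of how the tag sequences in the message above could be produced, assuming each tag character is the corresponding ASCII character plus 0xE0000. The helpers `to_tags` and `italic` are hypothetical names for illustration; no such styling sequences have been defined by any standard.

```python
TAG_OFFSET = 0xE0000  # tag characters mirror ASCII at this offset


def to_tags(ascii_text: str) -> str:
    """Map ASCII 0x20..0x7E to the matching tag characters E0020..E007E."""
    out = []
    for ch in ascii_text:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"no tag character for U+{cp:04X}")
        out.append(chr(TAG_OFFSET + cp))
    return "".join(out)


def italic(text: str) -> str:
    """Wrap text in the proposed (not standardized) tag sequences <i> and </i>."""
    return to_tags("<i>") + text + to_tags("</i>")


def code_points(s: str) -> str:
    """Space-separated hex code points, for inspection."""
    return " ".join(f"{ord(ch):04X}" for ch in s)


print(code_points(to_tags("<i>")))    # E003C E0069 E003E
print(code_points(to_tags("</i>")))   # E003C E002F E0069 E003E
print(code_points(italic("a")))       # E003C E0069 E003E 0061 E003C E002F E0069 E003E
```

The first two printed lines match the code point sequences given in the message for start and end of italic.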

