Re: Numbered SGML entities in header addresses

Derek Martin Fri, 12 Apr 2019 08:50:31 -0700

On Fri, Apr 12, 2019 at 11:25:20AM +0200, felixs wrote:
> On Thu, Apr 11, 2019 at 07:12:57PM -0500, Derek Martin wrote:
> > On Sun, Apr 07, 2019 at 11:13:53PM +0200, felixs wrote:
> > > On Fri, Apr 05, 2019 at 11:24:26AM -0700, Ian Zimmerman wrote:
>
> Thanks. I had already posted a follow-up on my first message.
> > But clearly it won't help at all in this case.  The problematic string
> > isn't a binary representation of a unicode character. It's an HTML
> > entity, and HTML entities in recipient headers are not supported by
> > any of the RFCs


> Even though, call them HTML entities, call them something else, they
> are ASCII characters and as such they are a subset of utf-8.

Stop saying it's Unicode or ASCII...  That's incorrect--or at best
misleading--on two counts:

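1. &#236 (or &#263) are indeed HTML entities (numeric character
   references, strictly speaking), inserted by some software which
   clearly intended them to be interpreted as HTML entities.  Calling
   them Unicode mislabels them: they are markup that refers to a
   character, not an encoding of that character.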

2. The character which was meant to be represented, 'ć' (which is
   actually &#263, not &#236--I assume the last two digits got
   transposed somehow), is not ASCII at all; it's an accented letter
   found in Latin-2 (ISO-8859-2), and not even in Latin-1.

Certainly the characters in the string ['&', '#', '2', '3', '6'] are
all ASCII, but it is obvious that the string is meant to represent a
single other character, and if you're familiar with HTML entities, the
notation makes it plain enough that it is one.  So claiming that it's
UTF-8, and consequently suggesting setting the charset to UTF-8, is a
complete red herring.
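
To make the distinction concrete, here is a minimal sketch in Python
(my choice of language for illustration; nothing in this thread uses
it).  It assumes only the standard library's html.unescape, and the
variable names are made up for the example.  It shows that the literal
reference text is pure ASCII, so encoding it as UTF-8 changes nothing,
while the character it refers to only appears once something decodes
the reference as HTML:

```python
import html

# The literal text as it might sit in the header (semicolon added here
# so that it is a well-formed numeric character reference).
raw = "&#263;"

# Every character of the reference itself is ASCII, so its UTF-8
# bytes are identical to its ASCII bytes.
print(raw.isascii())                                # True
print(raw.encode("utf-8") == raw.encode("ascii"))   # True

# Only an HTML-aware decoding step yields the intended character.
decoded = html.unescape(raw)
print(decoded == "\u0107")                          # True: U+0107 is 'ć'
print(decoded.isascii())                            # False

# The transposed number names a different character entirely.
transposed = html.unescape("&#236;")
print(transposed == "\u00ec")                       # True: U+00EC is 'ì'
```

Changing the charset never triggers that decoding step, which is why
it cannot help here.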

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.

