Re: Numbered SGML entities in header addresses

felixs Fri, 12 Apr 2019 02:26:37 -0700

On Thu, Apr 11, 2019 at 07:12:57PM -0500, Derek Martin wrote:
> On Sun, Apr 07, 2019 at 11:13:53PM +0200, felixs wrote:
> > On Fri, Apr 05, 2019 at 11:24:26AM -0700, Ian Zimmerman wrote:
> > > I think this is the first time I got hit by the next stage of
> > > browserisation: on a mailing list, a From: line that looks like
> > > 
> > > From: "Foo Bari&#236;" <foo-ba...@gmail.com>
> >
> > > where the entity refers to the character U+0107 in Unicode code point
> 
> FWIW, the quoted entity is the latin-1 character 'ì', not the
> character 'ć'.  The latter would be &#263;, not
> 236...  Seems the last two digits were transposed somehow.
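The arithmetic is easy to check: a numeric character reference gives the code point in decimal, so 236 is U+00EC and 263 is U+0107. A minimal sketch, assuming Python 3.6+; describe_ref is a hypothetical helper that only handles the decimal form:

```python
import unicodedata

def describe_ref(ref):
    """Return (code point label, character, Unicode name) for a decimal reference like '&#236;'."""
    code_point = int(ref[2:-1])  # strip the leading "&#" and the trailing ";"
    char = chr(code_point)
    return f"U+{code_point:04X}", char, unicodedata.name(char)

describe_ref("&#236;")  # ('U+00EC', 'ì', 'LATIN SMALL LETTER I WITH GRAVE')
describe_ref("&#263;")  # ('U+0107', 'ć', 'LATIN SMALL LETTER C WITH ACUTE')
```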
> 
> > And if you add
> > 
> > set charset="utf-8"
> 
> You should simply never set charset.  Ever.  If you need to, it's
> either because your system is misconfigured (so fix that instead), or
> your multi-lingual text input configuration is sufficiently
> complicated that you already know plenty enough about it to ignore
> what I just said.  =8^)  Setting charset is vastly more likely to
> cause problems, because setting it almost guarantees that you don't
> know what you're doing, and you're Doing It Wrong™.
(...)


Thanks. I had already posted a follow-up to my first message.
> But clearly it won't help at all in this case.  The problematic string
> isn't a binary representation of a unicode character. It's an HTML
> entity, and HTML entities in recipient headers are not supported by any
> of the RFCs, AFAIK (although new ones are added all the time, so it's
> hard to be sure)...  So the fact that it's there is because some
> misguided web-based e-mail software thinks ignoring e-mail RFCs is
> cool (or more likely, just does not understand i18n).

Even so: call them HTML entities or call them something else, they
consist of ASCII characters, and ASCII is a subset of UTF-8. That is
exactly why mutt displays them the way it does. Who said they were
binary representations? I talked about converting the hexadecimal
representation into an integer so that chr() could be used in my
Python function example. Maybe I can't follow you now...
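For reference, a minimal sketch of that int()/chr() conversion, assuming Python 3. decode_numeric_refs is a hypothetical name, not necessarily the function from the earlier example. It accepts both decimal (&#236;) and hexadecimal (&#xEC;) references and leaves named entities such as &amp; untouched:

```python
import re

# Decimal (&#236;) or hexadecimal (&#xEC;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def decode_numeric_refs(text):
    """Replace numeric character references in text with the characters they name."""
    def to_char(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid code point: keep the reference as it was.
            return match.group(0)
    return NUMERIC_REF.sub(to_char, text)

decode_numeric_refs('From: "Foo Bari&#236;"')  # 'From: "Foo Bariì"'
```

Note that this only makes the string readable; it does not make such a header RFC-conformant.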
> 
> At any rate, nothing will fix this short of Mutt providing explicit
> support for it, which IMO it should not do, or writing a script that
> can convert it, to be used as a display filter.  This is bound to be
> more trouble than it's worth...  I'm guessing the least obnoxious
> approach would be to find a script that converts plain text into
> minimally formatted HTML, and then view the resulting thing in w3m or
> some such. But such a thing would likely escape the HTML entities it
> found in the text, in some fashion, since it's assuming that it's
> plain text...  Alternatively you'd have to parse the whole file
> looking for HTML entities, and then convert them to the appropriate
> character for the locale you're using.  Blech.
Sure, a waste of time.
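Still, if anyone wants to try the display-filter route anyway, a minimal sketch might look like the following, assuming Python 3.7+ (for reconfigure()). It reads the message on stdin and writes it to stdout with only the numeric references decoded via the standard library's html.unescape(), so a literal "&amp;" in plain text is left alone. The script name and path are hypothetical; it would be hooked in with something like set display_filter="/path/to/unescape-refs.py", and, as far as I understand it, that only affects the pager, not the From column in the index.

```python
import html
import re
import sys

# Decimal or hexadecimal numeric character references only.
NUMERIC_REF = re.compile(r"&#(?:[xX][0-9A-Fa-f]+|[0-9]+);")

def unescape_numeric(line):
    """Decode numeric references such as &#236; and leave named ones like &amp; alone."""
    return NUMERIC_REF.sub(lambda m: html.unescape(m.group(0)), line)

def main():
    # Characters the locale cannot represent are replaced instead of crashing the filter.
    sys.stdin.reconfigure(errors="replace")
    sys.stdout.reconfigure(errors="replace")
    for line in sys.stdin:
        sys.stdout.write(unescape_numeric(line))

if __name__ == "__main__":
    main()
```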
(...)

Cheers,

felixs
