On Sun, Apr 07, 2019 at 11:13:53PM +0200, felixs wrote:
> On Fri, Apr 05, 2019 at 11:24:26AM -0700, Ian Zimmerman wrote:
> > I think this is the first time I got hit by the next stage of
> > browserisation: on a mailing list, a From: line that looks like
> >
> > From: "Foo Bari&#236;" <foo-ba...@gmail.com>
>
> where the entity refers to the character U0107 in Unicode code point
FWIW, the quoted entity is the latin-1 character 'ì', not the character 'ć'. The latter would be &#263;, not 236... Seems the last two digits were transposed somehow.

> And if you add
>
> set charset="utf-8"

You should simply never set charset. Ever. If you need to, it's either because your system is misconfigured (so fix that instead), or your multi-lingual text input configuration is sufficiently complicated that you already know plenty enough about it to ignore what I just said. =8^)

Setting charset is vastly more likely to cause problems than to solve them, because setting it almost guarantees that you don't know what you're doing, and you're Doing It Wrong™. [I've long been tempted to lobby for the removal of this variable since it causes more confusion than it solves, and recommendations I've seen on this list over the years to set it have been universally wrong (or at least completely ineffectual), no exceptions.]

But clearly it won't help at all in this case. The problematic string isn't a binary representation of a Unicode character. It's an HTML entity, and HTML entities in recipient headers are not supported by any of the RFCs, AFAIK (although new ones are added all the time, so it's hard to be sure)... So the fact that it's there is because some misguided web-based e-mail software thinks ignoring e-mail RFCs is cool (or more likely, just does not understand i18n).

At any rate, nothing will fix this short of Mutt providing explicit support for it, which IMO it should not do, or writing a script that can convert it, to be used as a display filter. This is bound to be more trouble than it's worth... I'm guessing the least obnoxious approach would be to find a script that converts plain text into minimally formatted HTML, and then view the resulting thing in w3m or some such. But such a thing would likely escape the HTML entities it found in the text, in some fashion, since it's assuming that it's plain text...
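For illustration only, the "script as a display filter" option mentioned above might be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a recommendation: it decodes only numeric character references (such as `&#236;`), it touches only the header block and leaves the body alone, and it assumes the terminal locale can represent the resulting characters (for example a UTF-8 locale). The function names `decode_entities`, `filter_message`, and `main` are made up for this example.

```python
#!/usr/bin/env python3
"""Sketch: decode numeric HTML entities in message headers (illustrative only)."""
import html
import re
import sys

# Matches numeric character references such as &#236; or &#xEC;
NUMERIC_ENTITY = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")


def decode_entities(text):
    """Replace each numeric character reference with the character it names."""
    return NUMERIC_ENTITY.sub(lambda m: html.unescape(m.group(0)), text)


def filter_message(lines):
    """Decode entities in the header block only; pass the body through untouched."""
    in_headers = True
    for line in lines:
        if in_headers and line.strip() == "":
            in_headers = False  # the first blank line ends the headers
        yield decode_entities(line) if in_headers else line


def main():
    # A display filter reads the message on stdin and writes it to stdout.
    sys.stdout.writelines(filter_message(sys.stdin))
```

To use it as a script, one would call `main()` at the bottom of the file and point Mutt's `display_filter` variable at it, e.g. `set display_filter="python3 /path/to/entity-filter.py"` (the path is hypothetical). Note that this only changes what is shown in the pager; the index and the stored message are unaffected.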
Alternatively you'd have to parse the whole file looking for HTML entities, and then convert them to the appropriate character for the locale you're using. Blech.

This may be one of the rarer cases where it's actually easier to contact the sender and get them to do something more reasonable (and standards-compliant) instead of working around their brokenness.

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.