Hi there,

Thank you for the detailed and considered response. I have replied inline to 
your questions/comments.

On Tue, Apr 26, 2022 at 04:18:51PM -0500, Derek Martin wrote:
> On Mon, Apr 25, 2022 at 11:08:41AM +0000, Joel Buckley wrote:
> > Hi all,
> > 
> > I have been using mutt for some time on a VT510 terminal (similar to
> > https://en.wikipedia.org/wiki/VT520), and enjoying it. 
> 
> An actual serial hardware terminal?  Those are getting to be rare
> beasts indeed... ;-)

Indeed, it was very hard to track down a working unit for sale in the past year 
or so. I'm really happy with how much I've been able to integrate it into my 
daily workflow --- more information available at 
https://blog.joelbuckley.com.au/2021/07/os-x-vt220-part-1 if you're interested 
in the setup. 

> > The display does not support UTF-8, so I had
> > LC_ALL="en_US.ISO8859-1" in my ~/.profile. This worked well for
> > mutt.
> 
> So here you say the terminal doesn't support UTF-8...

Perhaps it's more accurate for me to say that the terminal doesn't have a 
setting for UTF-8, and won't support the full range of characters that could be 
sent along the serial port to the terminal. I have the terminal set to 
ISO8859-1, and would like to restrict all terminal applications to this 
character set if possible.

> > I then discovered that by changing mutt to load with
> > LC_ALL="en_US.UTF-8" that all was well. 
> 
> Huh?  These two things seem to be contradictory...  
> 
> Also, I'm assuming this message was sent from Mutt NOT using your
> non-UTF-8-supporting terminal, since it is indeed encoded in UTF-8 and
> contains actual UTF-8 characters...

Yes, I sent the last email (and this one) from a modern machine which supports 
UTF-8, and allows me to deal with longer emails more comfortably. A 24 line 
screen has its limitations!

What I mean by telling the machine to output UTF-8 is that somewhere in the 
depths of the machine, before I see any output, it converts characters better 
than when I tell it to limit itself to ISO8859-1. The terminal then renders 
whatever the machine sends over serial, and most of it works either way, but 
these issues with quotes and ticks remain.

> Anyway, getting back to the normal order of things...
> 
> > However today I received an email with the string "Don=E2=80=99t
> > know when I will be there next.". This should display as something
> > like "Don't know where I will be there next.". In my mutt terminal,
> > it displayed:
> > > "Don???t know when I will be there next".
> 
> The issue is that there are no curly quotes in iso8859-1.  Both
> Windows and Mac support a modified version of iso8859-1 that includes
> curly quotes, but unfortunately use different character codes for
> them.  These character sets have their own names, but frequently mail
> applications are misconfigured to label them iso8859-1, because
> they're mostly identical and it works most places--as long as you're
> on the same platform as the sender.

Agreed, and that's why I made this display filter script. However, and I 
think you really hit the nail on the head a little lower down, it seems that 
the content is changed even before display_filter has a chance to perform a 
search for a replacement.

> > Thinking this was odd, I dove into my filter.sh script, and
> > discovered that no end of hacking would enable me to filter out the
> > '=E2=80=99' before display --- there seemed to be some amount of
> > parsing before my filter got ahold of it. All that I could match on
> > was '???', despite being able to edit the content of the mail
> > itself, and see the string '=E2=80=99'. My filter line of
> > significance is:
> > > output=`echo "$output" | sed "s/[’‘]/$(echo "27" | xxd -p -r)/g"`
> > This replaces 'smart quotes' with their ASCII equivalents.
> 
> Given that you already have a display filter script, this isn't a
> horrible solution--assuming it actually worked.  Note that you have a
> couple of harmless bugs though: 
> 
> 1. You've doubled up your double quotes, so actually 27 is not quoted.
>    It's harmless, but you don't need this anyway:
> 2. You needlessly fork two additional processes--one for the subshell
>    for echo, another for xxd.  This can be greatly simplified to:
> 
>    echo "$output" | sed "s/[’‘]/'/g"
> 
>    Presumably you avoided this because the single quote is "special"
>    to the shell, but since in this case it is enclosed in double
>    quotes it loses its specialness.

Good pick-up, thanks. This was a recent change because I was tearing my hair 
out when the smart quotes were appearing as '???' no matter what I had in the 
script. I was concerned that I had written a non-ASCII character, and wanted to 
be absolutely sure of the ASCII code it was outputting during debugging. I will 
fix this script up once I resolve the underlying charset issue.
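
One way to keep the script itself pure ASCII while still being sure which bytes 
it matches is to build the smart quotes from their UTF-8 byte values. The 
following is only a sketch: the function name `ascii_quotes` is made up for 
illustration, and it only helps when the filter's input is still UTF-8 (if the 
quotes have already been turned into '?' there is nothing left to match).

```shell
#!/bin/sh
# Sketch only: ascii_quotes is a hypothetical helper name.
# U+2018 and U+2019 are the bytes E2 80 98 and E2 80 99 in UTF-8,
# written here as octal escapes so the script contains no non-ASCII.
lsq=$(printf '\342\200\230')   # left single curly quote
rsq=$(printf '\342\200\231')   # right single curly quote

ascii_quotes() {
    # LC_ALL=C makes sed treat input as plain bytes, so the match does
    # not depend on the locale the filter inherits from mutt.
    LC_ALL=C sed "s/$lsq/'/g; s/$rsq/'/g"
}
```

A filter script built on this would simply end with a bare call to 
`ascii_quotes`, which reads the message on stdin and writes it to stdout.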

> > Thinking that this would be a matter of ensuring that the filter
> > script had the right character support, I added "export
> > LC_ALL="en_US.UTF-8"" to the top of my filter script, however this
> > did nothing for me.
> 
> Your filter script will run with the same locale as mutt, since it is
> a subprocess--it inherits the locale from its parent.  So if mutt were
> indeed started with LC_ALL=en_US.UTF-8 then so too will your display
> filter.  But you shouldn't need to do any of this...
> 
> > After some messing around, it seemed that the
> > only way to get mutt to support the filtering of my problematic
> > string  was to call mutt itself with the required character encoding
> > (UTF-8).
> 
> What character set is the message itself encoded with (according to
> its headers)?  If your terminal is set up right, and the charset on
> the message is correct, then Mutt should be taking care of this
> already for you by running iconv on the message.  Basically, except in
> rare cases, if your terminal is set up properly, you shouldn't ever
> need to deal with character sets explicitly.

> > Is this correct and best-practice, or have I missed something here?
> > My installation is currently working by using the 'export
> > LC_ALL="en_US.UTF-8"' line in my ~/.profile, however this feels like
> > bad practice
> 
> Because it is.
> 
> But I think you may have one of the rare cases.  I think what's
> happening is Mutt is correctly running iconv to convert your message
> from UTF-8 (which it most likely is in) to iso8859-1, which partially
> fails due to the annoying curly quotes, and then passes it to your
> filter script, which runs on that but it is already converted to '?'
> due to the character not having an equivalent in iso8859-1.
> 
> Assuming that's true, the only thing I can think of is an old trick
> that iconv supports, which I vaguely remember using in Mutt *ages*
> ago.  Try explicitly setting $charset *IN MUTT* to
> ISO-8859-1//TRANSLIT, which might or might not help.  But it's likely
> to have other negative effects...

This absolutely rings true for me. During debugging of my display_filter, it 
seemed that my sticking point was that changing the $LC_ALL shell variable 
changed the input to the display_filter script. This is ultimately what I am 
confused about and am hoping for a way around --- understanding the black box 
that sits between a) mutt opening the file from the disk and b) display_filter 
being called with some input, whose characteristics seem to change based on the 
shell value of $LC_ALL.
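
A pass-through filter that logs the raw bytes it receives is one way to look 
inside that black box. This is a minimal sketch, not a tested mutt setup: the 
function name `dump_and_pass`, the `FILTER_LOG` variable, and the default log 
path are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: FILTER_LOG and its default path are hypothetical.
FILTER_LOG=${FILTER_LOG:-/tmp/mutt-filter.log}

dump_and_pass() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                 # capture exactly what arrived on stdin
    {
        # record the locale the filter inherited from its parent
        echo "LC_ALL=${LC_ALL-unset} LANG=${LANG-unset}"
        od -An -tx1 "$tmp"       # raw bytes as hex, one byte per column
    } >> "$FILTER_LOG"
    cat "$tmp"                   # pass the message through unchanged
    rm -f "$tmp"
}
```

Running the same message through this under each LC_ALL setting and comparing 
the two hex dumps would show whether the filter receives `e2 80 99` (the 
UTF-8 smart quote) or `3f` bytes (question marks).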

I have tried `set charset=ISO-8859-1//TRANSLIT`, thank you for the suggestion. 
I was entirely unaware of this option. The result removed an apostrophe 
entirely, converted a single opening smart quote to a backtick, and left a 
closing single smart quote as '???'. Very mixed results.

I think to crack this nut, it would be worthwhile to better understand the iconv 
call that you mentioned. The existence of the iconv call is news to me, and I 
wonder whether it has any configurable parameters. For example, is there 
a pre-iconv filter script? This could be one way to solve my problems (by 
filtering out the characters/strings that iconv seems to be choking on).
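
The conversion step can at least be explored outside mutt with the iconv 
command-line tool. The sketch below assumes an `iconv` binary is installed; 
`show_conv` is a made-up helper name, and it says nothing about how mutt 
itself calls the library.

```shell
#!/bin/sh
# Sketch only: show_conv is a hypothetical helper.
# Reads UTF-8 on stdin, converts it to the charset named in $1,
# and prints the resulting bytes as hex.
show_conv() {
    iconv -f UTF-8 -t "$1" | od -An -tx1
}
```

For example, `printf 'Don\342\200\231t' | show_conv ISO-8859-1` and the same 
command with `ISO-8859-1//TRANSLIT` can be compared side by side. What comes 
out for the smart quote differs between iconv implementations, so the result 
on one machine may not match another.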

Thanks for your thorough reply, Derek. It is much appreciated and I will 
continue to mull on the ideas above.

> -- 
> Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
> -=-=-=-=-
> This message is posted from an invalid address.  Replying to it will result in
> undeliverable mail due to spam prevention.  Sorry for the inconvenience.
> 

Regards,

-- 
Joel Buckley
