You may recall that some weeks ago I posted that I couldn't read a
UTF-8 message in my UTF-8 terminal even though everything appears
to be set up properly: $charset, locale environment variables,
locale definition matching those variables, message's Content-Type:
header, wide-character version of ncurses, etc.  Well, I did some
debugging, and the problem appears to be this:

Mutt is not treating incoming UTF-8 messages as UTF-8.  It is finding
and parsing the "charset=utf-8" parameter in the Content-Type:
header, but the multibyte-to-wide character conversion is just
returning each individual byte as a separate character instead
of converting from UTF-8 and returning the transformed Unicode
characters.  So the text is already mangled long before mutt even tries
to convert the characters into something my terminal can display.

For instance, my test message is some Latin text, and one of the 
characters in it that falls outside of the Latin-1 range is
LATIN SMALL LETTER A WITH MACRON, U+0101.  This is encoded in UTF-8 as
the two-byte sequence 0xC4 0x81.  Instead of returning the single
character U+0101, the multibyte decoder is returning two characters:
U+00C4 followed by U+0081.  U+00C4 is LATIN CAPITAL LETTER A WITH DIAERESIS (Ä),
which is what I see if I run mutt in a Latin-1 terminal instead of a
UTF-8 one; but U+0081 is a reserved control character which fails the
iswprint() test and is therefore replaced by a backslash-octal sequence.  
So instead of the single character it should be sending (a small 'a' with
a line over it), mutt is sending the five-character sequence "Ä\201".  (And
since in UTF-8, Ä followed by \ is an invalid multibyte sequence, I see an
"invalid character" glyph instead of the Ä in my UTF-8 terminal.)

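To make the failure concrete, here is a minimal Python sketch of the
behaviour described above.  It is not mutt's code: the render() helper is
hypothetical and only imitates the "non-printable becomes backslash-octal"
step, using Python's str.isprintable() as a stand-in for iswprint().  The
last step assumes the Ä reaches the terminal as the single byte 0xC4.

```python
def render(chars):
    """Hypothetical stand-in for the display step: keep printable
    characters, replace the rest with a backslash-octal escape."""
    return "".join(c if c.isprintable() else "\\%03o" % ord(c) for c in chars)

raw = bytes([0xC4, 0x81])            # UTF-8 encoding of U+0101

correct = raw.decode("utf-8")        # what should happen: one character
bytewise = raw.decode("latin-1")     # what I observe: one character per byte

print([f"U+{ord(c):04X}" for c in correct])   # ['U+0101']
print([f"U+{ord(c):04X}" for c in bytewise])  # ['U+00C4', 'U+0081']

shown = render(bytewise)
print(shown, len(shown))             # Ä\201 5

# Assumption: the Ä goes out as the raw byte 0xC4.  A UTF-8 terminal then
# sees 0xC4 followed by a backslash, which is not a valid sequence.
on_terminal = shown.encode("latin-1").decode("utf-8", errors="replace")
print(on_terminal)                   # replacement glyph followed by \201
```

Decoding with "latin-1" maps every byte to the code point of the same
value, which matches the two characters the multibyte decoder returns.
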
Any tips on where to look further to figure out why it's not recognizing
the incoming message as UTF-8?

-- 
Mark REED                    | CNN Internet Technology
1 CNN Center Rm SW0831G      | [EMAIL PROTECTED]
Atlanta, GA 30348      USA   | +1 404 827 4754 
--
One is not superior merely because one sees the world as odious.
                -- Chateaubriand (1768-1848)
