You may recall that some weeks ago I posted that I couldn't read a UTF-8 message in my UTF-8 terminal even though everything appeared to be set up properly: $charset, the locale environment variables, a locale definition matching those variables, the message's Content-Type: header, the wide-character version of ncurses, etc. Well, I did some debugging, and the problem appears to be this:

Mutt is not treating incoming UTF-8 messages as UTF-8. It finds and parses the "charset=utf-8" parameter in the Content-Type: header, but the multibyte-to-wide-character conversion just returns each individual byte as a separate character instead of decoding the UTF-8 and returning the resulting Unicode characters. So the text is already mangled long before mutt even tries to convert the characters into something my terminal can display.

For instance, my test message is some Latin text, and one of the characters in it that falls outside the Latin-1 range is LATIN SMALL LETTER A WITH MACRON, U+0101. This is encoded in UTF-8 as the two-byte sequence 0xC4 0x81. Instead of returning the single character U+0101, the multibyte decoder is returning two characters: U+00C4 followed by U+0081. U+00C4 is LATIN CAPITAL LETTER A WITH DIAERESIS (Ä), which is what I see if I run mutt in a Latin-1 terminal instead of a UTF-8 one; but U+0081 is a reserved control character which fails the iswprint() test and is therefore replaced by a backslash-octal sequence.

So instead of the single character it should be sending (a small 'a' with a line over it), mutt is sending the five-character sequence "Ä\201". (And since in UTF-8, Ä followed by \ is an invalid multibyte sequence, I see an "invalid character" glyph instead of the Ä in my UTF-8 terminal.)

Any tips on where to look further to figure out why it's not recognizing the incoming message as UTF-8?

-- 
Mark REED                    | CNN Internet Technology
1 CNN Center Rm SW0831G      | [EMAIL PROTECTED]
Atlanta, GA 30348 USA        | +1 404 827 4754
--
One is not superior merely because one sees the world as odious.
                                     -- Chateaubriand (1768-1848)
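
P.S. For anyone who wants to see the symptom outside of mutt, here is a minimal sketch in Python. It is not mutt's code: decoding as Latin-1 is only a stand-in for "each byte becomes its own character", str.isprintable() is only a stand-in for iswprint(), and the helper names describe() and escape_unprintable() are made up for this illustration.

```python
import unicodedata

# The two bytes that encode U+0101 (LATIN SMALL LETTER A WITH MACRON) in UTF-8.
RAW = b"\xc4\x81"


def describe(text):
    """Return (code point, Unicode name, printable?) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), ch.isprintable())
        for ch in text
    ]


def escape_unprintable(text):
    """Replace unprintable characters with a backslash-octal sequence.

    Hypothetical helper: it only imitates the kind of escaping described
    in the message above and is not taken from mutt.
    """
    return "".join(
        ch if ch.isprintable() else f"\\{ord(ch):03o}" for ch in text
    )


# What should happen: the two bytes decode to one character.
as_utf8 = RAW.decode("utf-8")

# What seems to happen: every byte is returned as a separate character.
# Latin-1 decoding reproduces that because it maps byte N to U+00NN.
as_bytes = RAW.decode("latin-1")

if __name__ == "__main__":
    print("decoded as UTF-8:", describe(as_utf8))
    print("byte-per-char:   ", describe(as_bytes))
    print("characters sent: ", len(escape_unprintable(as_bytes)))
```

The byte-per-character path yields U+00C4 followed by an unprintable U+0081, and escaping the latter produces the five characters I am seeing, whereas the UTF-8 path yields the single printable character U+0101.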