Ok, here’s what’s going on:
1. The message is encoded in windows-1252, and contains non-breaking
spaces (encoded as the byte 0xA0).
2. Your terminal is using a different character encoding (probably
UTF-8), in which a lone 0xA0 byte does not map to a character. (In UTF-8,
0xA0 is a “continuation byte”, which is only valid as a
non-first byte in a multi-byte sequence.) In the absence of a valid
UTF-8 character, less gives up and displays the value of
the unmappable byte in angle brackets. However, “<A0>” is only how
less displays the nbsp byte. The .eml file itself does not
contain the four-character string “<A0>”.
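The two readings of 0xA0 described above can be checked directly. Here is a minimal Python sketch, assuming a sample byte string modelled on the first line of the less output (the `sample` value and the helper name `is_valid_utf8` are illustrative, not taken from the actual .eml file):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: nbsp bytes as they would appear in a windows-1252 file.
sample = b"\xa00.\xa0\xa0 flo"

# In windows-1252 (Python codec name "cp1252"), 0xA0 is U+00A0 NO-BREAK SPACE.
as_cp1252 = sample.decode("cp1252")
print(as_cp1252[0] == "\u00a0")      # True

# A lone 0xA0 is a continuation byte with no lead byte, so UTF-8 rejects it.
print(is_valid_utf8(sample))         # False

# Preceded by the lead byte 0xC2, the same byte forms a valid UTF-8 nbsp.
print(is_valid_utf8(b"\xc2\xa0"))    # True
```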
Dealing with non-breaking spaces can be confusing, since they are
difficult to differentiate from normal spaces in many editors, and they
have different byte encodings in the common 8-bit encodings and UTF-8.
They are usually encoded either as the single byte 0xA0 (in the
8-bit encodings) or as the two-byte sequence 0xC2 0xA0 (in UTF-8). As
others have pointed out, a quick call to `tr '\240' ' '` to translate
the nbsps to normal spaces will often do the trick. If you happen to be
using perl, `use feature 'unicode_strings'` will make
pattern-matching behave properly with nbsps (yes, even with strings from
windows-1252-encoded files!). For example, it will make `\s` match
nbsps, which it normally doesn’t.
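For the original goal of saving the text as plain ASCII, the same translation can also be sketched in Python. This is only one possible approach, assuming the file really is windows-1252 as the Content-Type header says. The function name `to_plain_ascii` and the choice to turn any other non-ASCII character into `?` are illustrative assumptions:

```python
def to_plain_ascii(data: bytes, encoding: str = "cp1252") -> str:
    """Decode data, turn nbsps into normal spaces, and force the result to ASCII."""
    # Decode first so that nbsp is the single character U+00A0,
    # whether it was stored as 0xA0 (8-bit) or 0xC2 0xA0 (UTF-8).
    text = data.decode(encoding, errors="replace")
    text = text.replace("\u00a0", " ")
    # Anything still outside ASCII becomes "?" (an arbitrary choice).
    return text.encode("ascii", errors="replace").decode("ascii")


# Hypothetical inputs modelled on the less output quoted below.
print(repr(to_plain_ascii(b"\xa00.\xa0\xa0 flo")))
print(repr(to_plain_ascii(b"\xc2\xa01.", encoding="utf-8")))
```

For a file that is already UTF-8, pass `encoding="utf-8"` so that the two-byte form is decoded as one nbsp rather than as two separate characters.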
Best of luck with the scripting!
Galen
On 6 Apr 2019, at 4:27, Randy Bush wrote:
i receive an email
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:52.0)
Gecko/20100101 PostboxApp/6.1.13
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
the text has funny space characters that i see if i save the text to
disk and look at it with less
<A0>0.<A0><A0> flo....: 2.31 2018.11.03
<A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 1.<A0><A0> CLIMATE
ACTION
<A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> * (N)ew
(M)odify (D)elete..: N
<A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 2. * NAME OF CLOUD:
cumulus
i presume the sender is thunderbird and they have created the text
with some sort of windows encoding on a mac?
how can i save the content as vanilla ascii text?
randy
_______________________________________________
mailmate mailing list
mailmate@lists.freron.com
https://lists.freron.com/listinfo/mailmate