Ok, here’s what’s going on:

1. The message is encoded in windows-1252, and contains non-breaking spaces (encoded as the byte 0xA0).

2. Your terminal is using a different character encoding (probably UTF-8), in which a lone 0xA0 byte does not map to a character. (In UTF-8, 0xA0 is a “continuation byte”, which is only valid as a non-first byte in a multi-byte sequence.) Finding no valid UTF-8 sequence, `less` gives up and displays the value of the undecodable byte in angle brackets. However, “<A0>” is only how the nbsp character is displayed. The .eml file itself does not contain the four-character string “<A0>”.
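If it helps to see the two points above concretely, here is a small Python sketch (Python is my choice for illustration, not something from your setup). It takes the single byte 0xA0 and tries to read it both ways; the variable names are made up for the example.

```python
# The byte that less shows as <A0>.
raw = b"\xa0"

# Read as windows-1252, the byte decodes to U+00A0 NO-BREAK SPACE.
as_cp1252 = raw.decode("cp1252")
is_nbsp = as_cp1252 == "\u00a0"

# Read as UTF-8, a lone continuation byte is not a valid sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# UTF-8 encodes the same character as the two bytes C2 A0.
utf8_bytes = as_cp1252.encode("utf-8")

print(is_nbsp, valid_utf8, utf8_bytes.hex())  # True False c2a0
```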

Dealing with non-breaking spaces can be confusing, since they are difficult to tell apart from normal spaces in many editors, and they are encoded as different bytes in the common 8-bit encodings and in UTF-8: usually either the single byte 0xA0 (in the 8-bit encodings) or the two-byte sequence 0xC2 0xA0 (in UTF-8). As others have pointed out, a quick call to `tr '\240' ' '` to translate the nbsps to normal spaces will often do the trick for a windows-1252 file like this one. If you happen to be using perl, `use feature 'unicode_strings'` will make pattern-matching behave properly with nbsps (yes, even with strings from windows-1252-encoded files!); for example, it will make `\s` match nbsps, which it normally doesn’t.
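If you would rather do the conversion in a script than with `tr`, something along these lines should work. This is only a sketch in Python: the function name `to_plain_ascii` is hypothetical, and the sample bytes are my reconstruction of the first line of your `less` output, assuming each “<A0>” stands for one 0xA0 byte.

```python
def to_plain_ascii(raw: bytes, charset: str = "cp1252") -> str:
    """Decode raw message bytes and swap non-breaking spaces for ordinary ones."""
    # Bytes that are undefined in the charset become U+FFFD instead of raising.
    text = raw.decode(charset, errors="replace")
    text = text.replace("\u00a0", " ")
    # Anything still outside ASCII becomes "?" rather than raising.
    return text.encode("ascii", errors="replace").decode("ascii")


# Assumed sample: first line of the quoted output, with 0xA0 for each <A0>.
sample = b"\xa00.\xa0\xa0 flo....: 2.31 2018.11.03"
cleaned = to_plain_ascii(sample)
print(repr(cleaned))
```

Note that the last step is lossy: any other non-ASCII character in the message (curly quotes, accented letters) comes out as “?”, so you may want to add more replacements before it if the text contains those.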

Best of luck with the scripting!

Galen

On 6 Apr 2019, at 4:27, Randy Bush wrote:

i receive an email

    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:52.0)
     Gecko/20100101 PostboxApp/6.1.13
    MIME-Version: 1.0
    Content-Type: text/plain; charset=windows-1252; format=flowed
    Content-Transfer-Encoding: 8bit
    Content-Language: en-US

the text has funny space characters that i see if i save the text to
disk and look at it with less

    <A0>0.<A0><A0> flo....: 2.31 2018.11.03

    <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 1.<A0><A0> CLIMATE ACTION <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> * (N)ew (M)odify (D)elete..: N

    <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 2. * NAME OF CLOUD: cumulus

i presume the sender is thunderbird and they have created the text with
some sort of windows encoding on a mac?

how can i save the content as vanilla ascii text?

randy
_______________________________________________
mailmate mailing list
mailmate@lists.freron.com
https://lists.freron.com/listinfo/mailmate