On 6 Apr 2019, at 7:27, Randy Bush wrote:

i receive an email

    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:52.0)
     Gecko/20100101 PostboxApp/6.1.13
    MIME-Version: 1.0
    Content-Type: text/plain; charset=windows-1252; format=flowed
    Content-Transfer-Encoding: 8bit
    Content-Language: en-US

the text has funny space characters that i see if i save the text to
disk and look at it with less

    <A0>0.<A0><A0> flo....: 2.31 2018.11.03

    <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 1.<A0><A0> CLIMATE ACTION <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> * (N)ew (M)odify (D)elete..: N

    <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 2. * NAME OF CLOUD: cumulus

i presume the sender is thunderbird and they have created the text with
some sort of windows encoding on a mac?

Apparently. I've seen this in mail recently as well and suspect it may be a new default in Postbox or TBird or both to use windows-1252 instead of UTF-8. Windows-1252 is a close relative of ISO 8859-1 (Latin-1) and 0xA0 is a non-breaking space in both.
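
The shared meaning of 0xA0 can be checked with a minimal sketch. Python 3 is assumed here purely for illustration, and the variable names are hypothetical. It decodes the single byte under both encodings and compares the results, then shows one byte (0x93) where the two encodings part ways:

```python
import unicodedata

# Decode the same byte under each encoding.
raw = b"\xa0"
as_cp1252 = raw.decode("cp1252")    # Windows-1252
as_latin1 = raw.decode("latin-1")   # ISO 8859-1

print(unicodedata.name(as_cp1252))  # Unicode name of the decoded character
print(as_cp1252 == as_latin1)       # same character in both encodings

# The encodings differ in the 0x80-0x9F range: 0x93 is a curly quote
# in Windows-1252 but a C1 control character in ISO 8859-1.
quote_cp1252 = b"\x93".decode("cp1252")
ctrl_latin1 = b"\x93".decode("latin-1")
print(quote_cp1252 == ctrl_latin1)
```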

how can i save the content as vanilla ascii text?

Well, technically you can't represent a non-breaking space in ASCII because there's no such character defined in ASCII so there's no way to convert text with 0xA0 characters into "vanilla ascii" text.
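
To make that concrete, here is a small sketch, assuming Python 3 and using a hypothetical helper name, `is_ascii_encodable`. It takes the first sample line from the message, decoded as windows-1252, and shows that it cannot be encoded as ASCII until the non-breaking spaces are replaced with something ASCII does define:

```python
def is_ascii_encodable(text: str) -> bool:
    """Return True if every character in text exists in ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

# First sample line from the message, as raw windows-1252 bytes.
line = b"\xa00.\xa0\xa0 flo....: 2.31 2018.11.03".decode("cp1252")

print(is_ascii_encodable(line))                         # with NBSPs
print(is_ascii_encodable(line.replace("\u00a0", " ")))  # NBSPs swapped for spaces
```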

You can convert 0xA0 to 0x20 (ASCII space) by piping the text through:

  tr '\240' ' '
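
For anyone without a convenient `tr`, the same byte-level substitution can be sketched in Python 3. The function name `nbsp_to_space` is hypothetical. This assumes the input is in a single-byte encoding such as windows-1252, as the message headers declare; in UTF-8 text, 0xA0 can appear as part of a multi-byte sequence, so a blind byte replacement would corrupt it.

```python
def nbsp_to_space(data: bytes) -> bytes:
    """Replace byte 0xA0 (octal 240) with an ASCII space, like tr '\\240' ' '."""
    return data.replace(b"\xa0", b" ")

# First sample line from the message, as raw windows-1252 bytes.
sample = b"\xa00.\xa0\xa0 flo....: 2.31 2018.11.03"
cleaned = nbsp_to_space(sample)

# After the replacement every byte is below 0x80, so this decode succeeds.
print(cleaned.decode("ascii"))
```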

However, doing the reverse could create a mess for anything that understands the difference between a regular space and a non-breaking space.

Tools which support POSIX character classes and locales should treat 0xA0 as whitespace (class '[:space:]') if LC_ALL or LC_CTYPE is a Latin-1 locale, e.g. en_US.ISO8859-1.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole