Re: decoding UTF-8

Jon LaBadie Tue, 15 Mar 2016 15:09:07 -0700

On Tue, Mar 15, 2016 at 11:18:47AM +0100, Ionel Mugurel Ciobîcă wrote:
> On 14-03-2016, at 17h 30'55", Jon LaBadie wrote about "decoding UTF-8"
> > I frequently find headers (mostly Subject, but also From/To)
> > that I assume are some representation form for a UTF-8 encoded 
> > string as they start with "=?UTF-8?" and end with "?=".
> > For example:
> > 
> >   To: =?UTF-8?B?Z3VuZGk=?= <user@domain>
> > 
> > Is my assumption correct?  What is the representation called?
> > Is there a tool to regain the original string?
> > I believe my video system can display the larger
> > character set.
> > 
> 
> If after =?UTF-8 there is ?Q then the non-ASCII characters (and = itself)
> are written as = followed by the hexadecimal value of each UTF-8 byte;
> for example ç is =C3=A7.
> 
> If after =?UTF-8 there is ?B then all characters are encoded with Base64,
> an algorithm that takes 6 bits at a time. You can encode and decode this
> with base64:
> 
> # echo "something" | base64
> c29tZXRoaW5nCg==
>
> # echo "Z3VuZGk=" | base64 -d
> gundi
> 
> 
> Ionel
>
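
One detail in the base64 example above: echo appends a newline and base64 encodes it too, which is why the output ends in "Cg==" (the encoding of "\n"). The sketch below avoids that with printf '%s'. It assumes a base64 that accepts -d, as GNU coreutils does and as in Ionel's example; the helper names b64_encode and b64_decode are hypothetical, not standard tools.

```shell
#!/bin/sh
# Hypothetical helpers, not standard commands.
# printf '%s' writes the string without a trailing newline, so only the
# characters themselves are encoded (echo would also encode "\n").
b64_encode() { printf '%s' "$1" | base64; }
b64_decode() { printf '%s' "$1" | base64 -d; }

b64_encode "something"   # prints c29tZXRoaW5n
b64_decode "Z3VuZGk="    # prints gundi (with no trailing newline)
```

Compare c29tZXRoaW5n with the c29tZXRoaW5nCg== that echo produces: only the final group, the newline, differs.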


Thank you Ionel.

I looked at over 400 messages in my spam quarantine directory where
I see a lot of these encodings.  The vast majority had ?B? after
the UTF-8.  I tried a few of them and they did decode with base64.
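
These headers use the MIME "encoded-word" syntax from RFC 2047, =?charset?encoding?encoded-text?=, where the encoding letter is B (Base64) or Q, and both the charset and the encoding letter are case-insensitive. The manual step (cut out the text between the third "?" and the closing "?=", then feed it to base64 -d) can be wrapped in a small function. This is a sketch assuming a POSIX shell and a base64 that accepts -d; decode_b_word is a hypothetical name. It ignores the charset field, so a word in a non-UTF-8 charset would still need converting, for example with iconv -f <charset> -t UTF-8.

```shell
#!/bin/sh
# decode_b_word: hypothetical helper, not an existing tool.
# Input: one encoded word such as =?UTF-8?B?Z3VuZGk=?=
# Output: the decoded bytes, written unchanged (charset is not converted).
decode_b_word() {
    inner=${1#'=?'}          # drop the leading "=?" (quoted: literal, not a glob)
    inner=${inner%'?='}      # drop the trailing "?="
    rest=${inner#*'?'}       # drop the charset field, e.g. "UTF-8"
    enc=${rest%%'?'*}        # the encoding letter
    text=${rest#*'?'}        # the Base64 payload
    case $enc in
        B|b) printf '%s' "$text" | base64 -d ;;
        *)   echo "decode_b_word: not a B-encoded word: $1" >&2; return 1 ;;
    esac
}

decode_b_word '=?UTF-8?B?Z3VuZGk=?='; echo    # gundi
```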

Instead of ?B?, seven had ?Q? and two had ?q?.  These did not
decode with base64.  Gee, maybe they are ROT13 :-)
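
The ?Q? and ?q? words are not Base64 but the Q encoding Ionel described. Per RFC 2047, "=XX" is one byte given in hexadecimal, "_" stands for a space, and other printable ASCII characters stand for themselves. Below is a sketch of a decoder in plain POSIX sh, assuming the charset can be ignored and the terminal displays UTF-8; decode_q_word is a hypothetical helper, not an existing tool. It turns each =XX into a \0ooo octal escape because POSIX printf %b understands octal escapes but not \xHH.

```shell
#!/bin/sh
# decode_q_word: hypothetical helper, not an existing tool.
# Decodes one Q-encoded word, e.g. =?UTF-8?Q?Fran=C3=A7ais?=
decode_q_word() {
    inner=${1#'=?'}          # drop the leading "=?"
    inner=${inner%'?='}      # drop the trailing "?="
    rest=${inner#*'?'}       # drop the charset field
    enc=${rest%%'?'*}        # the encoding letter
    s=${rest#*'?'}           # the encoded text
    case $enc in
        Q|q) ;;
        *) echo "decode_q_word: not a Q-encoded word: $1" >&2; return 1 ;;
    esac
    out=''
    while [ -n "$s" ]; do
        case $s in
            _*)                                   # "_" means a space
                out="$out "
                s=${s#_} ;;
            =[0-9A-Fa-f][0-9A-Fa-f]*)             # "=XX" means byte 0xXX
                hex=${s#=}
                hex=${hex%"${hex#??}"}            # keep the two hex digits
                out="$out$(printf '\\0%03o' "0x$hex")"
                s=${s#=??} ;;
            *)                                    # anything else is literal
                c=${s%"${s#?}"}                   # first character
                [ "$c" = '\' ] && c='\\'          # keep a literal backslash literal
                out="$out$c"
                s=${s#?} ;;
        esac
    done
    printf '%b' "$out"                            # expand the \0ooo escapes to bytes
}

decode_q_word '=?UTF-8?Q?Fran=C3=A7ais?='; echo   # Français
```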

jl
-- 
Jon H. LaBadie                 j...@jgcomp.com
 11226 South Shore Rd.          (703) 787-0688 (H)
 Reston, VA  20190              (703) 935-6720 (C)
