On Thu, May 23, 2002 at 09:58:03PM -0700, Daniel Quinlan wrote:
| dman writes:
| 
| > It came through just fine, though I can't display it in my console.  I
| > just found out that gvim can't display it either with my fontset.  It
| > does handle UTF-8 well, though; and I double-checked the UTF-8
| > decoding myself.  (read the UTF-8 RFC some time.  It's really short
| > and kinda cool)
| 
| Argh, my company's mail relay apparently changed the email.
| 
|   X-MIME-Autoconverted: from quoted-printable to 8bit by xxx.transmeta.com id g4MNaWj18505

That's still not a problem, as long as your mail reader honors the
charset="utf-8" parameter in the Content-Type header.  In fact, the
bytes that you reported were the correct bytes; you just didn't decode
them into characters.
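
To make that concrete, here's a rough Python sketch of what a
charset-aware mail reader does: read the charset parameter, undo the
transfer encoding, then decode the bytes into characters.  The sample
messages and the body_text() helper are made up for illustration; the
body is the two characters U+AD11 U+ACE0, sent once as
quoted-printable and once as 8bit.

```python
from email import message_from_bytes

def body_text(raw: bytes) -> str:
    """Return the body of a single-part message as decoded characters."""
    msg = message_from_bytes(raw)
    # Charset parameter of Content-Type; fall back to us-ascii if missing.
    charset = msg.get_content_charset() or "us-ascii"
    # decode=True undoes quoted-printable/base64; 8bit passes through as-is.
    payload = msg.get_payload(decode=True)
    return payload.decode(charset, errors="replace")

HEADERS = b'Content-Type: text/plain; charset="utf-8"\n'

# Hypothetical samples: the same body before and after a QP -> 8bit conversion.
as_qp = HEADERS + b"Content-Transfer-Encoding: quoted-printable\n\n=EA=B4=91=EA=B3=A0\n"
as_8bit = HEADERS + b"Content-Transfer-Encoding: 8bit\n\n\xea\xb4\x91\xea\xb3\xa0\n"

same = body_text(as_qp) == body_text(as_8bit)
codepoints = [hex(ord(c)) for c in body_text(as_8bit).strip()]
print(same, codepoints)
```

Either way the reader ends up with the same two characters, so the
relay's conversion by itself loses nothing.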

| > He didn't send unquoted binary.  I looked at the raw message myself,
| > it was UTF-8 data transferred as Quoted-Printable.  There's only 2
| > characters there (not counting any potential whitespace).
| 
| That's fine except for one thing.  AFAIK, Korean spam is usually sent in
| ks_c_5601-1987 or euc-kr, not UTF-8.  Usually, the subject is binary,
| although maybe that's due to my stupid mail gateway, but I have received
| a few QP and base64 Subject: headers too, so I'm not sure.
| 
| One of the QP-encoded Subject: headers had the same pattern I sent in my
| last email: b1 a4 b0 ed (inside [ and ]).
|  
| > Once you decode the UTF-8 into Unicode, you get  ad11 ace0
| > (16-bit chars, not 8-bit).
| > 
| > For checking it in SA, the various ISO encodings of Korean should be
| > handled as well as UTF-8.
| 
| I don't think I've ever received a UTF-8 Korean spam, 

That's why someone needs to convert the characters to ks_c_5601-1987
and euc-kr for SA's tests.
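
For the conversion itself, a minimal sketch assuming Python's built-in
codecs (this is not SA code): take the two code points quoted above
and encode them both ways.  "euc_kr" is Python's name for the codec;
hexdump() and the variable names are just mine.

```python
def hexdump(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

# The two Unicode code points quoted above.
text = "\uad11\uace0"

# Encode the same characters in each charset a test would have to match.
euc_kr_bytes = text.encode("euc_kr")
utf8_bytes = text.encode("utf-8")

print(hexdump(euc_kr_bytes))  # b1 a4 b0 ed
print(hexdump(utf8_bytes))    # ea b4 91 ea b3 a0

# Going the other way, the euc-kr bytes decode back to the same characters.
roundtrip = bytes.fromhex("b1a4b0ed").decode("euc_kr")
```

So the b1 a4 b0 ed pattern and ad11 ace0 are the same two characters,
just in different encodings.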

| but maybe my spammers are not as smart as your spammers.  ;-)

I doubt it.  I just junk anything that is ks_c_5601-1987 or euc-kr
since I can't read it anyway (and big5 and euc-jp and any other
Far East encodings I find in spam).  It would be nice if I could junk
other foreign-language messages too, since I can't read them either.
That's a little harder to detect (i.e. when they are iso8859-1 or utf-8).
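
The charset check is simple enough to sketch.  This is a hypothetical
Python filter for illustration only: JUNK_CHARSETS and should_junk()
are names I made up, and it looks only at the charset each part
declares, which is exactly why iso8859-1 and utf-8 mail slips through.

```python
from email import message_from_bytes

# Charsets named above, written lowercase to match get_content_charset().
JUNK_CHARSETS = {"ks_c_5601-1987", "euc-kr", "big5", "euc-jp"}

def should_junk(raw: bytes) -> bool:
    """True if any part of the message declares a charset in JUNK_CHARSETS."""
    msg = message_from_bytes(raw)
    for part in msg.walk():
        charset = part.get_content_charset()  # lowercased, or None if absent
        if charset in JUNK_CHARSETS:
            return True
    return False

# Made-up sample messages for illustration.
korean = b'Content-Type: text/plain; charset="EUC-KR"\n\n\xb1\xa4\xb0\xed\n'
plain = b'Content-Type: text/plain; charset="utf-8"\n\nhello\n'
print(should_junk(korean), should_junk(plain))
```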

-D

-- 

The lot is cast into the lap,
but its every decision is from the Lord.
        Proverbs 16:33
 
GnuPG key : http://dman.ddts.net/~dman/public_key.gpg
