On Thu, May 23, 2002 at 09:58:03PM -0700, Daniel Quinlan wrote:
| dman writes:
|
| > It came through just fine, though I can't display it in my console.  I
| > just found out that gvim can't display it either with my fontset.  It
| > does handle UTF-8 well, though; and I double-checked the UTF-8
| > decoding myself.  (read the UTF-8 RFC some time.  It's really short
| > and kinda cool)
|
| ARgh, my company's mail relay apprently changed the email.
|
| X-MIME-Autoconverted: from quoted-printable to 8bit by xxx.transmeta.com id g4MNaWj18505

That's still not a problem, as long as your mailreader understands the
"utf-8" charset in the Content-Type header.  In fact, the bytes that you
reported were the correct bytes; you just didn't decode them into
characters.

| > He didn't send unquoted binary.  I looked at the raw message myself,
| > it was UTF-8 data transfered as Quoted-Printable.  There's only 2
| > characters there (not counting any potential whitespace).
|
| That's fine except for one thing.  AFAIK, Korean spam is usually sent in
| ks_c_5601-1987 or euc-kr, not UTF-8.  Usually, the subject is binary,
| although maybe that's due to my stupid mail gateway, but I have received
| a few QP and base64 Subject: headers too, so I'm not sure.
|
| One of the QP-encoded Subject: headers had the same pattern I sent in my
| last email: b1 a4 b0 ed (inside [ and ]).
|
| > Once you decode the UTF-8 into Unicode, you get ad11 ace0
| > (16-bit chars, not 8-bit).
| >
| > For checking it in SA the various IS0 encodings of korean should be
| > handled as well as UTF-8.
|
| I don't think I've ever received a UTF-8 Korean spam,

That's why someone needs to convert the characters to ks_c_5601-1987
and euc-kr for SA's tests.

| but maybe my spammers are not as smart as your spammers.  ;-)

I doubt it.  I just junk anything that is ks_c_5601-1987 or euc-kr
since I can't read it anyway (and big5 and euc-jp and any other
far-east encodings I find in spam).  It would be nice if I could junk
other foreign-language messages too, since I can't read them either.
That's a little harder to detect (i.e. when they are iso8859-1 or
utf-8).

-D

-- 
The lot is cast into the lap, but its every decision is from the Lord.
        Proverbs 16:33

GnuPG key : http://dman.ddts.net/~dman/public_key.gpg
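
P.S. For anyone who wants to check the byte values discussed in this
thread, here is a minimal sketch of the conversion using Python's
standard codecs.  Note the assumptions: the Quoted-Printable string
below is reconstructed from the code points quoted above (ad11 ace0),
not copied from the original spam, and the names QP_SUBJECT and
hex_bytes are made up for illustration.  It only shows the decoding
chain (QP -> UTF-8 bytes -> Unicode -> euc-kr); it is not how SA does
or should implement its tests.

```python
import quopri

# Assumption: reconstructed from the code points ad11 ace0 mentioned in
# the thread; this is not the literal text of the original message.
QP_SUBJECT = b"=EA=B4=91=EA=B3=A0"


def hex_bytes(data):
    """Format a bytes object as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


# Step 1: undo the Quoted-Printable transfer encoding to get raw bytes.
raw = quopri.decodestring(QP_SUBJECT)

# Step 2: decode the raw bytes as UTF-8 to get Unicode characters.
text = raw.decode("utf-8")
code_points = [f"{ord(ch):04x}" for ch in text]

# Step 3: re-encode the same characters as euc-kr, the charset that
# Korean spam reportedly uses more often.
euc_kr = text.encode("euc-kr")

print(hex_bytes(raw))         # ea b4 91 ea b3 a0
print(" ".join(code_points))  # ad11 ace0
print(hex_bytes(euc_kr))      # b1 a4 b0 ed
```

If the reconstruction is right, the last line is the same b1 a4 b0 ed
pattern reported from the QP-encoded Subject: headers, which would mean
both messages carry the same two characters in different encodings.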