Re: [SAtalk] A pointer for nailing Korean based spam

2002-06-06 Thread Derrick 'dman' Hudson
On Sat, Jun 01, 2002 at 10:45:58AM -0700, Craig R Hughes wrote: | dman wrote: | | d> Also keep a way to test the raw text itself, since some junk is easily | d> identified by the raw format and not so obvious after it has been | d> prepared for display. | | Well, we can have the charset translat

Re: [SAtalk] A pointer for nailing Korean based spam

2002-06-01 Thread Craig R Hughes
dman wrote: d> Also keep a way to test the raw text itself, since some junk is easily d> identified by the raw format and not so obvious after it has been d> prepared for display. Well, we can have the charset translation stuff happen between rawbody and body -- so rawbody will have the original

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-30 Thread dman
On Thu, May 30, 2002 at 09:46:38AM +0100, Matt Sergeant wrote: | dman wrote: | >On Wed, May 29, 2002 at 10:47:50AM +0100, Matt Sergeant wrote: | > | >| What headers might you want to not decode? | > | >The Subject: header so that weird charsets can be matched. Maybe | >others too (for similar rea

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-30 Thread Matt Sergeant
dman wrote: > On Wed, May 29, 2002 at 10:47:50AM +0100, Matt Sergeant wrote: > > | What headers might you want to not decode? > > The Subject: header so that weird charsets can be matched. Maybe > others too (for similar reasons). It may also be helpful to not > decode some headers if there is

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-29 Thread dman
On Wed, May 29, 2002 at 10:47:50AM +0100, Matt Sergeant wrote: | What headers might you want to not decode? The Subject: header so that weird charsets can be matched. Maybe others too (for similar reasons). It may also be helpful to not decode some headers if there is some deformation of them

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-29 Thread Matt Sergeant
Daniel Quinlan wrote: > Matt Sergeant <[EMAIL PROTECTED]> writes: > > >>If Craig would work on the email parser I posted to the dev list instead >>of MIME-tools, it decodes all character sets (even embedded ones in >>headers) to UTF-8, making detecting alternate character set stuff >>infinite

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-28 Thread Tony L. Svanstrom
On Tue, 28 May 2002 the voices made dman write: > Also keep a way to test the raw text itself, since some junk is easily > identified by the raw format and not so obvious after it has been > prepared for display. Search the Net for dmmh; it'll decode your QP- and base64-headers, and give the ol

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-28 Thread Daniel Quinlan
I write: >> I don't think I've ever received a UTF-8 Korean spam, dman <[EMAIL PROTECTED]> writes: > That's why someone needs to convert the characters to ks_c_5601-1987 > and euc-kr for SA's tests. Most of the spam is coming through as 8-bit ks_c_5601-1987. That's what the test should look
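A minimal sketch of the idea in this message, assuming the goal is to match the marker on the undecoded 8-bit bytes rather than on decoded text. It is written in Python for illustration only and is not SpamAssassin's Perl rule syntax; the function names are hypothetical, and Python's "euc_kr" codec stands in for both ks_c_5601-1987 and euc-kr.

```python
import re

# The Subject marker quoted elsewhere in this thread.
UCE_TAG = "광고"


def raw_tag_pattern(tag=UCE_TAG, codec="euc_kr"):
    """Build a bytes regex that matches the tag as it appears in raw 8-bit mail.

    Assumption: the spam arrives as unencoded EUC-KR bytes in the header,
    so the test has to be expressed in those bytes, not in Unicode.
    """
    raw = tag.encode(codec)
    return re.compile(re.escape(raw))


def subject_has_tag(raw_subject, pattern=None):
    """Return True if the raw (undecoded) Subject bytes contain the tag."""
    if pattern is None:
        pattern = raw_tag_pattern()
    return pattern.search(raw_subject) is not None


if __name__ == "__main__":
    sample = ("(" + UCE_TAG + ") sale").encode("euc_kr")
    print(subject_has_tag(sample))
    print(subject_has_tag(b"Re: meeting notes"))
```

The same tag encoded as UTF-8 would not match this pattern, which is the point made above: the test has to look for the bytes that actually arrive.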

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-28 Thread Daniel Quinlan
Matt Sergeant <[EMAIL PROTECTED]> writes: > If Craig would work on the email parser I posted to the dev list instead > of MIME-tools, it decodes all character sets (even embedded ones in > headers) to UTF-8, making detecting alternate character set stuff > infinitely easier. Which is the bett
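The alternative raised here, decoding every header to one character set before testing, can be sketched with Python's standard email.header module. This is an illustration under stated assumptions, not the parser Matt posted to the dev list; decode_subject and example_encoded_subject are hypothetical helper names.

```python
from email.header import Header, decode_header, make_header


def decode_subject(raw_subject):
    """Decode RFC 2047 encoded-words in a Subject value to a Unicode string.

    Once decoded, one rule can match the marker no matter which charset
    the sender used in the encoded-word.
    """
    return str(make_header(decode_header(raw_subject)))


def example_encoded_subject():
    """Build an encoded-word Subject carrying the marker, labelled euc-kr."""
    return Header("광고", "euc-kr").encode()


if __name__ == "__main__":
    encoded = example_encoded_subject()
    print(encoded)
    print(decode_subject(encoded))
```

Decoding covers only headers that declare their charset in an encoded-word. Raw 8-bit headers carry no charset label, which is one reason the thread also asks to keep a test on the undecoded text.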

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-24 Thread dman
On Thu, May 23, 2002 at 09:58:03PM -0700, Daniel Quinlan wrote: | dman writes: | | > It came through just fine, though I can't display it in my console. I | > just found out that gvim can't display it either with my fontset. It | > does handle UTF-8 well, though; and I double-checked the UTF-8

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-24 Thread Matt Sergeant
dman wrote: > On Thu, May 23, 2002 at 06:23:40PM -0700, Daniel Quinlan wrote: > | Jason Baker <[EMAIL PROTECTED]> writes: > > | > denoting it. I don't read/speak Korean, so I have no idea what > | > exactly it is, but the characters are: 광고 > | > > | > (hope that comes through) > > It came

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-23 Thread Daniel Quinlan
dman writes: > It came through just fine, though I can't display it in my console. I > just found out that gvim can't display it either with my fontset. It > does handle UTF-8 well, though; and I double-checked the UTF-8 > decoding myself. (read the UTF-8 RFC some time. It's really short > an

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-23 Thread dman
On Thu, May 23, 2002 at 06:23:40PM -0700, Daniel Quinlan wrote: | Jason Baker <[EMAIL PROTECTED]> writes: | > denoting it. I don't read/speak Korean, so I have no idea what | > exactly it is, but the characters are: 광고 | > | > (hope that comes through) It came through just fine, though I can't

Re: [SAtalk] A pointer for nailing Korean based spam

2002-05-23 Thread Daniel Quinlan
Jason Baker <[EMAIL PROTECTED]> writes: > My company is both in Korea and in Canada, so we tend to get a lot of > collateral spam from Korean spamhouses AND legitimate mail. > > One point I haven't seen yet in the ruleset is that there's a law in > Korea that UCE (or perhaps even UBE) must have

[SAtalk] A pointer for nailing Korean based spam

2002-05-22 Thread Jason Baker
My company is both in Korea and in Canada, so we tend to get a lot of collateral spam from Korean spamhouses AND legitimate mail. One point I haven't seen yet in the ruleset is that there's a law in Korea that UCE (or perhaps even UBE) must have a subject header denoting it. I don't read/spea