RE: [SAtalk] detecting large collections of random words

2004-01-09 Thread Smart,Dan
---Original Message- | From: Carl Chipman [mailto:[EMAIL PROTECTED] | Sent: Friday, January 09, 2004 10:32 AM | To: Smart,Dan | Subject: RE: [SAtalk] detecting large collections of random words | | Btw, why not rename your rules CP_RANDOMWORDS_10 and _15 so | that the name is more accurate descrip

RE: [SAtalk] detecting large collections of random words

2004-01-09 Thread Smart,Dan
15 2.5 ## EOF --- <> | -Original Message- | From: McWhirter,Julia [mailto:[EMAIL PROTECTED] | Sent: Friday, January 09, 2004 5:45 AM | To: Chris Petersen; [EMAIL PROTECTED] | Subject: RE: [SAtalk] detecting large collections of random words | | I have tried this and still SA doe

RE: [SAtalk] detecting large collections of random words

2004-01-09 Thread McWhirter,Julia
] Subject: RE: [SAtalk] detecting large collections of random words > Looks good. just running this over a ham mail box with about 500 messages > and a spam mail box with the same, and not decoding base64 and such, I > see the following: what about something like: /(?:\b(?!=(?:from|even|

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Chris Petersen
> Negative look-ahead is (?!...), not (?!=...). In your version, > the equals sign is part of the pattern to match, and since > anything that matches [a-z] can't be '=', the negative look- > ahead ends up doing nothing. Oops.. Shows how long it's been since I've done serious regex stuff (and I

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Keith C. Ivey
Chris Petersen <[EMAIL PROTECTED]> wrote: > what about something like: > > /(?:\b(?!=(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){12}/ Negative look-ahead is (?!...), not (?!=...). In your version, the equals sign is part of the pattern to match, and since anything that matches [a-z] can't
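
The difference between the two look-ahead spellings can be checked with a small sketch. It uses Python's `re` module rather than SpamAssassin itself (an assumption made for convenience; the `(?!...)` syntax is the same in both), and the sample strings are invented for illustration.

```python
import re

# Hypothetical sample: twelve lowercase words, all taken from the stop list.
common = "from even more were with from even more were with from even "
# Hypothetical sample: twelve lowercase words, none of them on the stop list.
random_words = ("apple brick cloud dance eagle flame "
                "grape house ivory joker knife lemon ")

# As posted: "(?!=" is a negative look-ahead for a literal "=" plus a stop word,
# which can never be what follows the start of an [a-z] word.
posted = re.compile(r"(?:\b(?!=(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){12}")
# Corrected: "(?!" on its own opens the negative look-ahead.
fixed = re.compile(r"(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){12}")

posted_hits_common = posted.search(common) is not None
fixed_hits_common = fixed.search(common) is not None
fixed_hits_random = fixed.search(random_words) is not None

print(posted_hits_common, fixed_hits_common, fixed_hits_random)
```

With the stray `=`, the stop list is effectively ignored and the run of common words still matches; with the corrected look-ahead only the run of non-stop-list words matches.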

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Chris Petersen
> Looks good. just running this over a ham mail box with about 500 messages > and a spam mail box with the same, and not decoding base64 and such, I > see the following: what about something like: /(?:\b(?!=(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){12}/ I'm trying to think of extremely comm

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Chris Petersen
> Slightly better might be: > /(?:(\b[a-z]{4,12}\s+){12,})/ The surrounding (?:) doesn't actually do anything - you're just grouping the whole regex itself. Thus /(\b[a-z]{4,12}\s+){12}/ would work just as well. or /(?:\b[a-z]{4,12}\s+){12}/ if you wanted to make a slight optimization and not g
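
A quick way to see that the three spellings differ only in grouping is to run them side by side. This is a sketch in Python's `re` (assumed here as a stand-in for the Perl regex engine SpamAssassin uses); the sample text is made up.

```python
import re

# Hypothetical sample text: twelve lowercase words, then one with eleven.
twelve = ("apple brick cloud dance eagle flame "
          "grape house ivory joker knife lemon ")
eleven = twelve[len("apple "):]

patterns = {
    "wrapped": re.compile(r"(?:(\b[a-z]{4,12}\s+){12,})"),
    "capturing": re.compile(r"(\b[a-z]{4,12}\s+){12}"),
    "non_capturing": re.compile(r"(?:\b[a-z]{4,12}\s+){12}"),
}

# Whether each pattern fires on each sample.
hits_twelve = {name: p.search(twelve) is not None for name, p in patterns.items()}
hits_eleven = {name: p.search(eleven) is not None for name, p in patterns.items()}
# Number of capturing groups each pattern asks the engine to track.
group_counts = {name: p.groups for name, p in patterns.items()}

print(hits_twelve, hits_eleven, group_counts)
```

All three agree on whether a message matches; only the non-capturing form avoids recording a group, which is the slight optimization mentioned above.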

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Pierre Thomson
Here's a rule I wrote for just this sort of spam: rawbody WORDWORD /[a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} / describe WORDWORD long string of random words score WORDWORD 2.0 (Sorry if it wraps,

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Gary Funck
> From: Chris Petersen [...] > > Yes, though I used: > > /(\b[a-z]{4,12}\s+){12}/ > > Notice the initial \b, and there's no need to make SA continue to search > beyond the "minimum" match, so leave off the "," in the last {} cluster. > Looks good. Just running this over a ham mail box with about

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Gary Funck
> -Original Message- > From: [EMAIL PROTECTED] > Sent: Thursday, January 08, 2004 12:57 PM > > Would this regex make more sense? > > /([a-z]{4,12}\s){12,}/ Slightly better might be: /(?:(\b[a-z]{4,12}\s+){12,})/

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Pierre Thomson
BTW, I'm already seeing some random-word spam with random punctuation thrown in as well... Pierre Thomson -Original Message- From: Chris Petersen [mailto:[EMAIL PROTECTED] Sent: Thursday, January 08, 2004 4:18 PM To: [EMAIL PROTECTED] Subject: RE: [SAtalk] detecting large collec

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Chris Petersen
> Would this regex make more sense? > /([a-z]{4,12}\s){12,}/ Yes, though I used: /(\b[a-z]{4,12}\s+){12}/ Notice the initial \b, and there's no need to make SA continue to search beyond the "minimum" match, so leave off the "," in the last {} cluster. -- Chris Petersen Programmer / Web Designe
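
The two changes suggested here can be illustrated with a short sketch, again using Python's `re` as a stand-in for SpamAssassin's regex engine (an assumption) and invented sample text. It shows one effect of the leading `\b` (the match can no longer start in the middle of a longer token) and that `{12}` and `{12,}` agree on hit or no hit, differing only in how far the match runs.

```python
import re

words = ["apple", "brick", "cloud", "dance", "eagle", "flame",
         "grape", "house", "ivory", "joker", "knife", "lemon"]
twelve = " ".join(words) + " "              # 12 words, 6 characters each with the space
twenty = " ".join(words + words[:8]) + " "  # 20 words
glued = "X" + twelve                        # first token is "Xapple", not a lowercase word

no_boundary = re.compile(r"([a-z]{4,12}\s){12,}")
bounded = re.compile(r"(\b[a-z]{4,12}\s+){12}")
bounded_open = re.compile(r"(\b[a-z]{4,12}\s+){12,}")

# Without \b the match may begin at the "a" inside "Xapple".
glued_no_boundary = no_boundary.search(glued) is not None
glued_bounded = bounded.search(glued) is not None

# {12} stops after twelve words; {12,} keeps consuming to the end of the run.
len_fixed = len(bounded.search(twenty).group(0))
len_open = len(bounded_open.search(twenty).group(0))

print(glued_no_boundary, glued_bounded, len_fixed, len_open)
```

Since a rule only needs to know whether the pattern fires, stopping at the twelfth word loses nothing.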

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Andrew_Hoying
Would this regex make more sense? /([a-z]{4,12}\s){12,}/ Andrew Hoying [EMAIL PROTECTED] wrote on 01/08/2004 01:37:49 PM: > Here's a rule I wrote for just this sort of spam: > > rawbody WORDWORD /[a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z] > {4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [
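
As a rough comparison of the spelled-out rule and the quantified form, the sketch below builds the twelve-word version by string repetition and checks it against a compact equivalent and against the regex proposed here. It runs in Python's `re` rather than SpamAssassin (an assumption), and the sample strings are hypothetical.

```python
import re

words = ["apple", "brick", "cloud", "dance", "eagle", "flame",
         "grape", "house", "ivory", "joker", "knife", "lemon"]

# Twelve copies of "[a-z]{4,12} " written out, as in the quoted rule body.
spelled_out = re.compile("[a-z]{4,12} " * 12)
# The same pattern expressed with a quantifier.
compact = re.compile(r"(?:[a-z]{4,12} ){12}")
# The proposed variant: \s instead of a literal space, and an open-ended count.
proposed = re.compile(r"([a-z]{4,12}\s){12,}")

samples = {
    "twelve_spaces": " ".join(words) + " ",
    "eleven_spaces": " ".join(words[:11]) + " ",
    "twelve_tabs": "\t".join(words) + "\t",
}

results = {
    name: (spelled_out.search(text) is not None,
           compact.search(text) is not None,
           proposed.search(text) is not None)
    for name, text in samples.items()
}

print(results)
```

The spelled-out and compact forms behave identically; the proposed variant is slightly broader because `\s` also accepts tabs and newlines between words.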

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Pierre Thomson
Here's a rule I wrote for just this sort of spam: rawbody WORDWORD /[a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} [a-z]{4,12} / describe WORDWORD long string of random words score WORDWORD