Re: Spam

Adam Katz Wed, 30 Mar 2011 15:19:38 -0700

On 03/30/2011 01:23 PM, RW wrote:
> A lot of these long words are rarely used in the wild - other than
> to say how long they are.
> 
> The subjects have two separate characteristics: the length and the 
> number of lower to upper case transitions. I score them separately
> and use:
> 
> header SUBJ_LONG_WORD Subject =~ /\b[^[:space:][:punct:]]{30}/
> header SUBJ_ODD_CASE  Subject =~ /(?:[[:lower:]][[:upper:]].{0,15}){3}/


(Personally, I'd prefer to limit it to letters rather than also
including numbers, underscores, and special characters.)
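
A letters-only variant of the quoted length rule might look like the
sketch below. The rule name, description, and score are placeholders
made up for illustration, and it is untested. [[:alpha:]] matches
letters only, so digits no longer count toward the 30.

```
# Hypothetical rule name; fires when a Subject word begins with 30+ letters.
header   SUBJ_LONG_ALPHA  Subject =~ /\b[[:alpha:]]{30}/
describe SUBJ_LONG_ALPHA  Subject has a word beginning with 30 or more letters
# Placeholder score; tune it against your own corpus.
score    SUBJ_LONG_ALPHA  0.5
```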

There's also exaggerated text like aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaarg,
hahahahahahahahahahahahahahaha, lollllllllllllllllllllll!11111one,
intentional strings like goodluckwiththat, and suffixes like
"somethingorother" (as in "Mr. Rosensomethingorother").

I think my rule was a little more efficient at accomplishing something
similar.  John's was better named and is preferable except for the fact
that it still takes a while to parse (though at least it's limited to
just one line of each message).
