> -----Original Message-----
> From: Greg Webster
> Sent: Wednesday, December 10, 2003 11:45 AM
>
>
> Here's what I've recently done:
> rawbody GWW_PUNCT /([a-z][:punct:]+[a-z])|( [A-Z][:punct:]+[a-z])/i
> score   GWW_PUNCT 2.0
>
> It's not perfect, but it does the job.

I think that pattern is going to catch a lot of lines that aren't what
you're looking for. As a quick check, try the following:

   perl -ne 'print if /([a-z][:punct:]+[a-z])|( [A-Z][:punct:]+[a-z])/i' $MAIL

and notice the lines that it matches. Also, the second alternative,
'( [A-Z][:punct:]+[a-z])', doesn't match anything that the first
alternative doesn't already match.
Maybe you were looking for a space followed by a capital letter? With /i
enabled, the check for [A-Z] will include [a-z] as well. Also, [:punct:]+
is likely too general and will match a lot of legitimate text. Even just
adding '_' will pick up lots of programming-language variable names and so on.

This pattern seemed to work pretty well:
  /([a-z][;]+[a-z].{0,20}){3}/i

Question to the group: what's the procedure for running the rules against
the spam/ham samples to come up with hit frequencies?





_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
