Hi

On Mon, Dec 13, 2004 at 04:43:28PM -0800, jdow wrote:
> > I've seen another variant about by Matthew Newton that makes a bunch of
> > rules for both subject and body separately. I generally don't do this as
> > the body rules will match the subject line, so there's really no need,
> > other than as a score amplifier. I usually only make subject rules when a
> > body rule isn't appropriate. He's also done separate regular and gappy-text
> > rules, but doesn't pick up on character-sub obfuscations.. It is a decent
> > set however..
> >
> > One good rule I've seen that Matthew Newton wrote is this one:
> >
> > rawbody UOLCC_WATCH_BODY /^(Do you )?[Ww]ant (a )?(cheap )?([Ww]ristw|[Ww])atch\?\s*$/m
> > describe UOLCC_WATCH_BODY Body asks if you want a watch
> > score UOLCC_WATCH_BODY 1.5
> >
> > Very targeted, but effective with low risk of FPs.
>
> Here is the full set of his stuff I am running. So far it has hit no ham.

I've recently updated some of these to try and match a few that were
slipping through.

The UOLCC_WATCH_BODY has now been modified to accept "rolex" in the
place of "cheap", as one like that arrived the other day.

The UOLCC_HTM_HTML_URL one is slightly less picky about which
characters can appear in the "proverb" line and the "name" line, just
looking for more than 8 "words" and less than 15 "words". I figured
out that it's more the repeated URLs that will be unique to the spam,
rather than the formatting of the two text lines. Oh, and the URL can
now contain 0-9 and -, too.

Didn't realise that the body test checks the subject, too, but I
don't suppose it can hurt with both tests.

Current set below.

Matthew

---------------------------------------------------------------------
header   UOLCC_ROLEX_SUB1     Subject =~ /\brolex\b/i
describe UOLCC_ROLEX_SUB1     Subject contains the word 'rolex'
score    UOLCC_ROLEX_SUB1     0.5

header   UOLCC_ROLEX_SUB2     Subject =~ /\br.{1,2}o.{1,2}l.{1,2}e.{1,2}x\b/i
describe UOLCC_ROLEX_SUB2     Subject contains a gappy version of 'rolex'
score    UOLCC_ROLEX_SUB2     1.5

body     UOLCC_ROLEX_BODY1    /\brolex\b/i
describe UOLCC_ROLEX_BODY1    Body contains the word 'rolex'
score    UOLCC_ROLEX_BODY1    0.5

body     UOLCC_ROLEX_BODY2    /\br.{1,2}o.{1,2}l.{1,2}e.{1,2}x\b/i
describe UOLCC_ROLEX_BODY2    Body contains a gappy version of 'rolex'
score    UOLCC_ROLEX_BODY2    1.5

rawbody  UOLCC_WATCH_BODY     /^(Do\syou\s)?[Ww]ant\s(a\s)?(rolex\s|cheap\s)?[Ww](ristw)?atch\?\s*$/m
describe UOLCC_WATCH_BODY     Body asks if you want a watch
score    UOLCC_WATCH_BODY     2

full     UOLCC_HTM_HTML_URL   /\n(http:\/\/[a-z0-9-]+\.[a-z]{3,4}\/[0-9a-f]{5,35}\/[[:alnum:]]{5,20}=?\.htm)\s*\n\s*\n\s*([^\s]+)(\s+[^\s]+){6,}\n\s*\n[^\s,.]+(\s[^\s,.]+){0,15}\n\s*\n\1l/s
describe UOLCC_HTM_HTML_URL   Matches pattern of spam mail (.htm .html)
score    UOLCC_HTM_HTML_URL   3.5

full     UOLCC_BBONE          /\n[bB1 ]{8,20}\n[bB1 ]{8,20}\n/s
describe UOLCC_BBONE          Contains two code lines with b, B and 1
score    UOLCC_BBONE          2

body     UOLCC_CAPWORD_TEST   /([A-Z][a-z]{3,}\s{1,2}){15,}/s
describe UOLCC_CAPWORD_TEST   String of words that all begin with caps letter
score    UOLCC_CAPWORD_TEST   1.2
---------------------------------------------------------------------

-- 
Matthew Newton <[EMAIL PROTECTED]>

UNIX Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom
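
P.S. For anyone who wants to poke at the simpler patterns before loading
them, here is a rough Python sketch. It is only an approximation:
SpamAssassin uses Perl regexes and hands body/rawbody rules its own view
of the message, whereas this just runs Python's re module over a plain
string. The hits() helper and the sample strings are made up for
illustration and are not part of the rule set.

```python
import re

# Patterns copied from three of the rules above. Python's re is only
# roughly compatible with Perl regexes, so treat this as a sanity check,
# not as a stand-in for SpamAssassin itself.
PATTERNS = {
    "UOLCC_ROLEX_BODY1": re.compile(r"\brolex\b", re.I),
    "UOLCC_ROLEX_BODY2": re.compile(r"\br.{1,2}o.{1,2}l.{1,2}e.{1,2}x\b", re.I),
    "UOLCC_WATCH_BODY": re.compile(
        r"^(Do\syou\s)?[Ww]ant\s(a\s)?(rolex\s|cheap\s)?[Ww](ristw)?atch\?\s*$",
        re.M,
    ),
}


def hits(text):
    """Return the sorted names of the patterns that match `text`.

    Hypothetical helper for illustration only.
    """
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))


if __name__ == "__main__":
    # Made-up sample strings, not taken from real mail.
    samples = [
        "Want a rolex watch?",
        "r.o.l.e.x",
        "r0lex",
        "I want a watch for Christmas",
    ]
    for sample in samples:
        print(repr(sample), "->", hits(sample))
```

The gappy pattern needs one or two characters between each letter, so it
does not fire on a plain "rolex", and neither rolex pattern catches a
character substitution such as "r0lex".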