Hi,

Would someone with a decent-sized corpus please be kind enough to check the following rules for me?
I think these are all new ones since last time I asked. I'm interested in the top five, mainly. The entire rule set is at http://www.le.ac.uk/cc/mcn4/spam/uolcc.cf (and includes one or two that I know are "bad", and need to be rewritten... use at your own risk!).

Thanks!

Matthew

full      UOLCC_HTM_L_URL_2  /\n(http:\/\/[a-z0-9-]+\.[a-z]{2,4}\/[[:alnum:]]{5,35}\/[[:alnum:]]{5,40}={0,3}\.htm)\s*\n\s*\n\s*([^\s]+)(\s+[^\s]+){1,}\s*\n\s*\n[^\s.]+(\s[^\s.]+){0,15}[^\.]\n\s*\n\1l/s
describe  UOLCC_HTM_L_URL_2  Matches pattern of spam mail (2) (.htm .html)
score     UOLCC_HTM_L_URL_2  3.8

full      UOLCC_NOMORE       /\n\s*no\s+more\?\s*\n/is
describe  UOLCC_NOMORE       Bad unsubscribe question
score     UOLCC_NOMORE       0.1

full      UOLCC_TOPGRADE     /\n\s*top[-\s]*grade\s+quality\s*\n/is
describe  UOLCC_TOPGRADE     Spammy phrase
score     UOLCC_TOPGRADE     0.1

full      UOLCC_LOWPRICE     /\n\s*low\s+prices?\s*\n/is
describe  UOLCC_LOWPRICE     Spammy phrase
score     UOLCC_LOWPRICE     0.1

full      UOLCC_FASTDELIV    /\n\s*(?:swift|fast|quick)\s+delivery\s*\n/is
describe  UOLCC_FASTDELIV    Spammy phrase
score     UOLCC_FASTDELIV    0.1

full      UOLCC_RUSDELUXE    /RusDeluxe.{0,5}Group/
describe  UOLCC_RUSDELUXE    Body contains spam phrase
score     UOLCC_RUSDELUXE    5.0

full      UOLCC_RUSDELUXE1   /12 Pushkinskaya street, office/
describe  UOLCC_RUSDELUXE1   Body contains spam address
score     UOLCC_RUSDELUXE1   5.0

full      UOLCC_RUSDELUXE2   /33 Bolshaya Nikitskaya street, office/
describe  UOLCC_RUSDELUXE2   Body contains spam address
score     UOLCC_RUSDELUXE2   5.0

full      UOLCC_RD_ICQ       /ICQ\#\s*338818190/i
describe  UOLCC_RD_ICQ       Body contains bad ICQ number
score     UOLCC_RD_ICQ       5.0

full      UOLCC_MAKE_MONEY   /\.make\.money\./i
describe  UOLCC_MAKE_MONEY   Body contains spam phrase
score     UOLCC_MAKE_MONEY   4.5

header    UOLCC_ZETA_TRADE   Subject =~ /Zeta Trade/
describe  UOLCC_ZETA_TRADE   Subject contains spam phrase
score     UOLCC_ZETA_TRADE   2.5

full      UOLCC_ZETA_TRADE1  /Zeta Trade/
describe  UOLCC_ZETA_TRADE1  Body contains spam phrase
score     UOLCC_ZETA_TRADE1  2.5

body      __UOLCC_DRUG1      /cialis\s+soft\s+tabs/i
body      __UOLCC_DRUG2      /\bimpotence\b/i
body      __UOLCC_DRUG3      /\btadalafil\b/i
body      __UOLCC_DRUG4      /\bbest\s+erections?\b/i
body      __UOLCC_DRUG5      /\bno\s+prior\s+prescription\s+needed\b/i
body      __UOLCC_DRUG6      /\bless\s+sidebacks\b/i
body      __UOLCC_DRUG7      /\bsex\b/i
meta      UOLCC_DRUGS1       ((__UOLCC_DRUG1 + __UOLCC_DRUG2 + __UOLCC_DRUG3 + __UOLCC_DRUG4 + __UOLCC_DRUG5 + __UOLCC_DRUG6 + __UOLCC_DRUG7) > 4)
describe  UOLCC_DRUGS1       Refers to drugs
score     UOLCC_DRUGS1       3.5

--
Matthew Newton <[EMAIL PROTECTED]>
UNIX and e-mail Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom