On Sat, Oct 08, 2005 at 10:01:22PM -0700, Loren Wilton wrote:
> > They use HTML and tables very cleverly, thus avoiding Bayes rules.
> > Basically it is an invisible table, using one row and several columns.
> > The first column contains the first letter of every line, separated by
> > "<BR>" and optionally some style tags (b, i, etc.). The next column
> > contains several more characters of each line, etc.
> >
> > Leo.
>
> There are a good 9 or 10 variations on this now. The SARE rulesets
> have a number of rules that catch many of these, though not all of them.
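To make the quoted description concrete, here is a minimal Python sketch of the layout. It assumes a naive filter that just strips tags and reads cell text in source order; that is a simplification, not SpamAssassin's actual HTML handling or Bayes tokenizer. The message text, column widths and function names are hypothetical.

```python
import re

# Hypothetical message lines (illustrative only, not taken from real spam).
LINES = ["Cheap meds here", "Order today now"]
WIDTHS = [1, 4, 10]  # characters per column: first letter, next few, rest


def split_columns(lines, widths):
    """Cut each line into consecutive slices; slice i goes to column i."""
    columns = [[] for _ in widths]
    for line in lines:
        pos = 0
        for i, w in enumerate(widths):
            columns[i].append(line[pos:pos + w])
            pos += w
    return columns


def build_table(columns):
    """One-row table; each cell holds one slice per line, joined by <BR>."""
    cells = "".join("<td>" + "<BR>".join(col) + "</td>" for col in columns)
    return "<table><tr>" + cells + "</tr></table>"


def naive_text(html):
    """What a tag-stripping tokenizer sees: cell text in source order."""
    return re.sub(r"<[^>]+>", " ", html)


def rendered_lines(columns):
    """What the reader sees: the slices of each line placed side by side."""
    return ["".join(parts) for parts in zip(*columns)]


if __name__ == "__main__":
    cols = split_columns(LINES, WIDTHS)
    print(naive_text(build_table(cols)).split())
    print(rendered_lines(cols))
```

Because each cell holds the same slice of every line, the source order interleaves fragments from different lines ("C", "O", "heap", "rder", ...), so words like "Cheap" never appear as tokens even though the rendered table reads normally.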
On the assumption that "normal" URLs don't contain the construct /?
(especially at Geocities; are CGI scripts even allowed there?), how
about the following?

  full     UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
  describe UOLCC_UKGEO UK Geocities exploitation
  score    UOLCC_UKGEO 4.0

I've been testing this for a couple of weeks now and have had no
complaints yet (though I don't have a spam corpus to test it against,
so I can't be too sure).

It could possibly also be condensed to the following (completely
untested):

  full     UOLCC_UKGEO /http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}/

Matthew

-- 
Matthew Newton <[EMAIL PROTECTED]>

UNIX and e-mail Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom
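Without a spam corpus, one cheap sanity check is to run both patterns against a handful of hand-made URLs. The sketch below is a rough Python approximation: Python's `re` is close to, but not identical to, Perl's regex engine; the dots in the host name of the first pattern are escaped; the `\/` escapes are dropped since they are only needed because of Perl's `/.../` delimiters. The sample URLs and the `classify` helper are hypothetical.

```python
import re

# First rule's regex, host-name dots escaped, Perl-only \/ escapes dropped.
STRICT = re.compile(
    r"http://uk\.geocities\.com/"
    r"[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}"
    r"/\?[\w=.]{3}"
)
# The condensed (untested) variant: any two-character host prefix.
LOOSE = re.compile(r"http://..\.geocities\.com/[A-Za-z0-9_]{2,40}/\?[\w=.]{3}")

# Hypothetical sample URLs, not taken from real spam.
SAMPLES = [
    "http://uk.geocities.com/jane_smith42/?a=b",
    "http://uk.geocities.com/Jane_Smith_Jones/?x=1",
    "http://us.geocities.com/jane_smith/?a=b",
    "http://uk.geocities.com/janesmith/",
    "http://uk.geocities.com/jane_smith/index.html",
]


def classify(url):
    """Return (strict_hit, loose_hit) for one URL, searching anywhere in it."""
    return bool(STRICT.search(url)), bool(LOOSE.search(url))


if __name__ == "__main__":
    for url in SAMPLES:
        strict_hit, loose_hit = classify(url)
        print(f"{url:50} strict={strict_hit!s:5} loose={loose_hit}")
```

On these samples both patterns fire on the uk.geocities.com "/?" URLs and neither fires on plain page URLs, while the condensed form also fires on other two-letter prefixes such as us.geocities.com. That is broader coverage, but also a larger false-positive surface; only a real ham/spam corpus can say how either rule behaves in practice.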