On Sat, Oct 08, 2005 at 10:01:22PM -0700, Loren Wilton wrote:
> > They use HTML and tables very cleverly, thus evading Bayes rules.
> > Basically it is an invisible table with one row and several columns.
> > The first column contains the first letter of every line, separated by
> > "<BR>" and optionally some style tags (b, i, etc.). The next column
> > contains the next few characters of each line, etc.
> 
> Leo.  There are a good 9 or 10 variations on this now.  The SARE rulesets
> have a number of rules that catch many of these, though not all of them.

Assuming that "normal" URLs don't contain the construct /?, and
especially not at Geocities (are CGI scripts even allowed there?),
how about the following?

full      UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
describe  UOLCC_UKGEO UK Geocities exploitation
score     UOLCC_UKGEO 4.0

I've been testing this for a couple of weeks now and have had no
complaints yet. I don't have a spam corpus to test it against,
though, so I can't be too sure.
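
For anyone who wants to poke at the pattern outside SpamAssassin, here
is a minimal Python sketch that runs the rule's regex (with the dots in
the hostname escaped so they match literally) against a few URLs. The
sample URLs and the helper name hits() are made up for illustration;
they are not taken from real mail, and Python's re module is only
assumed to behave like Perl for the constructs used here.

```python
import re

# Pattern body from the UOLCC_UKGEO rule, hostname dots escaped.
# re.ASCII keeps \w and \d limited to ASCII, which is an assumption
# about how the rule would see typical mail.
UKGEO = re.compile(
    r"http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}"
    r"(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}",
    re.ASCII,
)

# Hypothetical URLs, invented for this sketch.
SAMPLES = [
    "http://uk.geocities.com/john_smith42/?abc=1",      # name_name + digits + /?
    "http://uk.geocities.com/Mary_Anne_jones/?id=7",    # three-part name + /?
    "http://uk.geocities.com/john_smith42/index.html",  # ordinary page, no /?
    "http://uk.geocities.com/johnsmith/?abc",           # no underscore in name
]


def hits(text):
    """Return True if the rule's pattern is found anywhere in text."""
    return UKGEO.search(text) is not None


if __name__ == "__main__":
    for url in SAMPLES:
        print("HIT " if hits(url) else "miss", url)
```

This only checks the regex itself; it says nothing about how the rule
scores against a real ham/spam corpus.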

It could possibly also be condensed to the following (completely
untested):

full      UOLCC_UKGEO /http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}/
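
To get a feel for how much looser the condensed pattern is, the
following Python sketch lists which URLs it matches that the longer
pattern does not. As before, the URLs and the function name
only_condensed() are hypothetical, the longer pattern is written with
its hostname dots escaped, and this is a rough comparison rather than a
real test.

```python
import re

# Longer pattern (hostname dots escaped) and the condensed one.
STRICT = re.compile(
    r"http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}"
    r"(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}",
    re.ASCII,
)
CONDENSED = re.compile(
    r"http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}",
    re.ASCII,
)

# Hypothetical URLs, invented for this sketch.
SAMPLES = [
    "http://uk.geocities.com/john_smith42/?abc=1",
    "http://uk.geocities.com/johnsmith/?abc",           # no underscore
    "http://de.geocities.com/john_smith42/?abc=1",      # other two-letter host
    "http://www.geocities.com/john_smith42/?abc=1",     # three-letter host
    "http://uk.geocities.com/john_smith42/index.html",  # no /? at all
]


def only_condensed(urls):
    """Return the URLs matched by CONDENSED but not by STRICT."""
    return [u for u in urls if CONDENSED.search(u) and not STRICT.search(u)]


if __name__ == "__main__":
    for url in only_condensed(SAMPLES):
        print("condensed only:", url)
```

The condensed form drops the name_name shape and accepts any two
characters before ".geocities.com", so it would need at least as much
testing as the longer rule before being trusted with a score of 4.0.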

Matthew


-- 
Matthew Newton <[EMAIL PROTECTED]>

UNIX and e-mail Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom
