Matthew Newton wrote:
On Sat, Oct 08, 2005 at 10:01:22PM -0700, Loren Wilton wrote:
They use HTML and tables very cleverly, thus evading the Bayes rules.
Basically it is an invisible table with one row and several
columns. The first column contains the first letter of every line,
separated by "<BR>" and optionally some style tags (b, i, etc.).
The next column contains several more characters of each line, and so on.
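
To make the trick described above concrete, here is a minimal Python sketch. It is only an illustration: the sample text, the column widths and all function names are made up, and real spam adds style tags and varies the layout. It shows that the rendered message reads normally while the tokens in source order contain none of the original words.

```python
import re

def split_into_columns(lines, widths):
    """Cut every line at the same offsets, giving one list of pieces per column."""
    columns, start = [], 0
    for width in widths:
        columns.append([line[start:start + width] for line in lines])
        start += width
    columns.append([line[start:] for line in lines])  # whatever is left over
    return columns

def to_html(columns):
    """One table row, one cell per column, pieces separated by <br>."""
    cells = "".join("<td>" + "<br>".join(col) + "</td>" for col in columns)
    return "<table><tr>" + cells + "</tr></table>"

def source_order_tokens(html):
    """What a naive tokenizer sees: tags dropped, text kept in source order."""
    return re.sub(r"<[^>]+>", " ", html).split()

def rendered_lines(columns):
    """What the reader sees: the pieces of each row side by side."""
    return ["".join(parts) for parts in zip(*columns)]

lines = ["cheap pills", "order today"]          # hypothetical message text
columns = split_into_columns(lines, [1, 3, 3])  # hypothetical column widths
tokens = source_order_tokens(to_html(columns))
```

Here `rendered_lines(columns)` gives back the two original lines, while `tokens` starts with `c`, `o`, `hea`, `rde`, so a word-based filter never sees "cheap" or "order".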

Leo.

There are a good 9 or 10 variations on this now.  The SARE
rulesets have a number of rules that catch many of these, though not
all of them.

On the assumption that "normal" URLs don't contain the construct /?,
especially not at Geocities (are CGI scripts even allowed
there?), how about the following?

full      UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
describe  UOLCC_UKGEO UK Geocities exploitation
score     UOLCC_UKGEO 4.0

I've been testing this for a couple of weeks now and have had no
complaints yet. I do not have a corpus of spam to test it against,
though, so I can't be too sure.
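
Without a corpus, a quick sanity check of the pattern is still possible. The following is a rough Python sketch, not a substitute for a real mass-check: the sample URLs are invented, the dots in the host name are escaped, and Python's `re` is assumed to behave like Perl for this particular expression.

```python
import re

# The UOLCC_UKGEO pattern, with the host-name dots escaped.
RULE = re.compile(
    r"http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}"
    r"(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}"
)

# Hypothetical sample URLs: True means the rule is expected to hit.
samples = {
    "http://uk.geocities.com/john_smith42/?abc=1": True,
    "http://uk.geocities.com/Mary_jones_brown/?x=1": True,
    "http://uk.geocities.com/john_smith42/index.html": False,  # no /?
    "http://uk.geocities.com/johnsmith/?abc": False,           # no underscore
    "http://www.geocities.com/john_smith/?abc": False,         # not uk.
}

# A "full" rule searches the whole message, so use search(), not match().
results = {url: bool(RULE.search(url)) for url in samples}

for url, hit in results.items():
    print("HIT " if hit else "miss", url)
```

Running this prints a hit for the first two URLs and a miss for the rest, which matches the expectations recorded in `samples`.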

It could possibly also be condensed to the following (completely
untested):

full      UOLCC_UKGEO /http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}/
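
The condensed form is noticeably broader than the first rule. A small Python sketch of the difference follows, again with invented URLs and assuming Python's `re` treats these expressions the way Perl does. The names `STRICT` and `CONDENSED` are just labels for this example.

```python
import re

# First rule, with the host-name dots escaped.
STRICT = re.compile(
    r"http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}"
    r"(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}"
)
# Condensed rule: any two-character subdomain, any word-like account name.
CONDENSED = re.compile(
    r"http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}"
)

# Hypothetical URLs used only to show where the two patterns disagree.
urls = [
    "http://uk.geocities.com/john_smith42/?abc",    # both hit
    "http://it.geocities.com/john_smith42/?abc",    # other country prefix
    "http://uk.geocities.com/johnsmith/?abc",       # no underscore in name
    "http://www.geocities.com/john_smith/?abc",     # three-letter subdomain
    "http://uk.geocities.com/john_smith/index.html" # no /? at all
]

comparison = [(bool(STRICT.search(u)), bool(CONDENSED.search(u))) for u in urls]
```

In this sketch the condensed pattern also hits the `it.` URL and the account name without an underscore, so it may catch more variants but would need more care with false positives.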

I saw somebody else use:
uri       UK_GEOCITIES  m'^http://uk\.geocities\.com\b'i
describe  UK_GEOCITIES  Body contains spammed domain
score     UK_GEOCITIES  3.0
uri       MSN_SPACES    m'^http://spaces\.msn\.com\/members\b'i
describe  MSN_SPACES    Body contains spammed domain
score     MSN_SPACES    3.0
uri       IT_GEOCITIES  m'^http://it\.geocities\.com\b'i
describe  IT_GEOCITIES  Body contains spammed domain
score     IT_GEOCITIES  3.0

PLEASE NOTE: I haven't used these rules myself, so I don't know their false-positive (FP) rate.

With kind regards,

Maurice Lucas
TAOS-IT
