On Tue, 2003-08-19 at 05:16, Daniel Quinlan wrote:
> Brian White <[EMAIL PROTECTED]> writes:
> 
> > Wouldn't the Bayes tests be just the thing for these since it's already
> > adaptive?
> 
> Yes, but there's a difference between one good token and a surefire rule
> that catches a significant amount of spam.
>  
> > What I can see happening, though, is spammers start using a "salt" so
> > that the entire string is effectively random.
> 
> I think salts are better used in one-way hashes, not two-way obfuscation
> techniques, which is what this really is, but spammers are
> definitely likely to continue shifting to more complicated techniques.
> 
> Here's a new revision of my eval test.  I merged some code I had lying
> around to look for any rotation and also added initial tests for the
> name/citation and reverse ciphers mentioned on Yorkshire Dave's page.
> 
> The rules still need some work and tuning.
> 
> I unfortunately get some FPs for rotxx right now and reverse has
> horrible results because reversing English produces a lot of natural
> words.

Reverse really needs some context to work with; I'm beginning to realise
that now. You're not alone in getting FPs on reverse.
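
To illustrate why reverse is so noisy without context, here is a minimal
Python sketch that lists the words in a word list whose reversal is also a
word. The function name and the tiny word list are made up for the example;
this is not the eval test from the thread.

```python
def reverse_matches(words):
    """Return the words whose reversed spelling is a different word in the list."""
    vocab = {w.lower() for w in words}
    return sorted(
        w for w in vocab
        if w[::-1] in vocab and w[::-1] != w  # skip palindromes such as "level"
    )

# Hypothetical stand-in for a real dictionary file.
words = ["live", "evil", "stressed", "desserts", "level", "spam"]
print(reverse_matches(words))
```

Every word it prints is ordinary English that a bare reverse test would flag.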

I've been studying the rotations and their FPs over the last few days.

There are a LOT of potential FPs. The number of FPs varies with rotation
and word length too, and I think that can be used.

I'm in the middle of trying to figure out which lengths are safe for
which rotations. I've permuted the dictionary and a couple of common
names files 25 ways around, matched every permutation of every word back
against the entire dictionary, and created files of the permuted words
which match. I'm currently trying to derive some usable figures from the
results.
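
The matching run described above could be sketched roughly like this in
Python. It is an illustration only, not the scripts actually used: the names
rotate and find_rotation_matches and the tiny word list are made up for the
example, and a real run would load the full dictionary and names files.

```python
import string
from collections import defaultdict

def rotate(word, n):
    """Shift each ASCII letter n places forward, wrapping z round to a."""
    lower = string.ascii_lowercase
    table = str.maketrans(lower, lower[n:] + lower[:n])
    return word.lower().translate(table)

def find_rotation_matches(words):
    """Map each rotation 1..25 to the (original, rotated) pairs where the
    rotated form is itself in the word list."""
    vocab = {w.lower() for w in words}
    matches = defaultdict(list)
    for n in range(1, 26):
        for w in sorted(vocab):
            rotated = rotate(w, n)
            if rotated in vocab:
                matches[n].append((w, rotated))
    return matches

# Hypothetical stand-in for the dictionary and names files.
words = ["cobra", "freud", "sleep", "bunny", "jolly", "cheer", "spam"]
matches = find_rotation_matches(words)
for n in sorted(matches):
    print(f"rot-{n}: {matches[n]}")
```

Holding the word list as a set keeps each lookup cheap, so the cost is about
25 rotations per word rather than a scan of the whole dictionary per word.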

There are some amusing and interesting potential FPs. Here are a few that
jumped out at me when I was glancing at the matchfiles :)

at rot-3, cobra > freud
at rot-9, sleep > bunny, river > arena
at rot-15, dazed > spots
at rot-16, comedy > secuto = prosecutor, excel > unsub
at rot-19, jolly > cheer
at rot-20, curly > wolfs, cushy > wombs, layout > fusion,
robot > livin = living

You really don't want to be matching on anything shorter than 5
characters; the rule should really fail for short usernames or something.
rot-4 has 2392 fp'able words, 90% of which are 4 letters or fewer, whilst
rot-5 has only 911 in total. I'm still crunching figures and should have
the number of fp combos in another day or so.
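
One way to get per-length figures like the ones above is to bucket the
matches for a rotation by word length, and to have the rule decline to run
on short tokens. The sketch below is Python and assumes a list of
(original, rotated) match pairs already exists. MIN_LEN, length_profile,
share_at_or_below and safe_to_test are hypothetical names, and the threshold
of 5 is just the rule of thumb above, not a tuned value.

```python
from collections import Counter

MIN_LEN = 5  # assumption: the "nothing under 5 characters" rule of thumb

def length_profile(pairs):
    """Count the matching words for one rotation by word length."""
    return Counter(len(original) for original, _rotated in pairs)

def share_at_or_below(profile, max_len):
    """Fraction of matches whose word length is max_len or less."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    short = sum(count for length, count in profile.items() if length <= max_len)
    return short / total

def safe_to_test(token, min_len=MIN_LEN):
    """Only run rotation tests on tokens of at least min_len characters."""
    return len(token) >= min_len

# A few rot-13 pairs as stand-in data; these are not the real match files.
sample = [("be", "or"), ("ant", "nag"), ("envy", "rail"), ("green", "terra")]
profile = length_profile(sample)
print(share_at_or_below(profile, 4))
print(safe_to_test("spam"), safe_to_test("cobra"))
```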

I should really have used faster hardware; it's taking forever.

--snipped excellent code--

-- 
Yorkshire Dave


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
