On Tue, 2003-08-19 at 05:16, Daniel Quinlan wrote:
> Brian White <[EMAIL PROTECTED]> writes:
>
> > Wouldn't the Bayes tests be just the thing for these since it's already
> > adaptive?
>
> Yes, but there's a difference between one good token and a surefire rule
> that catches a significant amount of spam.
>
> > What I can see happening, though, is spammers start using a "salt" so
> > that the entire string is effectively random.
>
> I think salts are better used in one-way hashes, not two-way obfuscation
> techniques systems, which is what this really is, but spammers are
> definitely likely to continue shifting to more complicated techniques.
>
> Here's a new revision of my eval test. I merged some code I had lying
> around to look for any rotation and also added initial tests for the
> name/citation and reverse ciphers mentioned on Yorkshire Dave's page.
>
> The rules still need some work and tuning.
>
> I unfortunately get some FPs for rotxx right now and reverse has
> horrible results because reversing English produces a lot of natural
> words.

Reverse really needs some context to work with; I'm beginning to realise
that now. You're not alone in getting FPs on reverse.

I've spent the last few days studying the rotations and their FPs. There
are a LOT of potential FPs, and the number varies with both the rotation
and the word length, which I think can be used. I'm in the middle of
working out which lengths are safe for which rotations. I've permuted the
dictionary and a couple of common-names files through all 25 rotations,
matched every permutation of every word back against the entire
dictionary, written the permuted words that match out to files, and I'm
currently trying to derive some usable figures from the results.

There are some amusing and interesting potential FPs. Here are a few that
jumped out at me when I was glancing at the match files :)

at rot-3,  cobra > freud
at rot-9,  sleep > bunny, river > arena
at rot-15, dazed > spots
at rot-16, comedy > secuto (as in prosecutor), excel > unsub
at rot-19, jolly > cheer
at rot-20, curly > wolfs, cushy > wombs, layout > fusion,
           robot > livin (as in living)

You really don't want to be matching on anything shorter than 5
characters; the rule should really fail for short usernames or something.
rot-4 has 2392 FP'able words, 90% of which are 4 letters or fewer, whilst
rot-5 has only 911 in total.

I'm still crunching figures and should have the number of FP combos in
another day or so. I should really have used faster hardware; it's taking
forever.

--snipped excellent code--

-- 
Yorkshire Dave

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
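
For anyone wanting to reproduce the rotation experiment described in the message (rotate every dictionary word through all 25 shifts and look each result up in the same dictionary), here is a rough Python sketch. It is not the code that was actually run for the figures quoted above. The function names, the `min_length` parameter, and the tiny sample word list are all made up for illustration, and it only finds whole-word matches, so substring hits such as secuto inside prosecutor would need an extra pass.

```python
from collections import defaultdict


def rotate(word, shift):
    """Caesar-rotate lowercase ASCII letters by `shift`; leave other characters alone."""
    out = []
    for ch in word:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def rotation_collisions(words, min_length=1):
    """Map each shift (1..25) to the (word, rotated) pairs where both are in `words`.

    `min_length` is a hypothetical knob for skipping short words, which is
    where most accidental matches are reported to come from.
    """
    vocab = {w.lower() for w in words if w.isascii() and w.isalpha()}
    collisions = defaultdict(list)
    for word in sorted(vocab):
        if len(word) < min_length:
            continue
        for shift in range(1, 26):
            rotated = rotate(word, shift)
            if rotated in vocab:
                collisions[shift].append((word, rotated))
    return dict(collisions)


def counts_by_length(pairs):
    """Count colliding words per word length for one shift's list of pairs."""
    counts = defaultdict(int)
    for word, _rotated in pairs:
        counts[len(word)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Illustrative sample only; a real run would load a full dictionary file.
    sample = ["cobra", "freud", "sleep", "bunny", "river", "arena",
              "jolly", "cheer", "layout", "fusion"]
    for shift, pairs in sorted(rotation_collisions(sample, min_length=5).items()):
        print(shift, pairs, counts_by_length(pairs))
```

Feeding it a real word list (one word per line from whatever dictionary file is to hand) and comparing `counts_by_length` across shifts should give per-rotation, per-length figures of the kind discussed above, though the exact numbers will depend on the word list used.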