On Wed, 7 Jan 2004 11:03:35 -0500, Chris Santerre <[EMAIL PROTECTED]> writes:
> > Are you having trouble doing the conversion automatically?
>
> Yup ;)
>
> > I can
> > describe the algorithm to transform the regexps and to find
> > maximum-size prefixes if you (or someone else) wants to
> > implement. I've tried, but my perl knowledge for the datastructure
> > voodoo is a bit lacking, but the correct algorithm will give a new
> > ruleset that will have *identical* results to doing the matches
> > sequentially. The program for the conversion should be about
> > 30-50 lines.
>
> *snip*
>
> Basically bigevil has gone completely manual now. Scripts automating
> it were essential to the project. Now they become more of a
> hindrance. I have some plans for some new scripts to get domain
> names, but adding anything to the actual cf file has to be done by
> hand.

Ouch! :(

IMO, 'bigevil.cf' should be fully automatically generated from
'bigevil.domains'. Why did you quit using scripts?

> Same example: domain.net and domain.com are spammers, but domain.org
> is not. I can't just say /domain\.(?:com|net|org)/ because of the FP.

Then only insert the specific domains, and don't automatically add
rules for the other top-level domains? Perhaps have a
'maybeevil.domains' with a low score that gets these rules added
instead?

> Also scripts don't see things like:
>
> spam2003.com, spam2004.com,..... could be rewritten as /spam200\d\.com/
>
> Or that some of the IP addresses can be broken down to subnets.

These can't be handled by the exact algorithm I gave, but you can do
prefix analysis at least up to the first non-literal character.
Annotate each node in the trie with a list of such exceptional
patterns. In decomposition algorithm #2, this list gets concatenated
in; I believe that happens in step 5.

> I see what you mean by the tree structure of the rules. Eventually I
> hope to get there. I plan to pull out the .org, .us and .info TLDs
> into their own rules. So I'm changing a few at a time.
> But at this point, automating any changes isn't going to work :(

I have a PCRE analyzer that, if I had time, I could adapt to convert
from bigevil.cf to bigevil.domains. I have almost no time, but I can
try to do it if it would be useful?

You could write a simple perl script that converts from
bigevil.domains to bigevil.cf by concatenating regexps together 30 at
a time. Also, I gave some techniques for managing bigevil.domains-type
files a few weeks ago.

Scott

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
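
For the bigevil.domains to bigevil.cf direction mentioned above, here
is a minimal sketch of the "concatenate 30 at a time" idea, written in
Python rather than Perl. It assumes bigevil.domains holds one literal
domain name per line. The rule-name prefix BIGEVIL, the describe text
and the score of 3.0 are placeholders invented for illustration and are
not taken from the real bigevil.cf. Hand-written patterns such as
/spam200\d\.com/ are not handled.

```python
def chunks(items, size=30):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def escape_domain(domain):
    """Escape a literal domain name for use inside a regexp.

    Assumption: entries are plain host names, so the dot is the only
    character that needs escaping.
    """
    return domain.replace(".", r"\.")


def read_domains(path):
    """Read one domain per line, skipping blank lines and '#' comments."""
    with open(path) as handle:
        stripped = (line.strip() for line in handle)
        return [line for line in stripped if line and not line.startswith("#")]


def domains_to_cf(domains, per_rule=30, prefix="BIGEVIL", score="3.0"):
    """Build SpamAssassin 'uri' rules, one per group of `per_rule` domains.

    `prefix` and `score` are hypothetical placeholders.
    """
    lines = []
    unique = sorted(set(domains))  # drop duplicates, keep output stable
    for number, group in enumerate(chunks(unique, per_rule), start=1):
        alternation = "|".join(escape_domain(d) for d in group)
        name = f"{prefix}_{number}"
        lines.append(f"uri {name} /(?:{alternation})/i")
        lines.append(f"describe {name} URI matches a listed domain")
        lines.append(f"score {name} {score}")
    return "".join(line + "\n" for line in lines)
```

Calling domains_to_cf(read_domains("bigevil.domains")) and writing the
result to bigevil.cf would regenerate the whole rules file from the
domain list. Because each domain is inserted exactly as listed,
domain.org is never added just because domain.com and domain.net are
present.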