On Wed, 7 Jan 2004 11:03:35 -0500, Chris Santerre <[EMAIL PROTECTED]> writes:

> > Are you having trouble doing the conversion automatically? 
> 
> Yup ;)
> 
> > I can
> > describe the algorithm to transform the regexps and to find
> > maximum-size prefixes if you (or someone else) wants to
> > implement. I've tried, but my perl knowledge for the datastructure
> > voodoo is a bit lacking, but the correct algorithm will give a new
> > ruleset that will have *identical* results to doing the matches
> > sequentially. The program for the conversion should be about 
> > 30-50 lines.
> > 
> *snip*
> 
> Basically bigevil has gone completely manual now. Scripts automating
> it were essential to the project. Now they become more of a
> hindrance. I have some plans for some new scripts to get domain
> names, but adding anything to the actual cf file has to be done by
> hand.

Ouch! :( 

IMO, 'bigevil.cf' should be fully automatically generated from
'bigevil.domains'.  Why did you quit using scripts?

> Same example: domain.net and domain.com is a spammer. But domain.org is not.
> I can't just say /domain\.(?:com|net|org)/ because of the FP.

Only insert specific domains and don't automatically add additional
rules for the other toplevels? Perhaps have a 'maybeevil.domains' with
a low score that gets these rules added instead?
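
To make the split concrete, here is a minimal Python sketch (not the
bigevil tooling itself). It assumes each list holds literal domain
names; the names exact_domain_regex, evil, and maybe are made up for
illustration:

```python
import re

def exact_domain_regex(domains):
    """Build one alternation that matches only the listed domains."""
    # Escape each name so "." is a literal dot, and never expand TLDs.
    alternatives = "|".join(re.escape(d) for d in sorted(domains))
    return r"\b(?:" + alternatives + r")\b"

# Hypothetical contents of 'bigevil.domains' and 'maybeevil.domains'.
evil = ["domain.net", "domain.com"]   # confirmed spammer domains
maybe = ["domain.org"]                # unconfirmed sibling, to be scored low

evil_re = re.compile(exact_domain_regex(evil), re.IGNORECASE)
maybe_re = re.compile(exact_domain_regex(maybe), re.IGNORECASE)
```

Because only the listed names are emitted, domain.org never trips the
high-scoring rule; it can only hit the separate low-scoring one.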

>  Also scripts
> don't see things like:
> 
> spam2003.com, spam2004.com,..... could be rewritten as /spam200\d\.com/
> 
> Or that some of the IP addresses can be broken down to subnets. 

These can't be handled by the exact algorithm I gave, but you can do
prefix-analysis at least up to the first non-literal
character. Annotate each node in the trie with a list of such
exceptional patterns. In decomposition algorithm #2, this list
gets concatenated in; I believe that happens in step 5.
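
A rough Python sketch of that prefix analysis follows. It is not the
decomposition algorithm referred to above, only an illustration of a
trie whose nodes carry a list of non-literal tails. It assumes the
input patterns have no top-level alternation and no anchors, and all
function names and the sample "spamking" patterns are hypothetical:

```python
import re

META = set("\\.[](){}*+?|^$")

def split_literal_prefix(pattern):
    """Split a regexp into (literal prefix, remaining non-literal tail)."""
    prefix, starts, i = [], [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and pattern[i + 1:i + 2] == ".":
            starts.append(i)          # an escaped dot is a literal "."
            prefix.append(".")
            i += 2
        elif ch in META:
            break                     # first non-literal character
        else:
            starts.append(i)
            prefix.append(ch)
            i += 1
    # A quantifier binds to the previous character, so give that one back.
    if i < len(pattern) and pattern[i] in "*+?{" and starts:
        i = starts.pop()
        prefix.pop()
    return "".join(prefix), pattern[i:]

class Node:
    def __init__(self):
        self.children = {}      # literal character -> Node
        self.terminal = False   # a fully literal pattern ends here
        self.exceptions = []    # non-literal tails attached at this node

def insert(root, pattern):
    prefix, rest = split_literal_prefix(pattern)
    node = root
    for ch in prefix:
        node = node.children.setdefault(ch, Node())
    if rest:
        node.exceptions.append(rest)
    else:
        node.terminal = True

def to_regex(node):
    """Emit a regexp for this subtree, sharing common literal prefixes."""
    alts = [re.escape(ch) + to_regex(child)
            for ch, child in sorted(node.children.items())]
    alts += node.exceptions     # the exceptional patterns get concatenated in
    if node.terminal:
        alts.append("")         # a pattern may also end at this node
    return alts[0] if len(alts) == 1 else "(?:" + "|".join(alts) + ")"

def build_regex(patterns):
    root = Node()
    for p in patterns:
        insert(root, p)
    return to_regex(root)

patterns = [r"spam200\d\.com", r"spamking\.com", r"spamking\.net"]
combined = build_regex(patterns)
print(combined)
```

The shared literal prefix "spam" is factored out once, and the \d tail
is kept verbatim under the node reached by "spam200".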

> I see what you mean by the tree structure of the rules. Eventually I
> hope to get there. I plan to pull out the .org, .us, and .info TLDs into their
> own rules. So I'm changing a few at a time. But at this point,
> automating any changes isn't going to work :(

I have a PCRE analyzer that, if I had time, I could adapt to convert
from bigevil.cf to bigevil.domains. I have almost no time, but I can
try to do it if it would be useful. You could write a simple perl
script that converted from bigevil.domains to bigevil.cf by
concatenating regexps together 30 at a time. Also, I gave some
techniques to manage bigevil.domains-type files a few weeks ago.
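
For illustration, here is a sketch of that converter in Python rather
than perl. It assumes bigevil.domains holds one regexp per line with
"#" comments allowed, that no regexp contains an unescaped "/", and
that the rule names (BIGEVIL_n), the description text, and the score
of 3.0 are placeholders rather than the real bigevil.cf values:

```python
def read_patterns(text):
    """Return the non-blank, non-comment lines of a domains file."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]

def domains_to_cf(patterns, chunk_size=30, prefix="BIGEVIL", score="3.0"):
    """Join regexps chunk_size at a time into SpamAssassin uri rules."""
    out = []
    starts = range(0, len(patterns), chunk_size)
    for n, start in enumerate(starts, start=1):
        chunk = patterns[start:start + chunk_size]
        alternation = "|".join(chunk)   # entries are already regexps
        name = f"{prefix}_{n}"
        out.append(f"uri {name} /\\b(?:{alternation})\\b/i")
        out.append(f"describe {name} URI matches a listed domain (group {n})")
        out.append(f"score {name} {score}")
        out.append("")
    return "\n".join(out)

sample = """
# hypothetical bigevil.domains content
spam200\\d\\.com
domain\\.(?:com|net)
"""
print(domains_to_cf(read_patterns(sample)))
```

Regenerating the whole cf file from the domains file on every change
keeps the domains file as the only thing edited by hand.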

Scott


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
