Excellent, I'll slap this in as an eval replacement for PORN_3 right now. I knew there was a reason I put up with this pain-in-the-ass mailing list :) For now I'll just do the hard-coded wordlist; someone can file a Bugzilla ticket if they want the words in a config file instead of the .pm.
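A minimal sketch of what a hard-coded eval test along these lines might look like, assuming the word list lives in the .pm as a plain Perl array. The sub name `porn_word_test` follows Daniel's proposal quoted below; the word list here is only an illustrative subset, and compiling every word into a `qr//` object once at load time is an assumption about how to cut per-message cost, not something settled in this thread.

```perl
use strict;
use warnings;

# Illustrative subset of the quoted word list. Each entry is a regex
# fragment, so character classes and alternations still work.
my @porn_words = (
    'lolita', 'org[iy]', 'le[sz]b(?:ian|o)', 'xxx',
    'blow[ -]*job', 'nast(?:y|iest)', 'porn', 'webcam', 'hardcore',
);

# Compile every pattern once at load time rather than once per message.
# The (?:...) wrapper keeps \b anchored around the whole fragment.
my @porn_res = map { qr/\b(?:$_)\b/i } @porn_words;

# Returns 1 as soon as three distinct list entries match the text,
# otherwise 0. $fulltext is a reference to the message text, as in the
# quoted proposal; $self is accepted but unused here.
sub porn_word_test {
    my ($self, $fulltext) = @_;
    my $hits = 0;
    foreach my $re (@porn_res) {
        next unless $$fulltext =~ $re;
        return 1 if ++$hits >= 3;
    }
    return 0;
}
```

Called as, say, `porn_word_test(undef, \$body)`, it stops scanning as soon as the third distinct entry matches, so a message that clearly trips the rule never pays for the rest of the list.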
C

Daniel Pittman wrote:
DP> On Wed, 1 May 2002, Craig R. Hughes wrote:
DP> > Daniel Pittman wrote:
DP> >
DP> > DP> Break the rule up into individual tests for the different email
DP> > DP> packages and let it run. Aside from the better scoring for what is
DP> > DP> and isn't a real mail package, this will probably run faster in
DP> > DP> many cases as a simple string match, not a regexp, is used.
DP> >
DP> > Probably right.
DP>
DP> Given that I know that, at least in part, it's hits on CommuniGate from
DP> my corpus that drag that rule away from trapping SPAM, that seems a good
DP> idea to me. :)
DP>
DP> > All the same logic applies to PORN_3 too, and I want to break that one
DP> > up, but of course PORN_3 is trickier because of the triple-repeat
DP> > part. PORN_3 is far and away the worst-performing rule in the book.
DP> > Fully 10% of the execution time per message is being consumed by
DP> > testing PORN_3.
DP>
DP> Yup. That rule looks ... inefficient. Using an eval and a series of word
DP> tests should be better. Something akin to:
DP>
DP>     my @porn_words = ("lolita", "cum", "org[iy]", "wild", "fuck", "teen",
DP>         "action", "spunk", "pussy", "pussies", "suck", "sucking", "hot",
DP>         "hottest", "voyeur", "le[sz]b(?:ian|o)", "anal", "interracial",
DP>         "asian", "amateur", "sex+", "slut", "explicit", "xxx", "live",
DP>         "celebrity", "lick", "dorm", "webcam", "ass", "schoolgirl",
DP>         "strip", "horny", "horniest", "erotic", "oral", "penis",
DP>         "hardcore", "blow[ -]*job", "nast(?:y|iest)", "porn");
DP>
DP>     sub porn_word_test {
DP>         my ($self, $fulltext) = @_;
DP>         my $hits = 0;
DP>         foreach my $word (@porn_words) {
DP>             $hits++ if $$fulltext =~ /\b$word\b/i;
DP>             return 1 if $hits == 3;
DP>         }
DP>         return 0;
DP>     }
DP>
DP> If you got clever you could even have the set of words configurable
DP> somewhere in the test files; something like:
DP>
DP>     my %word_set_tests = ( 'PORN_WORDS' => [ ... ], ... );
DP>
DP>     WORDSET PORN_WORDS foo, bar, baz
DP>     SCORE PORN_WORDS 3.0
DP>
DP> Daniel

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
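Daniel's configurable-wordset idea could be sketched roughly as follows. This assumes the `WORDSET NAME word1, word2, ...` syntax from his message; `WORDSET` is a proposed directive, not an existing SpamAssassin config keyword, and the helper names `parse_wordset_line` and `word_set_test`, along with the default threshold of 3, are hypothetical choices made for illustration.

```perl
use strict;
use warnings;

# Hypothetical store: set name => list of compiled word patterns.
my %word_set_tests;

# Parse one hypothetical "WORDSET NAME word1, word2, ..." line.
# Returns 1 if the line was a WORDSET line, 0 otherwise.
# Note: splitting on commas means a fragment like "x{2,3}" would break.
sub parse_wordset_line {
    my ($line) = @_;
    return 0 unless $line =~ /^\s*WORDSET\s+(\w+)\s+(.+?)\s*$/;
    my ($name, $list) = ($1, $2);
    my @words = grep { length } split /\s*,\s*/, $list;
    push @{ $word_set_tests{$name} }, map { qr/\b(?:$_)\b/i } @words;
    return 1;
}

# Returns 1 if at least $min distinct words from set $name match the
# text referenced by $fulltext (default $min is 3), otherwise 0.
sub word_set_test {
    my ($name, $fulltext, $min) = @_;
    $min = 3 unless defined $min;
    my $hits = 0;
    foreach my $re (@{ $word_set_tests{$name} || [] }) {
        next unless $$fulltext =~ $re;
        return 1 if ++$hits >= $min;
    }
    return 0;
}
```

Under these assumptions, a single generic eval would serve every word set, and the matching `SCORE PORN_WORDS 3.0` line would stay an ordinary score directive.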