On Wed, 1 May 2002, Craig R. Hughes wrote:
> Daniel Pittman wrote:
> 
> DP> Break the rule up into individual tests for the different email
> DP> packages and let it run. Aside from the better scoring for what is
> DP> and isn't a real mail package, this will probably run faster in
> DP> many cases as a simple string match, not a regexp, is used.
> 
> Probably right. 

Given that I know that, at least in part, it's hits on Communigate from
my corpus that drag that rule away from trapping SPAM, that seems a good
idea to me. :)

> All the same logic applies to PORN_3 too, and I want to break that one
> up, but of course PORN_3 is trickier because of the triple-repeat
> part. PORN_3 is far and away the worst performing rule in the book. 
> Fully 10% of the execution time per message is being consumed by
> testing PORN_3.

Yup. That rule looks ... inefficient. Using an eval and a series of word
tests should be better. Something akin to:

# Each entry is a regex fragment; the test below wraps it in \b...\b.
my @porn_words = ("lolita", "cum", "org[iy]", "wild", "fuck", "teen",
"action", "spunk", "pussy", "pussies", "suck", "sucking", "hot",
"hottest", "voyeur", "le[sz]b(?:ian|o)", "anal", "interracial", "asian",
"amateur", "sex+", "slut", "explicit", "xxx", "live",
"celebrity", "lick", "dorm", "webcam", "ass", "schoolgirl",
"strip", "horny", "horniest", "erotic", "oral", "penis", "hardcore",
"blow[ -]*job", "nast(?:y|iest)", "porn");

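sub porn_word_test {
    # $fulltext is a reference to the message text.
    my ($self, $fulltext) = @_;
    my $hits = 0;
    foreach my $word (@porn_words) {
        $hits++ if $$fulltext =~ /\b$word\b/i;
        # Stop as soon as three different words have matched.
        return 1 if $hits >= 3;
    }
    return 0;
}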
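
A possible refinement, as a rough sketch only: because $word changes on
every pass through the loop, Perl has to recompile the pattern each time.
Compiling the fragments once with qr// avoids that. The sub name
porn_word_test_compiled, the $threshold argument, and the shortened word
list are my own inventions for illustration, and I have dropped $self so
the snippet stands alone.

```perl
use strict;
use warnings;

# Illustrative subset only; the full list is the one given above.
my @porn_words = ("lolita", "org[iy]", "hardcore", "blow[ -]*job",
                  "nast(?:y|iest)", "porn");

# Compile each fragment once, at load time, instead of interpolating it
# into a fresh pattern for every message.
my @porn_res = map { qr/\b(?:$_)\b/i } @porn_words;

# Hypothetical standalone variant of porn_word_test.
# $fulltext is a reference to the message text; $threshold defaults to 3.
# Returns 1 once $threshold different patterns have matched, else 0.
sub porn_word_test_compiled {
    my ($fulltext, $threshold) = @_;
    $threshold = 3 unless defined $threshold;
    my $hits = 0;
    foreach my $re (@porn_res) {
        next unless $$fulltext =~ $re;
        return 1 if ++$hits >= $threshold;
    }
    return 0;
}
```

Called as porn_word_test_compiled(\$text), it behaves like the sub above
for the words it knows about; whether it is actually faster on a real
corpus would need measuring.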

If you got clever you could even have the set of words configurable
somewhere in the test files; something like:

my %word_set_tests = ( 'PORN_WORDS' => [ ... ], ... );

WORDSET PORN_WORDS foo, bar, baz
SCORE PORN_WORDS 3.0
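
To make that concrete, here is a minimal sketch of reading such lines
into a hash of word lists plus a hash of scores. The WORDSET and SCORE
keywords are only the made-up syntax from the two lines above, not
anything the existing config parser understands, and parse_wordset_config
is a hypothetical name.

```perl
use strict;
use warnings;

# Parse hypothetical "WORDSET NAME w1, w2, ..." and "SCORE NAME n" lines.
# Returns two hash references: name => [words] and name => score.
sub parse_wordset_config {
    my (@lines) = @_;
    my (%word_sets, %scores);
    foreach my $line (@lines) {
        next if $line =~ /^\s*(?:#.*)?$/;    # skip blank lines and comments
        if ($line =~ /^\s*WORDSET\s+(\w+)\s+(.+?)\s*$/) {
            my ($name, $list) = ($1, $2);
            # Words are comma separated; surrounding whitespace is dropped.
            $word_sets{$name} = [ split /\s*,\s*/, $list ];
        }
        elsif ($line =~ /^\s*SCORE\s+(\w+)\s+(-?\d+(?:\.\d+)?)\s*$/) {
            my ($name, $score) = ($1, $2);
            $scores{$name} = $score;
        }
    }
    return (\%word_sets, \%scores);
}

my ($sets, $scores) = parse_wordset_config(
    "WORDSET PORN_WORDS foo, bar, baz",
    "SCORE PORN_WORDS 3.0",
);
```

Each list in $sets could then be handed to a generic word-count test in
place of the hard-coded @porn_words.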

        Daniel

-- 
Coding is easy: All you do is sit staring at a terminal until the drops of
blood form on your forehead.
        -- Simon Cozens <[EMAIL PROTECTED]>
