On Wed, 1 May 2002, Craig R. Hughes wrote:
> Daniel Pittman wrote:
>
> DP> Break the rule up into individual tests for the different email
> DP> packages and let it run. Aside from the better scoring for what is
> DP> and isn't a real mail package, this will probably run faster in
> DP> many cases as a simple string match, not a regexp, is used.
>
> Probably right.

Given that I know that, at least in part, it's hits on Communigate from
my corpus that drag that rule away from trapping SPAM, that seems a good
idea to me. :)

> All the same logic applies to PORN_3 too, and I want to break that one
> up, but of course PORN_3 is trickier because of the triple-repeat
> part. PORN_3 is far and away the worst performing rule in the book.
> Fully 10% of the execution time per message is being consumed by
> testing PORN_3.

Yup. That rule looks ... inefficient. Using an eval and a series of word
tests should be better. Something akin to:

    # Each entry is a regexp fragment; the test wraps it in \b...\b.
    # "xxx" needs no extra [^x] guards, since \b already rejects a
    # neighbouring "x".
    my @porn_words = (
        "lolita", "cum", "org[iy]", "wild", "fuck", "teen", "action",
        "spunk", "pussy", "pussies", "suck", "sucking", "hot", "hottest",
        "voyeur", "le[sz]b(?:ian|o)", "anal", "interracial", "asian",
        "amateur", "sex+", "slut", "explicit", "xxx", "live", "celebrity",
        "lick", "dorm", "webcam", "ass", "schoolgirl", "strip", "horny",
        "horniest", "erotic", "oral", "penis", "hardcore", "blow[ -]*job",
        "nast(?:y|iest)", "porn",
    );

    sub porn_word_test {
        my ($self, $fulltext) = @_;    # $fulltext is a reference to the text
        my $hits = 0;

        foreach my $word (@porn_words) {
            $hits++ if $$fulltext =~ /\b$word\b/i;
            return 1 if $hits == 3;    # three different words is enough
        }
        return 0;
    }

If you got clever you could even have the set of words configurable
somewhere in the test files; something like:

    my %word_set_tests = (
        'PORN_WORDS' => [ ... ],
        ...
    );

    WORDSET PORN_WORDS foo, bar, baz
    SCORE PORN_WORDS 3.0

        Daniel

-- 
Coding is easy: All you do is sit staring at a terminal until the drops of
blood form on your forehead.
        -- Simon Cozens

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
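
P.S. For what it's worth, here is a rough sketch of how the configurable word-set idea above might hang together. Everything in it is an assumption made for illustration: the WORDSET and SCORE directives are only the hypothetical syntax floated above, not anything SpamAssassin actually parses, and parse_word_sets and word_set_hits are made-up helper names. Words are treated as regexp fragments, as in the list above, and compiled once with qr// so they are not re-parsed for every message.

```perl
use strict;
use warnings;

# Hypothetical parser for the proposed "WORDSET NAME word, word, ..." and
# "SCORE NAME value" lines. Neither directive exists in SpamAssassin.
sub parse_word_sets {
    my ($config_text) = @_;
    my (%words, %score);

    foreach my $line (split /\n/, $config_text) {
        if ($line =~ /^\s*WORDSET\s+(\w+)\s+(.+?)\s*$/) {
            # Copy the captures before running any further regexps.
            my ($name, $list) = ($1, $2);
            # Compile each word once, with word boundaries on both sides.
            $words{$name} = [ map { qr/\b$_\b/i } split /\s*,\s*/, $list ];
        }
        elsif ($line =~ /^\s*SCORE\s+(\w+)\s+([\d.]+)\s*$/) {
            my ($name, $value) = ($1, $2);
            $score{$name} = $value;
        }
    }
    return (\%words, \%score);
}

# Return 1 once $threshold different patterns have matched, else 0.
# $fulltext is a reference to the message text, as in porn_word_test.
sub word_set_hits {
    my ($patterns, $fulltext, $threshold) = @_;
    my $hits = 0;

    foreach my $re (@$patterns) {
        $hits++ if $$fulltext =~ $re;
        return 1 if $hits >= $threshold;
    }
    return 0;
}

my ($words, $score) = parse_word_sets(<<'END');
WORDSET PORN_WORDS foo, bar, baz
SCORE PORN_WORDS 3.0
END

my $body    = "a foo walked into a bar with a baz";
my $verdict = word_set_hits($words->{PORN_WORDS}, \$body, 3) ? "hit" : "miss";
print "$verdict\n";
```

The threshold is passed in rather than hard-coded so that different word sets could require different hit counts; whether that belongs in the config syntax is an open question.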