How about using split and then counting chunks? I.e. something like:

    # limit -1 keeps trailing empty chunks so they are counted too
    my @chunks = split /$patterns/, $$fulltext, -1;
    my $score = some_function_of(scalar @chunks);

For the proximity thing, you can check the length() of the various
elements of @chunks: a short chunk between two matches means the matched
words sit close together.

C

Matthew Cline wrote:

MC> On Thursday 02 May 2002 02:14 am, Michael Moncur wrote:
MC>
MC> > Actually it seems harmless - unlike the old spam phrases stuff, there's
MC> > still only one rule and PORN_3 has a score of 0.6, so it's not going to
MC> > push too many things over the threshold.
MC> >
MC> > Perhaps after testing it might be good to have a separate LOTS_OF_PORN_3
MC> > rule that checks for a higher number...
MC>
MC> Another idea: make a single regexp to search for any occurrence of the words in
MC> the list. Then do a loop over $$fulltext =~ m/PATTERN/g, and call pos() each
MC> iteration to get the match position, so as to make an array of positions of
MC> where the matches occurred. Then you can determine if the words occur close
MC> to one another, like with the old PORN_3 rule, but much, much quicker than
MC> the complicated regexp for the old PORN_3.

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
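
For what it's worth, the pos() loop Matthew describes above could look
roughly like the sketch below. This is only an illustration, not the actual
SpamAssassin code: the word list, the $window size, and the helper names
match_positions() and close_pairs() are all made up for the example. Note
that pos() returns the offset just past the end of the most recent match, so
the recorded positions are match end offsets.

```perl
use strict;
use warnings;

# Hypothetical word list and window size, chosen only for illustration.
my @words  = qw(alpha beta gamma);
my $window = 20;    # max distance in characters to count as "close"

# One alternation regexp covering every word in the list.
my $pattern = join '|', map { quotemeta($_) } @words;
my $re      = qr/\b(?:$pattern)\b/i;

# Returns the pos() value (offset just past the match) for every match
# found in the referenced text.
sub match_positions {
    my ($fulltext) = @_;    # reference to the message text
    my @positions;
    while ($$fulltext =~ /$re/g) {
        push @positions, pos($$fulltext);
    }
    return @positions;
}

# Counts neighbouring matches whose positions differ by at most $window.
sub close_pairs {
    my @positions = @_;
    my $count = 0;
    for my $i (1 .. $#positions) {
        $count++ if $positions[$i] - $positions[$i - 1] <= $window;
    }
    return $count;
}

my $text = "alpha and beta are near; much later, after a long gap, comes gamma";
my @pos  = match_positions(\$text);
print "positions: @pos\n";
print "close pairs: ", close_pairs(@pos), "\n";
```

The text is scanned only once, and the proximity test is plain arithmetic on
the position list, which is where the speedup over one big proximity regexp
would come from. How the pair count should feed into a score is left open
here, as it is in the thread.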