How about using split and then counting the chunks?  I.e., something like:

@chunks = split $patterns,$$fulltext;

$score = some_function_of(scalar @chunks);

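For the proximity thing, you can check the length() of the various elements of
@chunks: each interior element is the text between two consecutive matches, so
a short element means two listed words sit close together.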
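
To make the idea concrete, here is a rough sketch of the split approach. The
word list, the sample text and the 10-character threshold are made up purely
for illustration; only $patterns, $fulltext and @chunks come from the snippet
above. It assumes the pattern has no capturing groups (split would otherwise
return the captured text as extra elements) and passes a limit of -1 so that
a match at the very end of the text is not silently dropped.

```perl
use strict;
use warnings;

# Hypothetical word list and sample text, for illustration only.
my $patterns = qr/\b(?:foo|bar|baz)\b/i;
my $text     = "a foo then a bar, and much later on a baz here";
my $fulltext = \$text;

# A limit of -1 keeps trailing empty fields, so a match at the end still counts.
my @chunks = split $patterns, $$fulltext, -1;

# N matches cut the text into N+1 chunks (an empty text yields no chunks at all).
my $hits = @chunks ? @chunks - 1 : 0;

# Interior chunks are the gaps between consecutive matches.
my @gaps = map { length $_ } @chunks[1 .. $#chunks - 1];

# Count the pairs of neighbouring matches separated by at most 10 characters
# (the threshold is an arbitrary assumption).
my $close_pairs = grep { $_ <= 10 } @gaps;

print "hits=$hits gaps=@gaps close_pairs=$close_pairs\n";
```

The first and last chunks are skipped in the gap list because they are the
text before the first match and after the last one, not a distance between
two matches.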

C

Matthew Cline wrote:

MC> On Thursday 02 May 2002 02:14 am, Michael Moncur wrote:
MC>
MC> > Actually it seems harmless - unlike the old spam phrases stuff, there's
MC> > still only one rule and PORN_3 has a score of 0.6, so it's not going to
MC> > push too many things over the threshold.
MC> >
MC> > Perhaps after testing it might be good to have a separate LOTS_OF_PORN_3
MC> > rule that checks for a higher number...
MC>
MC> Another idea: make a single regexp to search for any occurrence of the words in
MC> the list.  Then do a loop over $$fulltext =~ m/PATTERN/g, and call pos() each
MC> iteration to get the match position, so as to make an array of positions of
MC> where the matches occurred.  Then you can determine if the words occur close
MC> to one another, like with the old PORN_3 rule, but much, much quicker than
MC> the complicated regexp for the old PORN_3.
MC>
MC>
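
For comparison, the pos() loop described in the quoted message might look
roughly like the sketch below. The word list, sample text and 20-character
window are hypothetical, chosen only to show the mechanics; this is not the
actual PORN_3 rule code. Note that pos() reports the offset just past the end
of each match, so the differences measure end-to-end distance between
consecutive matches.

```perl
use strict;
use warnings;

# Hypothetical word list and sample text, for illustration only.
my $pattern  = qr/\b(?:foo|bar|baz)\b/i;
my $text     = "a foo then a bar, and much later on a baz here";
my $fulltext = \$text;

# Collect the offset just past the end of every match.
my @positions;
while ($$fulltext =~ /$pattern/g) {
    push @positions, pos($$fulltext);
}

# Count consecutive matches whose end offsets lie within $window characters
# of each other (the window size is an arbitrary assumption).
my $window      = 20;
my $close_pairs = 0;
for my $i (1 .. $#positions) {
    $close_pairs++ if $positions[$i] - $positions[$i - 1] <= $window;
}

print "positions=@positions close_pairs=$close_pairs\n";
```

Compared with split, this avoids copying the text into chunks, which may
matter for large message bodies.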

