Re: [SAtalk] large numbers of tiny scores = SPAM!

Duncan Findlay Thu, 30 May 2002 16:31:15 -0700

On Thu, May 30, 2002 at 10:01:27AM +0100, Matt Sergeant wrote:
> Kingsley G. Morse Jr. wrote:
> >Good point. Combinations of some rules may be more
> >indicative of spam than others.
> >
> >It would be great if the GA could infer the boolean
> >logic, as well as the scores.
> 
> It's possible that you could group the rules that matched, and feed it 
> into the score generating system (whatever that may be - I'm looking to 
> get rid of using the GAs here as it's just too slow to work with).
> 
> You'd have to do some spanning though. For example, if an email matches 
> rules A B C D and E, and you decided you wanted to try scoring against 
> triplets, you'd need to feed the score generator:
> 
> ABC
> ACD
> ADE
> ABD
> ABE
> ACE
> BCD
> BDE
> BCE
> CDE
> 
> (I may have missed some combinations above, but you get the idea).
> 
> So yes, I think it can be done (and pretty easily with my new 
> system[1]), but it's a fair bit of work.
> 
> Matt.
> 
> [1] Unfortunately it's not something I can give away - not yet. Maybe 
> towards the end of Q3 after we've got all this running live.


Clearly, we cannot do this with EVERY combination, unless Craig has a
lot of CPU to spare. There are just under 400 rules right now. If we
ended up with 400 tests, there would be 79800 doubles (400*399/2) and
10586800 triplets (400*399*398/6).

So, assuming the GA runs in O(n) time (which is not at all likely to
be true -- I'd guess O(n^2) if I had to), generating scores would take
about 26668 times as long, since (400 + 79800 + 10586800) / 400 is
roughly 26668.
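
For anyone who wants to check the arithmetic, here is a rough sketch in
Python. The names (n_rules, slowdown) are made up for illustration, and
the O(n) and O(n^2) cost models are only the guesses mentioned above,
not measurements of the actual GA.

```python
import math

n_rules = 400  # round figure used above; the real rule count is slightly lower

doubles = math.comb(n_rules, 2)   # unordered pairs of rules
triples = math.comb(n_rules, 3)   # unordered triplets of rules
total = n_rules + doubles + triples

def slowdown(exponent):
    """Relative run time if the GA's cost grows as n**exponent (an assumption)."""
    return (total / n_rules) ** exponent

print(doubles, triples, total)
print(math.ceil(slowdown(1)))     # linear-time guess
print(f"{slowdown(2):.3g}")       # quadratic-time guess
```

Under the quadratic guess the factor is the square of the linear one, so
the real slowdown could be far worse than the figure quoted above.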

Of course, the total would be smaller if doubles and triples were
added only as they were seen in the corpus, but I estimate it would
still be extremely taxing on CPU.
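
A minimal sketch of the "add them as they are seen" idea, again in
Python. The function name seen_combinations and the tiny corpus are
hypothetical, and this is not how the GA or Matt's system works; it
only shows how one might count the combinations that actually occur.

```python
from itertools import combinations

def seen_combinations(messages, max_size=3):
    """Collect every double and triplet of rules that co-occur in a message.

    `messages` is an iterable of rule-name lists, one list per email.
    """
    seen = set()
    for hits in messages:
        rules = sorted(set(hits))            # canonical order, no duplicates
        for size in range(2, max_size + 1):
            seen.update(combinations(rules, size))
    return seen

# Hypothetical corpus: the first message matches rules A through E,
# as in Matt's example.
corpus = [
    ["A", "B", "C", "D", "E"],
    ["A", "B"],
    ["C", "E", "F"],
]

combos = seen_combinations(corpus)
triplets = sorted("".join(c) for c in combos if len(c) == 3)
print(len(combos), triplets)
```

The set only grows with combinations that really show up, but a single
message hitting k rules still contributes k*(k-1)*(k-2)/6 triplets, so
messages that hit many rules can make it large quickly.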

-- 
Duncan Findlay

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
