Duncan Findlay wrote:

DF> Clearly, we cannot do this with EVERY combination, unless Craig has a
DF> lot of CPU to spare. There are just under 400 rules right now. If we
DF> ended up with 400 tests, there would be 79800 doubles and 10586800
DF> triplets.

We really don't care about *EVERY* combination, just the common ones.  I bet
many rules never fire together, and some combinations might even be impossible
(haven't checked).

DF> So, assuming the GA runs in O(n) time, (which is not at all likely to
DF> be true -- I'd guess O(n^2) if I had to), this would require 26668
DF> times longer to generate scores.
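
For reference, the figures quoted above can be checked with a few lines of
Python. This is only a sketch of the arithmetic: it assumes exactly 400 rules
and that every double and triple is scored as its own test. The variable names
are made up for illustration.

```python
from math import comb

n = 400                      # assumed number of rules ("just under 400")

singles = n
doubles = comb(n, 2)         # unordered pairs of rules
triples = comb(n, 3)         # unordered triples of rules

total = singles + doubles + triples

# If GA run time were linear in the number of tests, this is the slowdown
# relative to scoring the 400 single rules alone.
linear_factor = total / singles

# Under the O(n^2) guess the slowdown would be the square of that.
quadratic_factor = linear_factor ** 2

print(doubles, triples, total, linear_factor)
# prints: 79800 10586800 10667000 26667.5
```

So the 26668 figure is the linear-time ratio rounded up, and it is the worst
case where every possible double and triple gets a score.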



DF> Of course this total would be less but still quite significant if
DF> doubles and triples were added as they were seen, but still, I
DF> estimate this would be extremely taxing on CPU.

Shouldn't be too bad, I don't think.  The GA is pretty efficient when evaluating
genomes.  Wouldn't take very long to turn spam.log and nonspam.log into a
rule-combination-frequencies file either.
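
A rough sketch of what that log-to-frequencies step could look like, in Python
rather than anything taken from the SpamAssassin tree. The line format is an
assumption: each log line is taken to end with a comma-separated list of the
rules that fired on one message. The function name and the sample lines are
hypothetical.

```python
from collections import Counter
from itertools import combinations


def combo_frequencies(lines, max_size=3):
    """Count how often each set of 2..max_size rules fires on one message."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # ASSUMPTION: the last whitespace-separated field is the rule list.
        # A line with no rule list at all would need separate handling.
        fields = line.split()
        rules = sorted({r for r in fields[-1].split(",") if r})
        for size in range(2, max_size + 1):
            # Sorting first means each combination has one canonical key.
            counts.update(combinations(rules, size))
    return counts


# Made-up sample lines standing in for spam.log / nonspam.log.
sample = [
    "Y 12 msg1 RULE_A,RULE_B,RULE_C",
    "Y 8 msg2 RULE_A,RULE_B",
    ". 1 msg3 RULE_C",
]

freq = combo_frequencies(sample)
for combo, n in freq.most_common():
    print(n, " ".join(combo))
```

Only combinations that actually show up in the logs get an entry, so the
output stays far smaller than the full set of possible doubles and triples.
Anything below some frequency cutoff could then be dropped before it ever
reaches the GA.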

C


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
