Kingsley G. Morse Jr. writes:
> Being an old AI/GA programmer who just started using
> SA, your post fascinates me. Thanks for the update on
> your research.
> [...]
> It seems to me that it would be interesting to consider a _summary_ of
>
> a.) The percentage of false positives and
> negatives _before_ testing for date differences
>
> b.) The percentage of false positives and
> negatives _after_ testing for date differences

Agreed. I think Craig might do something like this for each major
version of SpamAssassin.

Here are results using my limited data set (6146 messages) and a very
simple guess at scores: I used the original score for 96 hours or more
and divided by 2 for each shorter period. I think most of the GA scores
will end up much higher than these, but I had to start them somewhere,
and I usually err on the low side to avoid false positives.

score DATE_IN_FUTURE_03_06 0.072
score DATE_IN_FUTURE_06_12 0.145
score DATE_IN_FUTURE_12_24 0.290
score DATE_IN_FUTURE_24_48 0.580
score DATE_IN_FUTURE_48_96 1.159
score DATE_IN_FUTURE_96_XX 2.318
score DATE_IN_PAST_03_06 0.072
score DATE_IN_PAST_06_12 0.145
score DATE_IN_PAST_12_24 0.290
score DATE_IN_PAST_24_48 0.580
score DATE_IN_PAST_48_96 1.159
score DATE_IN_PAST_96_XX 2.318

before:
  nonspam  4819 correct,   5 false positives
  spam     1170 correct, 152 false negatives

after:
  nonspam  4819 correct,   5 false positives
  spam     1172 correct, 150 false negatives

So, no additional false positives and 2 fewer false negatives. I think
the GA will improve on this by quite a bit. Until then, I believe it's
premature to do this sort of analysis, but since you asked for it...

> c.) _How_many_ more rules would be added.

11 additional rules, but only the first invocation takes any
significant amount of time; subsequent invocations are probably faster
than most regular expression rules, since each one is just a numerical
comparison against a cached number.

I think you're better off adding the rules, seeing how they work, and
removing the slowest and worst performers later.

Dan

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
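
For anyone who wants to check the arithmetic in the post, here is a minimal sketch in Python. It is not part of SpamAssassin; the names `halved_scores` and `error_rate` are made up for illustration. It assumes the halving starts from the unrounded 96-hour score and rounds to three decimals only for display, which is the reading that reproduces the score list in the post.

```python
from decimal import Decimal, ROUND_HALF_UP

# Period labels as they appear in the rule names, shortest offset first.
PERIODS = ["03_06", "06_12", "12_24", "24_48", "48_96", "96_XX"]


def halved_scores(top_score="2.318"):
    """Give top_score to the 96_XX rule and halve it for each shorter period.

    Halving is done on the unrounded value; rounding to three decimals
    happens only when the score is stored for display (an assumption).
    """
    score = Decimal(top_score)
    scores = {}
    for period in reversed(PERIODS):
        scores[period] = score.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
        score = score / 2
    return scores


def error_rate(correct, wrong):
    """Percentage of messages in one class that were misclassified."""
    return 100.0 * wrong / (correct + wrong)


if __name__ == "__main__":
    scores = halved_scores()
    for direction in ("FUTURE", "PAST"):
        for period in PERIODS:
            print(f"score DATE_IN_{direction}_{period} {scores[period]}")

    # Counts taken from the before/after results in the post.
    print(f"false positives:        {error_rate(4819, 5):.2f}%")
    print(f"false negatives before: {error_rate(1170, 152):.2f}%")
    print(f"false negatives after:  {error_rate(1172, 150):.2f}%")
```

On the posted counts, the false-negative rate moves from roughly 11.5% to roughly 11.3% of spam, while the false-positive rate is unchanged at about 0.1% of nonspam.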