Hi,

On Thu, 20 Nov 2003, Smart,Dan wrote:
> Is there a reason that the Bayes scoring is NOT a normal distribution from
> 50% to 100%, and negative from 0% to 50%?

Yes, check the [SAtalk] list archives; this may well be a FAQ.

Short answer: all scores, including those from Bayes, are generated by a genetic algorithm ("the GA"), which cares little for making the scores fit an ideal curve (normal distribution) or for satisfying the consistency hobgoblins. The GA adjusts scores until false negatives (FNs) and false positives (FPs) are minimized within the time and accuracy constraints it's given.

People occasionally speculate on why the GA scores the way it does; ultimately it doesn't really matter, since SA works best on the test corpus with the scores set the way they are, and changing them makes SA perform worse. Besides, if SA were more effective with a nice clean normal distribution of scores for Bayes, don't you think Jason, et al. would ship it that way? :)

If you are going to adjust the scores yourself, your best bet is to run the GA against your own (large) corpus of ham and spam so the scores are tuned to the mail your site sees. Adjusting them by hand is almost guaranteed to make SA perform worse.

Which is not to discourage you; on the contrary, I think SA is more effective globally if sites generate their own scores with the GA, if only because spammers can't be sure which score sets are in use. Any attempt to weasel around the generic scores of SA will probably get flagged by someone else's local tuning.

hth,

-- 
Bob
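
For anyone wondering what "adjusts scores until FNs and FPs are minimized" looks like mechanically, here is a toy sketch. It is not SpamAssassin's actual GA: it is a stripped-down evolutionary search (mutation plus selection, no crossover) over a made-up six-message corpus. The rule list, the corpus, and the helper names `errors`, `mutate`, and `evolve` are all assumptions for illustration; the only value taken from SA is the default threshold of 5.0. Nothing in the fitness function rewards a tidy curve, so the resulting scores come out however they happen to separate the corpus.

```python
import random

# Illustrative rule names and a tiny invented corpus. Each message is
# (set of rules that fired, is_spam). This is not real SpamAssassin data.
RULES = ["BAYES_99", "BAYES_50", "BAYES_00", "HTML_ONLY"]
CORPUS = [
    ({"BAYES_99", "HTML_ONLY"}, True),
    ({"BAYES_99"}, True),
    ({"BAYES_50", "HTML_ONLY"}, True),
    ({"BAYES_50"}, False),
    ({"BAYES_00", "HTML_ONLY"}, False),
    ({"BAYES_00"}, False),
]
THRESHOLD = 5.0  # a message scoring at or above this is tagged as spam


def errors(scores):
    """Count false positives plus false negatives for one score set."""
    wrong = 0
    for hits, is_spam in CORPUS:
        total = sum(scores[rule] for rule in hits)
        if (total >= THRESHOLD) != is_spam:
            wrong += 1
    return wrong


def mutate(scores, rng):
    """Return a copy with one randomly chosen rule score nudged up or down."""
    child = dict(scores)
    rule = rng.choice(RULES)
    child[rule] += rng.uniform(-1.0, 1.0)
    return child


def evolve(generations=200, population=20, seed=0):
    """Evolve score sets; the better half always survives unchanged."""
    rng = random.Random(seed)
    start = {rule: 0.0 for rule in RULES}
    pool = [start] * population
    for _ in range(generations):
        pool.sort(key=errors)
        survivors = pool[: population // 2]
        pool = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]
    return min(pool, key=errors)


best = evolve()
print("misclassified:", errors(best))
print("scores:", best)
```

Because the best candidates are carried over unchanged each generation, the error count can only stay the same or drop. Scores are free to go negative, which is how a low Bayes probability can end up subtracting from the total.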