Bart Schaefer wrote:
> On 4/29/06, Matt Kettler <[EMAIL PROTECTED]> wrote:
>> Besides.. If you want to make a mathematics based argument against me,
>> start by explaining how the perceptron mathematically is flawed. It
>> assigned the original score based on real-world data.
>
> Did it? I thought the BAYES_* scores have been fixed values for a
> while now, to force the perceptron to adapt the other scores to fit.

Actually, you're right.. I'm shocked and floored, but you're right.
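To make "fixed values" concrete: roughly, the trainer never updates the pinned scores, so every correction lands on the other rules that hit. Here's a toy Python sketch with hypothetical rule names and data (NOT SA's actual rescorer, which is far more involved):

    # Toy illustration of perceptron-style score fitting with some
    # scores pinned (hypothetical names/data, not SA's real rescorer).
    # A message is tagged spam when its summed score reaches 5.0.

    THRESHOLD = 5.0
    RATE = 0.1

    # Force-fixed scores, as the devs did for the BAYES_* rules:
    PINNED = {"BAYES_99": 3.50}

    def train(corpus, scores, epochs=100):
        """corpus: list of (set_of_rules_hit, is_spam) pairs."""
        scores.update(PINNED)
        for _ in range(epochs):
            for hits, is_spam in corpus:
                total = sum(scores[r] for r in hits)
                if (total >= THRESHOLD) == is_spam:
                    continue  # classified correctly; no update
                # Nudge the scores of the rules that hit toward the
                # correct side, but never touch the pinned rules.
                step = RATE if is_spam else -RATE
                for r in hits:
                    if r not in PINNED:
                        scores[r] += step
        return scores

    # One spam that BAYES_99 alone can't push over 5.0, and one ham
    # FP on BAYES_99:
    corpus = [({"BAYES_99", "RULE_A"}, True),
              ({"BAYES_99"}, False)]
    print(train(corpus, {"BAYES_99": 0.0, "RULE_A": 1.0}))
    # -> RULE_A ends up around 1.5; the ham FP stays under the
    #    threshold because BAYES_99 is capped at 3.50.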
In SA 3.1.0 they did force-fix the scores of the bayes rules, particularly at the high end. The perceptron assigned BAYES_99 a score of 1.89 in the 3.1.0 mass-check run; the devs jacked it up to 3.50.

That does make me wonder if:

1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules
   because the ham corpus is polluted with spam, which forces the
   perceptron to try to compensate. (Pollution is always a problem,
   since nobody is perfect, but it occurs to differing degrees.)

-or-

2) The perceptron is out of whack. (I highly doubt this, because the
   perceptron generated the 3.0.x scores and they were fine.)

-or-

3) The real-world FPs of BAYES_99 really do tend to cascade with other
   rules in the 3.1.x ruleset, and the perceptron is correctly capping
   the score. This could differ from 3.0.x due to changes in the rules,
   or changes in ham patterns over time.

-or-

4) One of the corpus submitters has a poorly trained bayes db.
   (Possible, but I doubt it.)

Looking at statistics-set3 for 3.0.x and 3.1.x, there was a slight increase in ham hits for BAYES_99 and a slight decrease in spam hits:

3.0.x:
  OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
    43.515  89.3888   0.0335   1.000   0.83    1.89  BAYES_99

3.1.x:
  OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
    60.712  86.7351   0.0396   1.000   0.90    3.50  BAYES_99

Also worth considering: set3 of 3.0.x was much closer to a 50/50 mix of spam/nonspam (48.7/51.3) than 3.1.0's set3 was (nearly 70/30).
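Incidentally, that mix difference accounts for most of the OVERALL% gap. Assuming OVERALL% is the mix-weighted average of SPAM% and HAM%, and S/O is spam hits over total hits (my reading of the stats columns, not anything official), the published numbers fall right out:

    def overall_and_so(spam_pct, ham_pct, spam_frac):
        """Recompute OVERALL% and S/O from per-corpus hit rates and
        the spam fraction of the corpus (assumed definitions)."""
        spam_hits = spam_frac * spam_pct          # hits from spam
        ham_hits = (1.0 - spam_frac) * ham_pct    # hits from ham
        return spam_hits + ham_hits, spam_hits / (spam_hits + ham_hits)

    # 3.0.x set3 (48.7% spam): BAYES_99 hit 89.3888% of spam, 0.0335% of ham
    print(overall_and_so(89.3888, 0.0335, 0.487))  # ~(43.55, 0.9996)

    # 3.1.x set3 (~70% spam)
    print(overall_and_so(86.7351, 0.0396, 0.70))   # ~(60.73, 0.9998)

So the jump from 43.5 to 60.7 in OVERALL% is almost entirely the corpus going to 70/30; BAYES_99's per-corpus hit rates barely moved (spam% even dropped a bit).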