I use the Bayes tests and do not use the network tests with SA 2.55. What puzzles me is the default scores for my situation.
Is there any reason that the BAYES_80 score (5.3) is bigger than the BAYES_90 score (4.027) and even the BAYES_99 score (5.2)? BAYES_10 vs. BAYES_01 vs. BAYES_00 also look strange. As I understand it, Bayes gives you a probability that an e-mail is spam. Why, then, does a .9 probability get less weight than a .8 probability? If, with the current Bayes implementation, a result of .9 or higher is somehow more doubtful than .8, then why do the same Bayes scores go up steadily when the network tests are in use? Can anybody shed some light on this?
You're over-simplifying the system. The scores would likely be linear if the BAYES rules were the only rules in the entire ruleset.
However, that's not the case: there are hundreds of other rules in the ruleset. The scores assigned to rules are not just a function of the rule and how much spam it matches. They are really a function of the rule AND which combinations of other rules also match the same messages. This is the beauty of what the GA does: it analyzes a very complex set of patterns and assigns scores that are a "best fit" to real-world data.
Emails that score very high in Bayes are also likely to be emails that are super-obvious to the default ruleset, so they will score high even without a high score assigned to BAYES_90. However, emails coming in at 80 are more likely to be "sneaky" mails that don't match as many rules in the default ruleset, so the extra score might be necessary.
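To make that concrete, here is a toy sketch in Python. It is not the GA and it does not use real mass-check data: the rule names RULE_A, RULE_B and RULE_C, their scores, and the six-message spam corpus are all made up for illustration, and it ignores ham entirely. It only asks, for each Bayes bucket, how small the Bayes score could be while every spam in that bucket still reaches the 5.0 threshold. Since a real optimizer is also under pressure to keep scores low to avoid false positives, that "smallest adequate score" is a rough stand-in for what a best-fit search might settle on.

```python
# Toy illustration only: hypothetical rules, scores, and corpus.
THRESHOLD = 5.0

# Made-up scores for non-Bayes rules, treated as already fixed.
OTHER_SCORES = {"RULE_A": 2.0, "RULE_B": 1.5, "RULE_C": 1.0}

# Made-up spam corpus: (Bayes bucket hit, other rules hit).
# BAYES_80 spam is "sneaky" and hits few other rules;
# BAYES_90 and BAYES_99 spam also trips the obvious rules.
SPAM = [
    ("BAYES_80", []),
    ("BAYES_80", ["RULE_C"]),
    ("BAYES_90", ["RULE_A"]),
    ("BAYES_90", ["RULE_A", "RULE_C"]),
    ("BAYES_99", ["RULE_A", "RULE_B"]),
    ("BAYES_99", ["RULE_B"]),
]


def min_bayes_score(bucket):
    """Smallest score for `bucket` that lifts all of its spam to THRESHOLD."""
    needed = 0.0
    for hit_bucket, rules in SPAM:
        if hit_bucket == bucket:
            other = sum(OTHER_SCORES[r] for r in rules)
            # The Bayes rule has to cover whatever the other rules leave short.
            needed = max(needed, THRESHOLD - other)
    return needed


scores = {b: min_bayes_score(b) for b in ("BAYES_80", "BAYES_90", "BAYES_99")}
print(scores)  # {'BAYES_80': 5.0, 'BAYES_90': 3.0, 'BAYES_99': 3.5}
```

In this made-up corpus BAYES_80 ends up needing the largest score, simply because the messages it hits get the least help from everything else. The real scores come out of a much larger and messier version of the same interaction.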
Really, you'd have to rsync out the mass-check data and spend about a week analyzing it all by hand to figure out the exact reasons why the GA laid out the scores that way.
I know I'm too lazy to do all that work by hand, but suffice it to say, it's not reasonable to expect simple linear score assignments from an inherently complex system of hundreds of inter-related rules, scored by a "best fit" genetic algorithm that takes real-world data as its learning input. Things that look "wrong" from the simplified view quickly turn out to be "right" in most cases once you start looking at the bigger picture.
It's not entirely wrong to question the score assignments of the GA, but you certainly need to do so from the perspective of SA as a whole system, not just an individual rule or subset of rules.
If you dig back in the archives, this exact same question has been asked many times about SPAM_PHRASES in older versions. (SPAM_PHRASES was really a lot like a super-simplified Bayes that had a fixed token database.)