Ok, it's done. That was the last thing on the list to get done before a 2.1 release, so now I think I'll go ahead and release in a day or two (after people have a chance to notice that the new stuff is broken).
Here is a description of how the changed AWL stuff works:

One major thing: the AWL is no longer part of the header tests, as it used to be. It's now its own separate checking stage, which happens after all other tests have run. Rather than adding a constant (negative) amount to the score, the AWL now shifts the score based on the long-term average score observed for a particular sender, so it needs the total message score to be calculated before it can operate.

If you picture the range of possible scores for a message as a line:

    - ---------------------|----------------------- +
        non-spam messages  5  spam messages

The idea is that a particular sender can also be scored, over their lifetime. There are generally-spammy senders, and generally-nonspammy senders. If we track the total score of all messages for each sender, and the number of messages observed, we can calculate the mean score for a particular sender, and place the sender somewhere on the line.

Then, when we receive a new message from that sender, we calculate a score as normal for the message, following all the rules. We come up with a score somewhere on the line. Now, instead of using that score as the final score, we "move" toward the sender's average score along the line. The fraction of the distance we move we'll call the shrinkage factor (settable in the cf files as auto_whitelist_factor). By default, shrinkage will be 0.5, so if we have:

    ----------|-----------|-----|-----------
            mean          5 pre-score

Then we'll move the score "half way" toward the mean, and we'll end up as:

    ------|-----------|----|-----------------
        mean   post-score  5

And so the message will be identified as "non-spam" even though the rules consider it spam. We'll then update the sender's mean with the score for the new message (currently using the post-score, which might be wrong -- I'll have to think about that) for next time.

This system has a number of advantages over the simple counting method of the old AWL implementation:

1. Before, spammers could just send you 3 "clean" messages and thereby earn themselves a permanent -100 bonus. Now they would have to keep restocking their spamming addresses by sending dummy messages to keep their mean low. And if their mean were, say, 1 long-term, any message they sent scoring >=9 would count as spam anyway.

2. Spammers could use a "well-known good" address which they reasonably guess to be whitelisted (think from: [EMAIL PROTECTED]) and get the -100 bonus. Now, they can of course still use some well-known good address, but the bonus obtained will be far lower.

3. The AWL now operates automatically as an auto-blacklist too! If you generally receive spammy mails from a particular address, then the scores will be pulled toward the spammy mean, *raising* their score if the spammer happens to send you a less-spammy message.

This is all now checked into CVS, including changes to the rules and scores files to make the changes effective. The contents of CVS are now basically a release candidate for 2.1 -- I'm not going to add any more features to the tree until after the 2.1 release, and I'm only going to make show-stopper bugfixes.

Please get the latest stuff from CVS (or wait till after ~1am PST and get the 2.1 tarball from the website) and try it out over the next few days. I've re-instated the "-a" flag in the spamd startup scripts, but make sure you're using it, and let me know how it's working.

C
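
A minimal sketch of the score adjustment described above, in Python for illustration only; it is not the code checked into CVS. The names SenderHistory and awl_adjust are made up for this example, and passing the raw score through unchanged for a sender with no history is an assumption, since the description doesn't say what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class SenderHistory:
    """Running totals for one sender (hypothetical structure)."""
    total_score: float = 0.0
    count: int = 0

    def mean(self):
        # No messages seen yet means there is no mean to move toward.
        return self.total_score / self.count if self.count else None


def awl_adjust(history, pre_score, factor=0.5):
    """Move pre_score toward the sender's mean by `factor`, then record it.

    `factor` plays the role of auto_whitelist_factor (0.5 = "half way").
    """
    mean = history.mean()
    if mean is None:
        # Assumption: a never-seen sender gets the rule score unchanged.
        post_score = pre_score
    else:
        post_score = pre_score + factor * (mean - pre_score)

    # The description says the mean is currently updated with the
    # post-score (and notes that this might be the wrong choice).
    history.total_score += post_score
    history.count += 1
    return post_score


# A sender with a long-term mean of 1 sends a message the rules score at 9.
sender = SenderHistory(total_score=3.0, count=3)
print(awl_adjust(sender, 9.0))  # 5.0, i.e. right at the spam threshold
```

This matches the ">=9" example in point 1: with a mean of 1 and the default factor, a rule score of 9 lands exactly on 5.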