Re: A different approach to scoring spamassassin hits

Marc Perkel Sat, 30 Jun 2007 07:46:00 -0700


Loren Wilton wrote:

You have a bit of a chicken and egg problem at the start, until
some learning takes place in the system.
Two possibilities. The rules exist and have scores. Assume they are maintained, for whatever reason.
1. Until Bayes has enough info to kick in, classification is done by the scores. Then when Bayes kicks in, the scores turn off (insofar as adding to the message score; they might still show up as tokens in the message that Bayes will process).
2. Divide all the scores by 10 or 20. Then leave them on. Pretty soon Bayes will override almost any reasonable score combination.
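
To make the two possibilities above concrete, here is a minimal Python sketch. It is not SpamAssassin code: the function name `combined_score`, its parameters, and the `bayes_trained` flag are all hypothetical, and the Bayes contribution is treated as a single number purely for illustration.

```python
def combined_score(rule_scores, bayes_score, bayes_trained,
                   mode="handoff", divisor=10.0):
    """Combine static rule scores with a Bayes-derived score.

    rule_scores   -- dict of rule name -> score, for rules that hit (hypothetical)
    bayes_score   -- score contributed by Bayes (hypothetical single number)
    bayes_trained -- True once Bayes has learned enough to be trusted
    mode          -- "handoff" (option 1) or "scaled" (option 2)
    divisor       -- how much to shrink static scores in "scaled" mode
    """
    static_total = sum(rule_scores.values())

    if mode == "handoff":
        # Option 1: static scores classify until Bayes kicks in,
        # then they stop adding to the message score.
        return bayes_score if bayes_trained else static_total

    if mode == "scaled":
        # Option 2: static scores stay on, divided by 10 or 20,
        # so Bayes soon outweighs any reasonable combination.
        bayes_part = bayes_score if bayes_trained else 0.0
        return static_total / divisor + bayes_part

    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical rule names and scores, for illustration only.
hits = {"EXAMPLE_RULE_A": 3.0, "EXAMPLE_RULE_B": 2.0}
print(combined_score(hits, 6.0, bayes_trained=False))
print(combined_score(hits, 6.0, bayes_trained=True))
print(combined_score(hits, 6.0, bayes_trained=True, mode="scaled"))
```

In the "scaled" variant the static rules still contribute a little once Bayes is trained, which is the difference between the two options.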
BTW, while ham rules are possible, SA has almost no ham rules; perhaps two or so. Spammers long ago found they could write their spams to match ham rules and thus bypass SA. Thus, no ham rules, no spammer workarounds. Of course personal or site-specific ham rules will generally still work, since they will not be public knowledge and spammers won't be able to target them.
I suspect you can find all rule names in PerMsgStatus. However, the latest SA versions have implemented a 'check' plugin that actually runs the rules and accumulates the score. The rule running was moved to a plugin so that people could, at least in theory, change the order or the way that rules are run. It sounds like that is what you want to do, so a modified Check plugin may well be the way to go.
I don't understand, though, why you are interested in the names of all rules run; I don't see what it buys you. Currently ALL rules are run, unless short-circuiting is in effect, and by default it mostly isn't. In any case, if a rule doesn't hit on a message, the name of the rule is probably irrelevant. It might have missed because the message is ham, but it is even more likely to have missed because it simply targets a different kind of spam. So assuming that "rules not hit" === "good tokens" is unlikely to be correct.
You should be able to get Bayes to scan the rule names hit pretty easily. Bayes is just about the last rule; I think AWL comes after it. You might want to change that order, which I suspect you can do in the Check plugin. You could then modify the Check code to push the rule names into a special header line before calling Bayes. This could probably be done in Check, and could certainly be done by a one-off plugin that you wrote. It would be called by a special rule just before Bayes is called, and again, it would add the current rule names to a special header Bayes could see.
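
As a rough illustration of the "special header" idea, the Python sketch below builds a pseudo-header from the names of the rules that hit and splits it into tokens a Bayes-style learner could consume. Nothing here is SpamAssassin's API: the header name `X-Rule-Hits`, both function names, and the `header*token` format are assumptions made up for this example. Only rules that hit are included, in line with the point above that rules that missed say little about a message.

```python
def rule_header(hit_names, header="X-Rule-Hits"):
    """Build a pseudo-header line listing the rules that hit.

    hit_names -- iterable of rule names that matched the message
    header    -- hypothetical header name, not a real SpamAssassin header
    """
    # Sort so the same set of hits always yields the same header line.
    return f"{header}: {' '.join(sorted(hit_names))}"


def header_tokens(line):
    """Split a 'Name: value ...' header line into prefixed tokens.

    Each token is tagged with the header name so that rule-name tokens
    cannot collide with ordinary words from the message body.
    """
    name, _, value = line.partition(":")
    return [f"{name.strip()}*{tok}" for tok in value.split()]


# Hypothetical rule names, for illustration only.
line = rule_header({"EXAMPLE_RULE_B", "EXAMPLE_RULE_A"})
print(line)
print(header_tokens(line))
```

A one-off plugin of the kind described above would do the equivalent of `rule_header` just before Bayes runs, so that the rule names are part of what Bayes tokenizes.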
Of course you have to modify Check to drop out the scores for the non-Bayes rules. Either that or rescore all of the rules.

Just a thought - what if we had some central servers for real-time reporting, where the SA rule hits and scores were reported in real time for some sort of live scoring or analysis or dynamic adjusting? Just thinking out loud here.
